Machine Learning Model Options
This list contains the model options for every model in the system.

Association Rules Model Options

Optional:

loadbalance — If you set this option, the database appends the using load balance shuffle = \<value> clause to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.

queryinternalparallelism — If you set this option, the database appends the using parallelism = \<value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer value. The default value is unspecified; in this case, the database does not add this clause.

Bagging Model Options

Required:

basemodels — This option specifies the children of the bagging model. You must specify this value as a JSON array, where each object in the array has the three fields type, count, and options.

tasktype — This option specifies the type of task, which must be either classification or regression, depending on the type of model for training.

Optional:

rocnumsamples — If you set this option, you must specify a positive integer that represents the number of samples for the model to use when calculating the area under the ROC curve. You must also set the metrics option to true. The default value is the number of child models.

bootstrap — If you set this option to true, the model uses bootstrap sampling with replacement, meaning each child model trains on a random subset of the data (either the rowsperchild or fractionselected value sets the exact number of rows), and the same row can appear multiple times for each child. If you set this option to false, the model does not use replacement, meaning each row can appear at most one time per child. The default value is false.

continuousfeatures — If you set this option, the value must be a comma-separated list of the feature indexes that are continuous numeric variables. Indexes start with 1. In the default state, the model considers no features as continuous.

featurearray — If you set this option to true, the model expects only one array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.

featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.

fractionselected — If you set this option, the option represents the proportion of rows the model uses to train each child model. The value is a double that must be in the interval (0, 1]. You cannot set this option if you also set the rowsperchild option to a positive value. The default behavior is that the model uses all available rows.

inputsperchild — If you set this option, the option represents the number of features used to create each child model. The default value is the total number of features divided by 3 and rounded up.

maxchildthreads — If you set this option, the value must be an integer representing the maximum number of threads each child model can use. If a child accepts a maxthreads option, the model passes this value to the child.

maxthreads — If you set this option, the option represents the maximum number of parallel threads to use while the model trains. This value must be a positive integer. The default value is 16.

metrics — If you set this option to true, the system calculates certain metrics depending on the value of the tasktype option. If you set the tasktype option to classification, the metrics are the percentage of correctly classified rows and the area under the ROC curve. If you set the value to regression, the metrics are the root mean square error and the adjusted R-squared. The default value is false.

nosnapshot — If you set this option to true, the data
source must not change. In this case, the database does not create an intermediate table that stores the result of the specified SQL statement, which the model uses for training. The child decision trees of a random forest always have this option set to true, so the database does not create a separate intermediate table for each decision tree. The default value is false. Setting this option to true can speed up training when the training set is fixed.

requiredfeatures — If you set this option, the value must be a comma-separated list of integers representing features, starting at index 1. The bagging model passes these features down to every child. The default value is an empty list, meaning there are no required features.

rowsperchild — If you set this option to a positive integer, the number represents the number of rows (from a random sample) to use for each decision tree. If you set this option to 0, each child uses all available rows. The default value is 0. You cannot set this option to a positive value if you also set the fractionselected option.

skipdroptable — If you set this option to false, the database deletes any intermediate tables created during model training. If you set this option to true, the database prevents the deletion of any intermediate tables created during model training. The default value is false.

suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same size for all rows in the input. The default value is false.

weighted — If you set this option, the system passes the value directly to child models that support it. The behavior depends on the model type of the target child.

Boosting Model Options

Required:

basemodels — This option specifies the children of the boosting model. You must specify this as a JSON array, where each object in the array has the three fields type, count, and options.

learningrate — A decimal value between 0.0 and 1.0 that tunes how much the model learns from each successive child.

tasktype — This option specifies the type of task, which must be either classification or regression, depending on the type of model for training.

Optional:

rocnumsamples — If you set this option, you must specify a positive integer that represents the number of samples for the model to use when calculating the area under the ROC curve. You must also set the metrics option to true. The default value is 10.

continuousfeatures — If you set this option, the value must be a comma-separated list of the feature indexes that are continuous numeric variables. Indexes start with 1. In the default state, the model considers no features as continuous.

featurearray — If you set this option to true, the model expects only one array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.

featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.

fractionselected — If you set this option, the option represents the proportion of rows the model uses to train each child model. The value is a double that must be in the interval (0, 1]. You cannot set this option if you also set the rowsperchild option to a positive value. The default behavior is that the model uses all available rows.

inputsperchild — If you set this option, the value must be an integer type greater than or equal to 1, which specifies the number of input features each boosting child should use. This value cannot exceed the number of input features available in the data set. When you specify this value, the algorithm deterministically cycles through pre-enumerated feature subsets to ensure each child uses exactly the specified number of features. When you do not specify this value, the model uses all available features for each child.

lossfunction — If you set this option, the value represents the loss function used by the model. Accepted values are 'squared error' and 'log loss'. When you set this value to 'squared error', the model calculates errors as the squared difference between predicted and actual values; the target column must contain numeric values. This is the default value when the tasktype option is set to regression. When you set this value to 'log loss', the model calculates errors using logistic loss. This is the default value when the tasktype option is classification.

maxthreads — If you set this option, the option represents the maximum number of parallel threads to use while the model trains. This value must be a positive integer. The default value is 16.

metrics — If you set this option to true, the system calculates certain metrics depending on the value of the tasktype option. If the tasktype option is set to classification, the metrics are the percentage of correctly classified rows and the area under the ROC curve. If the tasktype option is set to regression, the metrics are the root mean square error and the adjusted R-squared. The default value is false.

skipdroptable — If you set this option to false, the database deletes any intermediate tables created during model training. If you set this option to true, the database prevents the deletion of any intermediate tables created during model training. The default value is false.

suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same size for all rows in the input. The default value is false.

Decision Tree Model Options

Optional:

rocnumsamples — If you set this option, you must specify a positive integer that represents the number of samples for the model to use when calculating the area under the ROC curve. You must also set the metrics option to true. The default value is 10.
continuousfeatures — If you set this option, the value must be a comma-separated list of the feature indexes that are continuous numeric variables. Indexes start with 1. In the default state, the model considers no features as continuous.

distinctcountlimit — If you set this option, the value must be a positive integer type. This value sets the limit for how many distinct values a non-continuous feature and label can contain. The default value is 256.

doprune — If you set this option to true, the model uses pessimistic error pruning (PEP) to prune the tree after training. The default value is false.

enableresplits — If you set this option, the value must be a boolean type that determines if the tree can reuse the same continuous feature multiple times along a single branch (e.g., split on x1 < 7 and later x1 < 3). This action can capture more complex, range-specific relationships. The default value is true, meaning that continuous features remain available for additional splits after use, thereby allowing the tree to create more complex decision boundaries. If you set this option to false, the model marks continuous features as exhausted after their first use, and the model cannot use them again in subsequent splits in the same tree.

featurearray — If you set this option to true, the model expects only one array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.

featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.

featuresubsetstrategy — If you set this option, the option specifies how many features the decision tree should consider at each split from the still-available features. When this value is higher, the model has higher accuracy and lower variance, but takes longer to train. You can specify this option either as an integer (e.g., 4, meaning consider up to four features at each split) or as one of these values: all (check every feature), sqrt (check up to the square root of the number of total features), and one third (check up to one third of the number of total features). The default value is all.

loadbalance — If you set this option, the database appends the using load balance shuffle = \<value> clause to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.

maxcellstofetch — If you set this option, the value must be a positive integer that controls the chunking behavior when fetching feature values during model training. The limit represents the maximum number of data cells (calculated as number of columns × number of rows) that can be fetched in a single operation, not a byte limit. When the expected data size exceeds this threshold, the algorithm switches to database-based processing using SQL queries instead of in-memory processing. This value defaults to 33,554,432 cells (calculated as 32 × 1024 × 1024).

maxdepth — If you set this option, the value must be a positive integer. This value sets the maximum allowable depth of the decision tree (the maximum number of features to split on). The default is unspecified, which means there is no maximum depth.

maxrows — If you set this option, the value must be a positive integer. This option limits the number of rows used for model training by creating a snapshot table with only the specified number of rows from the input query. You cannot use this option together with nosnapshot set to true (attempting to set both results in an invalid argument error during model creation). When this option is unspecified, the model trains using all rows from the input query.

maxthreads — If you set this option, the value must be a positive integer. This value indicates the maximum number of parallel threads to use while the model trains. The default value is 2.

metrics — If you set this option to true, the model also calculates the percentage of samples correctly classified by the model and saves this information in a catalog table. This option defaults to false.

nosnapshot — If you set this option to true, the database does not create an intermediate table that stores the result of the specified SQL statement, which the model uses for training. This option defaults to false; in this case, the database creates and uses the intermediate table. Setting this option to true is useful when the training set is fixed. If the training set is a table with modifications, set this option to false, as the decision tree trainer uses different data sets in different parts of the tree. Likewise, if the training set consists of a query that returns 100 rows, set this option to false, because there is no guarantee that running that query twice generates the same 100 rows each time.

numsplits — If you set this option, the value must be an integer greater than 1. This value sets the maximum number of binary branches a continuous feature can consider. The default value is 32.

queryinternalparallelism — If you set this option, the database appends the using parallelism = \<value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer value. The default value is unspecified; in this case, the database does not add this clause.

skipdroptable — If you set this option to false, the database deletes any intermediate tables created during model training. If you set this option to true, the database prevents the deletion of any intermediate tables created during model training. The default value is false.

skiplimitcheck — If you set this option to true, the model skips cardinality checks that throw errors when columns have too many values. The limit that this option checks is the same one specified by the distinctcountlimit option. The default value is false.

splitmetric — If you set this option, the option controls which function the model uses to evaluate the quality of a split during tree construction. Supported options are gini impurity (measures impurity based on class distributions), macro f1 (uses the macro-averaged F1 score to guide splits), micro f1 (uses the micro-averaged F1 score to guide splits), and weighted f1 (uses the class-frequency-weighted F1 score to guide splits). The default value is gini impurity.

suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same size for all rows in the input. The default value is false.

suppressjit — If you set this option to true, the model suppresses just-in-time code generation.

weighted — If you set this option, the model considers weights for labels. If you set this option value to true, you must specify an additional column as a double in the training data for label weights. Rows with the same labels must have the same weights. If you set this value to auto, the model calculates weights automatically by weighting each label according to the ratio of the count of the most frequent label to the count of the specified label. As a result, the most frequent label has the weight 1.0 and the other label weights are higher. The default value is false, which means all labels have equal weight.

Feedforward Network Model Options

Required:

hiddenlayersize — You must set this option to a positive integer type that specifies the number of nodes in each hidden layer.

hiddenlayers — You must set this option to a positive integer type that specifies how many hidden layers to use.

lossfunction — This option specifies the loss function that all hidden layer nodes and all output layer nodes use. This function can be one of several predefined loss functions or a user-defined loss function. The predefined loss functions are squared error (regression),
vector squared error (vector-valued regression), log loss (binary classification with target values of 0 and 1), logits loss (binary classification with target values of 0 and 1), hinge loss (binary classification with target values of -1 and 1), and cross entropy loss (multi-class classification). If the value for this required option is none of these strings, the model assumes a user-defined loss function. The user-defined loss function specifies the per-sample loss; the actual loss function is then the sum of this function applied to all samples. The function should use the variable y to refer to the dependent variable in the training data and the variable f to refer to the computed estimate for the specified sample.

outputs — You must set this option to a positive integer that specifies the number of outputs.

Optional:

activationfunction — If you set this option, the accepted values are linear, relu (rectified linear unit), leakyrelu (leaky rectified linear unit), tanh (hyperbolic tangent function), or sigmoid (fast sigmoid approximation). The default value is relu. This option affects all layers except the output layer.

adambeta1 — If you set this option, the option represents the value of β₁ in the Adam optimization algorithm. For higher values of this option, training is less noisy but takes longer to converge. The default value is 0.9.

adambeta2 — If you set this option, the option represents the value of β₂ in the Adam optimization algorithm. For higher values of this option, training is less noisy but takes longer to converge. The default value is 0.99.

adamepsilon — If you set this option, the option represents the value of ε in the Adam optimization algorithm. For higher values of this option, training is more numerically stable but takes longer to converge. The default value is 1e-7.

featurearray — If you set this option to true, the model expects only one array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.

featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.

finitedifferenceh — If you set this option, the value must be a double representing the step size (h) for approximating gradients using the finite difference method. The model uses this value only if analytical gradients are not active. This value should generally be a small positive value, typically from 0.0001 (1e-4) to 0.0000001 (1e-7). The default value is 0.00001 (1e-5).

gradientclipthreshold — If you set this option, the value must be a double that represents the gradient norm threshold for clipping. When the overall gradient norm exceeds this threshold, the system scales all gradient components uniformly to preserve direction. This operation prevents issues with exploding gradients in unstable loss landscapes. Set this value to 0 or a negative value to disable gradient clipping. The default value is 1000000 (1e6).

learningrate — If you set this option, the value must be a double type representing the base learning rate for the Adam (adaptive moment estimation) machine learning optimizer. Adam adapts this rate individually for each parameter during training. A common starting point for Adam is 0.001 (1e-3). Valid values must be positive and are generally in the range of 0.00001 (1e-5) to 0.01 (1e-2). A higher learning rate can speed up training but can cause the optimizer to overshoot and miss optimal solutions. Conversely, a lower learning rate ensures more stable and precise convergence but can make training much slower. If you do not specify this option, the system automatically selects a learning rate and adjusts it during training using the 1cycle learning schedule. Specifying a learning rate disables automatic adjustment and instead uses a fixed learning rate value.

loadbalance — If you set this option, the database appends the using load balance shuffle = \<value> clause to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.

maxinitparamvalue — If you set this option, the value must be a floating-point number that sets the maximum for initial parameter values in the optimization algorithm. The default value is 1.

metrics — If you set this option to true, the model calculates the average value of the loss function.

mininitparamvalue — If you set this option, the value must be a floating-point number that sets the minimum for initial parameter values in the optimization algorithm. The default value is -1.

normalize — If you set this option to true, the model applies z-score normalization to inputs, storing the means and standard deviations and automatically applying them at inference. The default value is true.

numepochs — If you set this option, the value must be a positive integer type representing the maximum number of epochs, or full passes through the entire data set, during training. If you do not specify this option, the default maximum is 200, but training typically stops earlier due to automatic early stopping when the model has converged.

outputactivationfunction — If you set this option, the accepted values are linear, relu (rectified linear unit), leakyrelu (leaky rectified linear unit), tanh (hyperbolic tangent function), or sigmoid (fast sigmoid approximation). Different activation functions have different output ranges, and the chosen activation function should match the dependent variable of your data. For example, if the dependent variable can be anything, choose the linear value. If the dependent variable is always positive, choose the relu value. If your outputs range from -1 to 1 or you perform hinge loss classification, tanh is a good option because the hyperbolic tangent function has the same range. But if your outputs range from 0 to 1 or you perform log loss classification, sigmoid is a better choice for the same reason. This option defaults to linear. The option only sets the activation function for the output layer.

queryinternalparallelism — If you set this option, the database appends the using parallelism = \<value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer value. The default value is unspecified; in this case, the database does not add this clause.

randomseed — If you set this option, the value must be a positive integer type representing the seed for the random number generator the system uses for weight initialization. Setting this option makes model training deterministic (given the same data and options). If you do not specify this option or set it to 0, the system uses a non-deterministic random seed.

skipdroptable — If you set this option to false, the database deletes any intermediate tables created during model training. If you set this option to true, the database prevents the deletion of any intermediate tables created during model training. The default value is false.

suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same size for all rows in the input. The default value is false.

usesoftmax — If you set this option to true, the model applies a softmax function to the output of the output layer before computing the loss function. The default value is true if you set the lossfunction to cross entropy loss, and false otherwise.

Gaussian Mixture Model Options

Required:

numdistributions — This option must be a positive integer type that specifies the number of clusters of Gaussian distributions for the model to make.

Optional:

epsilon — If you specify this option, the value must be a valid positive floating-point number.
When the maximum distance that the entire best model moves in its n-dimensional space is less than this value, the algorithm terminates. The default value is 0.00000001 (1e-8).

featurearray — If you set this option to true, the model expects only one array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.

featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.

loadbalance — If you set this option, the database appends the using load balance shuffle = \<value> clause to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.

maxiterations — If you set this option, the option represents the maximum number of optimization iterations to train the model. For higher values of this option, the model is likelier to converge to the expected epsilon, but it might take longer to train. The default value is 100.

normalize — If you set this option to true, the model automatically computes the mean and standard deviation of each feature and uses them to normalize the data during training. Defaults to true.

queryinternalparallelism — If you set this option, the database appends the using parallelism = \<value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer value. The default value is unspecified; in this case, the database does not add this clause.

suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same size for all rows in the input. The default value is false.
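The normalize behavior described for this model (and for the feedforward network) is plain z-score normalization: compute per-feature means and standard deviations during training, store them, and reuse them to standardize inputs at inference. The following is a minimal illustrative sketch in Python, not the database's implementation; the function names fit_zscore and apply_zscore are hypothetical:

```python
import math

def fit_zscore(rows):
    """Compute per-feature means and standard deviations (training time)."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    # Guard against a zero standard deviation for constant features.
    stds = [math.sqrt(sum((r[d] - means[d]) ** 2 for r in rows) / n) or 1.0
            for d in range(dims)]
    return means, stds

def apply_zscore(row, means, stds):
    """Standardize one input row with the stored statistics (inference time)."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]

means, stds = fit_zscore([[1.0, 10.0], [3.0, 30.0]])
print(apply_zscore([2.0, 20.0], means, stds))  # -> [0.0, 0.0]
```

Storing the training-time statistics is what lets the system apply the identical transformation to new rows at inference without rescanning the training data.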
Gradient Boosted Trees Model Options

Required:

learningrate — A decimal value between 0.0 and 1.0 that tunes how much the model learns from each successive child.

numchildren — An integer value representing the total number of trees to build sequentially. Each tree learns to correct the errors of the previous trees.

Optional:

continuousfeatures — If you set this option, the value must be a comma-separated list of the feature indexes that are continuous numeric variables. Indexes start with 1. In the default state, the model considers no features as continuous.

enableresplits — If you set this option, the value must be a boolean type that determines if the tree can reuse the same continuous feature multiple times along a single branch (e.g., split on x1 < 7 and later x1 < 3). This action can capture more complex, range-specific relationships. The default value is true, meaning that continuous features remain available for additional splits after use, thereby allowing the tree to create more complex decision boundaries. If you set this option to false, the model marks continuous features as exhausted after their first use, and the model cannot use them again in subsequent splits in the same tree.

featurearray — If you set this option to true, the model expects only one array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.

featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.

fractionselected — If you set this option, the option represents the proportion of rows the model uses to train each child model. The value is a double that must be in the interval (0, 1]. You cannot set this option if you also set the rowsperchild option to a positive value. The default behavior is that the model uses all available rows.

inputsperchild — If you set this option, the value must be an integer type greater than or equal to 1 that specifies the number of input features each boosting tree should use. This value cannot exceed the number of input features available in the data set. When you specify this value, the algorithm deterministically cycles through pre-enumerated feature subsets to ensure each tree uses exactly the specified number of features. The default behavior is that the model uses all available features for each tree.

lossfunction — If you set this option, the option represents the loss function used and determines the type of task the model does. Accepted values are 'squared error' and 'log loss'. When you set this value to 'squared error', the model calculates errors as the squared difference between predicted and actual values; the target column must contain numeric values. This is the default value for regression tasks. When you set this value to 'log loss', the model calculates errors using logistic loss. This is the default value for classification tasks.

maxcellstofetch — If you set this value, the value must be an integer type that determines the memory threshold to switch from training with system memory to training with SQL queries in the database. In-memory training is generally faster but is limited by the available SQL node memory. If the size of a training data subset exceeds this value, the system performs training operations using SQL queries. The default value is 33,554,432 (calculated as 32 × 1024 × 1024).

maxdepth — If you set this value, the value must be a positive integer type that represents the maximum allowable depth of the child trees. The default value is 3.

maxthreads — If you set this value, the value must be a positive integer type that sets the maximum number of parallel threads to use for training each child decision tree. Parallel threads do not affect the sequential method of training each tree. The default value is
16.
metrics — If you set this value to true, the system calculates and stores final model metrics (R²/RMSE for regression, or accuracy/log loss for classification) on the training data. The default value is false.
numsplits — If you set this option, the value must be an integer greater than 1. This value sets the maximum number of binary branches a continuous feature can consider. The default value is 32.
resplitdepth — If you set this option, the value must be an integer type that sets the maximum depth at which tree nodes can be re-split during optimization. This option controls how deep the algorithm searches for better split points. The default value is 6.
resplitthreshold — If you set this option, the value must be a decimal type that sets the minimum improvement required to trigger a re-split operation. Lower values allow more aggressive re-splitting but can increase training time; higher values require larger improvements to trigger re-splits. The default value is 0.1.
skipdroptable — If you set this option to false, the database deletes any intermediate tables created during model training. If you set this option to true, the database prevents the deletion of any intermediate tables created during model training. The default value is false.
suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same for all rows in the input. The default value is false.

K-Means Model Options

Required

k — This option must be a positive integer type that specifies how many clusters to make.

Optional

epsilon — If you specify this option, the value must be a valid positive floating-point value. When the maximum distance that a centroid moves from one iteration of the algorithm to the next is less than this value, the algorithm terminates. The default value is 0.00000001 (1e-8).
featurearray — If you set this option to true, the model expects only one array-type input column instead of
multiple columns of training data. Each array row in the input column must be the same size. The default value is false.
featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.
lloydrounds — If you set this option, the option represents the maximum number of iterations of Lloyd's algorithm to train the model after guessing the centroids. For higher values of this option, the model is more likely to be accurate but takes longer to train. The default value is 20.
loadbalance — If you set this option, the database appends the USING LOAD BALANCE SHUFFLE = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.
normalize — If you set this option to true, the model normalizes the data before the start of training. The default value is true.
oversampling — If you set this option, the option represents the number of candidate guesses for the model to choose in the parallel round phase of k-means||. For higher values of this option, the model is more likely to be accurate but takes longer to train. The default value is k.
parallelrounds — If you set this option, the option represents the minimum number of parallel rounds for which the k-means|| algorithm runs. For higher values of this option, the model is more likely to be accurate but takes longer to train. The default value is 8.
queryinternalparallelism — If you set this option, the database appends the USING PARALLELISM = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer value. The default value is unspecified; in this case, the database does not add this clause.
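The epsilon and lloydrounds options together control when k-means training stops: the model runs Lloyd's algorithm for at most lloydrounds iterations and terminates early once no centroid moves farther than epsilon. A minimal Python sketch of this termination logic (illustrative only; the function name and signature are hypothetical, not the database's implementation):

```python
import math

def lloyd_kmeans(points, centroids, lloyd_rounds=20, epsilon=1e-8):
    """Hypothetical sketch of Lloyd's algorithm: iterate at most
    lloyd_rounds times, stopping early once no centroid moves
    farther than epsilon between iterations."""
    for _ in range(lloyd_rounds):
        # Assign each point to its nearest centroid (Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Recompute each centroid as the mean of its assigned points,
        # tracking the maximum distance any centroid moved.
        max_move = 0.0
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if not members:
                new_centroids.append(old)  # leave empty clusters in place
                continue
            mean = tuple(sum(dim) / len(members) for dim in zip(*members))
            max_move = max(max_move, math.dist(old, mean))
            new_centroids.append(mean)
        centroids = new_centroids

        # Terminate when the largest centroid movement is below epsilon.
        if max_move < epsilon:
            break
    return centroids
```

For example, two well-separated pairs of points converge in two rounds: lloyd_kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0.0, 0.0), (10.0, 10.0)]) returns [(0.0, 0.5), (10.0, 10.5)].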
suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same for all rows in the input. The default value is false.

K-Nearest Neighbors Model Options

Required

k — This option must be a positive integer type that specifies how many closest points to use for classifying a new point.

Optional

distance — If you set this option, the value must be a function in SQL syntax for calculating the distance between a point used for classification and points in the training data set. This function should use the variables x1, x2, … for the 1st, 2nd, … features in the training data set, and p1, p2, … for the features in the point for classification. The default value is the Euclidean distance function.
featurearray — If you set this option to true, the model expects only one array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.
featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.
loadbalance — If you set this option, the database appends the USING LOAD BALANCE SHUFFLE = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.
normalize — If you set this option to true, the model automatically computes the mean and standard deviation of each feature and uses them to normalize the data during training. The default value is true.
queryinternalparallelism — If you set this option, the database appends the USING PARALLELISM = <value> clause to all intermediate SQL queries the model executes during training, where
value is the specified positive integer value. The default value is unspecified; in this case, the database does not add this clause.
suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same for all rows in the input. The default value is false.
weight — If you specify this option, the value must be a function in SQL syntax for calculating the weight of a neighbor. The function should use the variable d for distance. By default, the weight is set to 1.0 / (d + 0.1), thus avoiding division by zero on exact inputs while still allowing neighbors to have some influence.

Linear Combination Regression Model Options

Required

functionn — You must specify the first function using a key named 'function1'. Subsequent functions must use keys with names that use subsequent values of n. You must specify functions in SQL syntax and should use the variables x1, x2, …, xn to refer to the 1st, 2nd, and nth independent variables, respectively. For example: 'function1' -> 'sin(x1 * x2 + x3)', 'function2' -> 'cos(x1 * x3)'.

Optional

featurearray — If you set this option to true, the model expects only one array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.
featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.
gamma — If you set this option, the value must be a matrix. This value represents a Tikhonov gamma matrix used for regularization. For details, see Tikhonov regularization: https://en.wikipedia.org/wiki/Tikhonov_regularization
loadbalance — If you set this option, the database appends the USING LOAD BALANCE SHUFFLE = <value> clause to all intermediate SQL queries the model executes during
training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.
metrics — If you set this option to true, the model collects quality metrics such as the coefficient of determination (R-squared), the adjusted coefficient of determination, and the root mean squared error (RMSE). The default value is false.
normalize — If you set this option to true, the model uses auto-scaling to compute the mean and standard deviation of each input feature to normalize data during training, making training more numerically stable. The model then unscales parameters so the persisted model operates in the original units. The default value is true.
queryinternalparallelism — If you set this option, the database appends the USING PARALLELISM = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer value. The default value is unspecified; in this case, the database does not add this clause.
suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same for all rows in the input. The default value is false.
threshold — This option enables soft thresholding. If you specify this option, the option must be a positive numeric value. After the model calculates the coefficients, if any coefficients are greater than the threshold value, the model subtracts the threshold value from those coefficients. If any coefficients are less than the negation of the threshold value, the model adds the threshold value to those coefficients. For any coefficients between the negative and positive threshold values, the model sets the coefficients to zero.
weighted — If you set this option to true, the model performs weighted least squares regression, where each sample has an associated weight or importance. When weighted, there is an extra numeric column after the
dependent variable that represents the weight of the sample. The default value is false.
yintercept — If you set this option, the option must be a numeric value. The system forces the specified y-intercept (i.e., the model value when x is zero).

Linear Discriminant Analysis Model Options

Optional

featurearray — If you set this option to true, the model expects only one array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.
featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.
loadbalance — If you set this option, the database appends the USING LOAD BALANCE SHUFFLE = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.
normalize — If you set this option to true, the model automatically computes the mean and standard deviation of each feature and uses them to normalize the data during training. The default value is true.
queryinternalparallelism — If you set this option, the database appends the USING PARALLELISM = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer value. The default value is unspecified; in this case, the database does not add this clause.
suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same for all rows in the input. The default value is false.

Logistic Regression Model Options

Optional

featurearray — If you set this option to true, the model expects only one array-type input column instead
of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.
featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.
loadbalance — If you set this option, the database appends the USING LOAD BALANCE SHUFFLE = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.
metrics — If you set this option to true, the model calculates the percentage of samples that are correctly classified by the model and saves this information in the sys logistic regression models system catalog table. The default value is false.
normalize — If you set this option to true, the model uses auto-scaling to compute the mean and standard deviation of each input feature to normalize data during training, making training more numerically stable. The model then unscales parameters so the persisted model operates in the original units. The default value is true.
numepochs — If you set this option, the value must be a positive integer type representing the maximum number of IRLS iterations during training. The default value is 20.
queryinternalparallelism — If you set this option, the database appends the USING PARALLELISM = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer value. The default value is unspecified; in this case, the database does not add this clause.
skipdroptable — If you set this option to false, the database deletes any intermediate tables created during model training. If you set this option to true, the database prevents the deletion of any intermediate tables created
during model training. The default value is false.
suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same for all rows in the input. The default value is false.

Multiple Linear Regression Model Options

Optional

featurearray — If you set this option to true, the model expects only one array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.
featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.
gamma — If you set this option, the option must be a matrix. The value represents a Tikhonov gamma matrix used for regularization. For details, see Tikhonov regularization: https://en.wikipedia.org/wiki/Tikhonov_regularization The model uses this option for ridge regression.
loadbalance — If you set this option, the database appends the USING LOAD BALANCE SHUFFLE = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.
metrics — If you set this option to true, the model collects quality metrics such as the coefficient of determination (R-squared), the adjusted coefficient of determination, and the root mean squared error (RMSE). The default value is false.
normalize — If you set this option to true, the model uses auto-scaling to compute the mean and standard deviation of each input feature to normalize data during training, making training more numerically stable. The model then unscales parameters so the persisted model operates in the original units. The default value is true.
queryinternalparallelism — If
you set this option, the database appends the USING PARALLELISM = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer value. The default value is unspecified; in this case, the database does not add this clause.
suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same for all rows in the input. The default value is false.
threshold — If you set this option, the option enables soft thresholding. The value must be a positive number. After the model calculates the coefficients, if any coefficients exceed the threshold value, the model subtracts the threshold value from those coefficients. If any coefficients are less than the negation of the threshold value, the model adds the threshold value to those coefficients. For any coefficients that are between the negative and positive threshold values, the model sets the coefficients to zero.
weighted — If you set this option to true, the model performs weighted least squares regression, where each sample has a weight or importance associated with it. In this case, the table contains an additional numeric column after the dependent variable, which contains the weight for the sample. The default value is false.
yintercept — If you set this option, the option must be a numeric value. The system forces the specified y-intercept (i.e., the model value when x is zero).

Naive Bayes Model Options

Optional

continuousfeatures — If you set this option, the value must be a comma-separated list of the feature indexes that are continuous numeric variables. Indexes start with 1. In the default state, the model considers no features as continuous.
featurearray — If you set this option to true, the model expects only one array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.
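The soft-thresholding rule that the threshold option applies in the regression models above can be sketched in a few lines of Python (an illustrative function, not the database's implementation):

```python
def soft_threshold(coefficients, threshold):
    """Apply the soft-thresholding rule: shrink each coefficient toward
    zero by the threshold, and zero out any coefficient within it."""
    result = []
    for c in coefficients:
        if c > threshold:
            result.append(c - threshold)   # subtract the threshold
        elif c < -threshold:
            result.append(c + threshold)   # add the threshold
        else:
            result.append(0.0)             # within [-threshold, threshold]
    return result
```

For example, with a threshold of 1.0, coefficients [2.5, -1.5, 0.3] become [1.5, -0.5, 0.0]; small coefficients are driven to exactly zero, producing a sparser model.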
featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.
loadbalance — If you set this option, the database appends the USING LOAD BALANCE SHUFFLE = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.
metrics — If you set this option to true, the model calculates the percentage of samples correctly classified by the model and saves this information in a system catalog table. The default value is false.
queryinternalparallelism — If you set this option, the database appends the USING PARALLELISM = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer value. The default value is unspecified; in this case, the database does not add this clause.
suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same for all rows in the input. The default value is false.

Nonlinear Regression Model Options

Required

function — Specify the function to fit the data, in SQL syntax. Use a1, a2, … to refer to the parameters for optimization. Use x1, x2, … to refer to the input features. The model does not allow some SQL functions: it allows only scalar expressions that can be represented internally as postfix expressions. Most notably, the model does not allow some functions that are rewritten as CASE statements (like LEAST() and GREATEST()). If your function is not allowed, the model displays an error message.
numparameters — Specify this option as a positive integer. This value specifies the number of different parameters to
optimize, i.e., how many different a-n variables (a1, a2, …) there are in the user-specified function.

Optional

adambeta1 — If you set this option, the option represents the value of β₁ in the Adam optimization algorithm. For higher values of this option, training is less noisy but takes longer to converge. The default value is 0.9.
adambeta2 — If you set this option, the option represents the value of β₂ in the Adam optimization algorithm. For higher values of this option, training is less noisy but takes longer to converge. The default value is 0.99.
adamepsilon — If you set this option, the option represents the value of ε in the Adam optimization algorithm. For higher values of this option, training is more numerically stable but takes longer to converge. The default value is 1e-7.
featurearray — If you set this option to true, the model expects only one array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.
featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.
finitedifferenceh — If you set this option, the value must be a double type representing the step size (h) for approximating gradients using the finite difference method. The model uses this value only if analytical gradients are not active. This value should generally be a small positive number, typically from 0.0001 (1e-4) to 0.0000001 (1e-7). The default value is 0.00001 (1e-5).
gradientclipthreshold — If you set this option, the value must be a double that represents the gradient norm threshold for clipping. When the overall gradient norm exceeds this threshold, the system scales all gradient components uniformly to preserve direction. This operation prevents issues with exploding gradients in unstable loss
landscapes. Set this value to 0 or a negative value to disable gradient clipping. The default value is 1000000 (1e6).
lassocoefficient — If you specify this option, the value must be a double data type. This option is the lasso coefficient for the loss function. The default behavior is that the function ignores this option, effectively setting it to 0.0.
learningrate — If you set this option, the value must be a double type representing the base learning rate for the Adam (adaptive moment estimation) machine learning optimizer. Adam adapts this rate individually for each parameter during training. A common starting point for Adam is 0.001 (1e-3). Valid values must be positive and are generally in the range of 0.00001 (1e-5) to 0.01 (1e-2). A higher learning rate can speed up training, but can cause the optimizer to overshoot and miss optimal solutions. Conversely, a lower learning rate ensures more stable and precise convergence but can make training much slower. If you do not specify this option, the system automatically selects a learning rate and adjusts it during training using the 1cycle learning schedule. Specifying a learning rate disables automatic adjustment and instead uses a fixed learning rate value.
loadbalance — If you set this option, the database appends the USING LOAD BALANCE SHUFFLE = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.
lossfunction — If you set this option, the option indicates to the nonlinear optimizer the loss function to use on a per-sample basis. The actual loss function is then the sum of this function applied to all samples. The function should use the variable y to refer to the dependent variable in the training data and the variable f to refer to the computed estimate for the specified sample. The default is the least squares function, which you can
specify as (f - y) * (f - y).
maxinitparamvalue — If you specify this option, the value must be a floating-point number. This option sets the maximum for initial parameter values in the optimization algorithm. The default value is 1.
metrics — If you set this option to true, the model calculates the coefficient of determination (R-squared), the adjusted R-squared, and the root mean squared error (RMSE). However, the model calculates these quality metrics using the least squares loss function, and not the user-specified loss function, because these metrics only make sense for least squares. The default value is false.
mininitparamvalue — If you specify this option, the value must be a floating-point number. This option sets the minimum for initial parameter values in the optimization algorithm. The default value is -1.
numepochs — If you set this option, the value must be a positive integer type representing the maximum number of epochs, or full passes through the entire data set, during training. If you do not specify this option, the default maximum value is 200 for Adam optimization or 100 for Levenberg-Marquardt, but training typically stops earlier due to automatic early stopping when the model has converged.
queryinternalparallelism — If you set this option, the database appends the USING PARALLELISM = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer value. The default value is unspecified; in this case, the database does not add this clause.
randomseed — If you set this option, the value must be a positive integer type representing the seed for the random number generator the system uses for weight initialization. Setting this option makes model training deterministic (given the same data and options). If you do not specify this option or set it to 0, the system uses a non-deterministic random seed.
ridgecoefficient — If you specify this option, the value must be a double data type. This option is the
ridge coefficient for the loss function. The default behavior is that the function ignores this option, effectively setting it to 0.0.
skipdroptable — If you set this option to false, the database deletes any intermediate tables created during model training. If you set this option to true, the database prevents the deletion of any intermediate tables created during model training. The default value is false.
suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same for all rows in the input. The default value is false.

Polynomial Regression Model Options

Required

order — This option is the degree of the polynomial and must be set to a positive integer.

Optional

featurearray — If you set this option to true, the model expects only one array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.
featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.
gamma — If you specify this option, the value must be a matrix. The value represents a Tikhonov gamma matrix that is used for regularization. For details, see Tikhonov regularization: https://en.wikipedia.org/wiki/Tikhonov_regularization
loadbalance — If you set this option, the database appends the USING LOAD BALANCE SHUFFLE = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.
metrics — If you set this option to true, the model collects quality metrics such as the coefficient of determination (R-squared), the adjusted coefficient of determination,
and the root mean squared error (RMSE). The default value is false.
negativepowers — If you set this option to true, the model includes independent variables raised to negative powers; such polynomials are called Laurent polynomials. The model generates all possible terms such that the sum of the absolute values of the powers in each product term is less than or equal to the order. For example, with two independent variables and the order set to 2, the model is y = a1*x1^2 + a2*x1^-2 + a3*x2^2 + a4*x2^-2 + a5*x1*x2 + a6*x1^-1*x2 + a7*x1*x2^-1 + a8*x1^-1*x2^-1 + a9*x1 + a10*x1^-1 + a11*x2 + a12*x2^-1 + b. The default value is false.
normalize — If you set this option to true, the model uses auto-scaling to compute the mean and standard deviation of each input feature to normalize data during training, making training more numerically stable. The model then unscales parameters so the persisted model operates in the original units. The default value is true.
queryinternalparallelism — If you set this option, the database appends the USING PARALLELISM = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer value. The default value is unspecified; in this case, the database does not add this clause.
suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same for all rows in the input. The default value is false.
threshold — This option enables soft thresholding. If you specify this option, the option must be a positive numeric value. After the model calculates the coefficients, if any of them are greater than the threshold value, the model subtracts the threshold value from them. If any coefficients are less than the negation of the threshold value, the model adds the threshold value to them. For any coefficients that are between the negative and positive threshold values, the model sets those coefficients to
zero.
weighted — If you set this option to true, the model performs weighted least squares regression, where each sample has an associated weight. When weighted, there is an extra numeric column after the dependent variable that has the weight for the sample. The default value is false.
yintercept — If you set this option, the option must be a numeric value. The system forces the specified y-intercept (i.e., the model value when x is zero).

Principal Component Analysis Model Options

Optional

featurearray — If you set this option to true, the model expects only one array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.
featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. The system uses all indexes of the input array by default.
loadbalance — If you set this option, the database appends the USING LOAD BALANCE SHUFFLE = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.
queryinternalparallelism — If you set this option, the database appends the USING PARALLELISM = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer value. The default value is unspecified; in this case, the database does not add this clause.
suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same for all rows in the input. The default value is false.

Random Forest Model Options

Required

numchildren — The number of child decision trees.

Optional

rocnumsamples — If you set this option, you must also set the
metrics option. This positive integer indicates the number of samples for the model to use for the area under the ROC curve. The default value is the number of child decision trees.
bootstrap — If you set this option to true, the model uses bootstrap sampling with replacement, meaning the model trains each tree in the random forest on a random subset of the data (either the rowsperchild or fractionselected option sets the exact number of rows), and the same row can appear multiple times in each tree. If you set this option to false, the model does not use replacement, meaning each row can appear at most once per tree. The default value is false.
continuousfeatures — If you set this option, the value must be a comma-separated list of the feature indexes that are continuous numeric variables. Indexes start with 1. In the default state, the model considers no features as continuous.
distinctcountlimit — If you set this option, the value must be a positive integer. This value limits how many distinct values a non-continuous feature and the label can contain. The default value is 256.
doprune — If you set this option to true, the model uses pessimistic error pruning (PEP) to prune the tree after training. The default value is false.
enableresplits — If you set this option, the value must be a boolean type that determines whether the tree can reuse the same continuous feature multiple times along a single branch (e.g., split on x1 < 7 and later on x1 < 3). This behavior can capture more complex, range-specific relationships. The default value is true, meaning that continuous features remain available for additional splits after use, allowing the tree to create more complex decision boundaries. When you set this option to false, the model marks continuous features as exhausted after their first use and cannot use them again in subsequent splits in the same tree. The model passes this option directly to the child decision trees.
featurearray — If you set this option to true, the
model expects a single array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.

featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. By default, the system uses all indexes of the input array.

featuresubsetstrategy — If you set this option, the model passes it directly to the child decision trees. The option specifies how many of the still-available features the child decision trees consider at each split. Higher values yield higher accuracy and lower variance but take longer to train. You can specify this option either as an integer (e.g., 4, meaning consider up to 4 features at each split) or as one of three string values: all (checks every feature), sqrt (checks up to the square root of the total number of features), and one third (checks up to one third of the total number of features). The default value is all.

fractionselected — If you set this option, the value represents the proportion of rows the model uses to train each child model. The value is a double in the interval (0, 1]. You cannot set this option if you also set the rowsperchild option to a positive value. By default, the model uses all available rows.

inputsperchild — If you set this option, the value specifies the number of features used to create each child decision tree. The default value is the number of features you specify for the forest, divided by 3 and rounded up.

loadbalance — If you set this option, the database appends the USING LOAD BALANCE SHUFFLE = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the
database does not add this clause.

maxcellstofetch — If you set this option, the model passes it directly to the child decision trees. The value must be a positive integer that controls the chunking behavior when fetching feature values during model training. The limit represents the maximum number of data cells (calculated as the number of columns × the number of rows) that the system can fetch in a single operation; it is not a byte limit. When the expected data size exceeds this threshold, the algorithm switches from in-memory processing to database-based processing using SQL queries. The default value is 33,554,432 cells (calculated as 32 × 1024 × 1024).

maxchildthreads — If you set this option, the value must be an integer representing the maximum number of threads each child decision tree can use. The default value is 1.

maxdepth — If you set this option, the value must be a positive integer. This value sets the maximum allowable depth of the decision tree. The default value is 3.

maxthreads — If you set this option, the value specifies the maximum number of parallel threads to use while the model trains decision trees. This value must be a positive integer. The default value is 16.

metrics — If you set this option to true, the model also calculates the percentage of samples that the random forest classifies correctly and saves this information in a system catalog table. The default value is false.

nosnapshot — If you set this option to true, the data source must not change. In this case, the database does not create an intermediate table that stores the result of the specified SQL statement, which the model uses for training a random forest. Child decision trees always have this option set to true, so the database does not create a separate intermediate table for each decision tree. The default value is false. Setting this option to true is useful when the training set is fixed. If the training set is a table with
modifications, set this option to false because the decision tree trainer uses different data sets in different parts of the tree. Likewise, if the training set consists of a query that returns 100 rows, set this option to false because there is no guarantee that running that query twice generates the same 100 rows each time.

numsplits — If you set this option, the value must be an integer greater than 1. This value sets the maximum number of binary branches a continuous feature can consider. The default value is 32.

queryinternalparallelism — If you set this option, the database appends the USING PARALLELISM = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer. The default value is unspecified; in this case, the database does not add this clause.

requiredfeatures — If you set this option, the value must be a comma-separated list of integers as strings representing specific features, where the first feature has the value 1. The model uses these features in every decision tree in the forest. By default, each decision tree in the forest can train on any feature in the list.

rowsperchild — If you set this option to a positive integer, the number represents the number of rows (from a random sample) to use for each decision tree. If you set this option to 0, each child uses all available rows. The default value is 0. You cannot set this option to a positive value if you also set the fractionselected option.

skipdroptable — If you set this option to false, the database deletes any intermediate tables created during model training. If you set this option to true, the database preserves those intermediate tables. The default value is false.

skiplimitcheck — If you set this option to true, the model skips the cardinality checks that throw errors when columns have too many values. The limit that this option checks is the same one that is specified
by the distinctcountlimit option. The default value is false.

splitmetric — If you set this option, the value controls which function the model uses to evaluate the quality of a split during tree construction. Supported values are gini impurity (measures impurity based on class distributions), macro f1 (uses the macro-averaged F1 score to guide splits), micro f1 (uses the micro-averaged F1 score to guide splits), and weighted f1 (uses the class-frequency-weighted F1 score to guide splits). The default value is gini impurity.

suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same for all rows in the input. The default value is false.

weighted — If you set this option, the model considers weights for labels. If you set this option to true, you must specify an additional column as a double in the training data for label weights. Rows with the same labels must have the same weights. If you set this value to auto, the model calculates weights automatically by weighting each label according to the ratio of the count of the most frequent label to the count of the specified label. As a result, the most frequent label has a weight of 1.0, and the other label weights are higher. The default value is false, which means all labels have equal weight.

Regression Tree Model Options

Optional

continuousfeatures — If you set this option, the value must be a comma-separated list of the feature indexes that are continuous numeric variables. Indexes start at 1. By default, the model considers no features continuous.

distinctcountlimit — If you set this option, the value must be a positive integer. This value limits the number of distinct values a non-continuous feature and the label can contain. The default value is 256.

enableresplits — If you set this option, the value must be a boolean that determines whether the tree can reuse the same continuous feature
multiple times along a single branch (e.g., split on x1 < 7 and later on x1 < 3). Resplitting can capture more complex, range-specific relationships. The default value is true, meaning that continuous features remain available for additional splits after use, which allows the tree to create more complex decision boundaries. When you set this option to false, the model marks continuous features as exhausted after their first use and cannot use them again in subsequent splits in the same tree.

featurearray — If you set this option to true, the model expects a single array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.

featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. By default, the system uses all indexes of the input array.

featuresubsetstrategy — If you set this option, the value specifies how many of the still-available features the regression tree considers at each split. Higher values yield higher accuracy and lower variance but take longer to train. You can specify this option either as an integer (e.g., 4, meaning consider up to four features at each split) or as one of these values: all (check every feature), sqrt (check up to the square root of the total number of features), and one third (check up to one third of the total number of features). The default value is all.

maxcellstofetch — If you set this option, the value must be a positive integer that controls the chunking behavior when fetching feature values during model training. The limit represents the maximum number of data cells (calculated as the number of columns × the number of rows) that the system can fetch in a single operation; it is not a byte limit. When the expected data size exceeds this threshold, the
algorithm switches from in-memory processing to database-based processing using SQL queries. The default value is 33,554,432 cells (calculated as 32 × 1024 × 1024).

maxdepth — If you set this option, the value must be a positive integer. This value sets the maximum allowable depth of the decision tree (the maximum number of features to split on). The default is unspecified, which means there is no maximum depth.

maxrows — If you set this option, the value must be a positive integer. This option limits the number of rows used for model training by creating a snapshot table with only the specified number of rows from the input query. You cannot use this option with nosnapshot set to true (attempting to set both results in an invalid argument error during model creation). When this option is unspecified, the model trains on all rows from the input query.

maxthreads — If you set this option, the value must be a positive integer. This value indicates the maximum number of parallel threads to use while the model trains. The default value is 2.

metrics — If you set this option to true, the model also calculates the percentage of samples it classifies correctly and saves this information in a system catalog table. The default value is false.

nosnapshot — If you set this option to true, the database does not create an intermediate table that stores the result of the specified SQL statement, which the model uses for training. The default value is false; in this case, the database creates and uses the intermediate table. Setting this option to true is useful when the training set is fixed. If the training set is a table with modifications, set this option to false because the decision tree trainer uses different data sets in different parts of the tree. Likewise, if the training set consists of a query that returns 100 rows, set this option to false because there is no guarantee that executing that query twice generates the same 100 rows each time.

numsplits — If you set
this option, the value must be an integer greater than 1. This value sets the maximum number of binary branches a continuous feature can consider. The default value is 32.

queryinternalparallelism — If you set this option, the database appends the USING PARALLELISM = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer. The default value is unspecified; in this case, the database does not add this clause.

resplitdepth — If you set this option, the value must be an integer that sets the maximum depth at which tree nodes can be re-split during optimization. This value controls how deep the algorithm searches for better split points. The default value is 6.

resplitthreshold — If you set this option, the value must be a decimal that sets the minimum improvement required to trigger a re-split operation. Lower values (e.g., 0.01) allow more aggressive re-splitting but can increase training time; higher values (e.g., 1.0) require larger improvements to trigger re-splits. The default value is 0.1.

skipdroptable — If you set this option to false, the database deletes any intermediate tables created during model training. If you set this option to true, the database preserves those intermediate tables. The default value is false.

skiplimitcheck — If you set this option to true, the model skips the cardinality checks that throw errors when columns have too many values. The limit that this option checks is the same one that you specify using the distinctcountlimit option. The default value is false.

suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same for all rows in the input. The default value is false.

Simple Linear Regression Model Options

Optional

loadbalance — If you set this option, the database appends the USING LOAD BALANCE SHUFFLE = <value> clause
to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.

metrics — If you set this option to true, the model collects quality metrics such as the coefficient of determination (R-squared) and the root mean squared error (RMSE). The default value is false.

normalize — If you set this option to true, the model uses auto-scaling: it computes the mean and standard deviation of each input feature to normalize the data during training, making training more numerically stable. The model then unscales the parameters so the persisted model operates in the original units. The default value is true.

queryinternalparallelism — If you set this option, the database appends the USING PARALLELISM = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer. The default value is unspecified; in this case, the database does not add this clause.

threshold — This option enables soft thresholding. If you specify this option, the value must be a positive number. After the model calculates the coefficients, if any are greater than the threshold value, the model subtracts the threshold value from them. If any coefficients are less than the negation of the threshold value, the model adds the threshold value to them. The model sets any coefficients between the negative and positive threshold values to zero.

yintercept — If you set this option, the value must be numeric. The system forces the specified y-intercept (i.e., the model value when x is zero).

Stacking Model Options

Required

levelonemodel — This option specifies the level 1 child models of the stacking model. You must specify this value as a JSON array, where each object in the array has the fields type (required), name, options, ignorecolumn, and extracallarguments.
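The JSON array format described for this option can be sketched as follows. This is a hypothetical illustration only: the field names (type, name, options, ignorecolumn, extracallarguments) come from the description above, but the model type string and option values are placeholders, not values confirmed by this reference.

```python
import json

# Hypothetical sketch of the JSON array passed as the levelonemodel option.
# "type" is the required field; the other fields shown are optional.
# The model type name and option values below are placeholders.
level_one_model = [
    {
        "type": "linear_regression",  # placeholder model type; "type" is required
        "name": "combiner",
        "options": {"metrics": "true"},
    }
]

encoded = json.dumps(level_one_model)
print(encoded)
```

The value is a JSON array even when it contains a single object, so additional child models can be appended as further objects in the same array.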
levelzeromodels — This option specifies the level 0 child models of the stacking model. You must specify this value as a JSON array, where each object in the array has the fields type (required), name, options, ignorecolumn, and extracallarguments.

Optional

extracolumncount — If you set this option, the value must be an integer that specifies how many non-feature columns the input data contains. The default value is 0.

featurearray — If you set this option to true, the model expects a single array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.

featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. By default, the system uses all indexes of the input array.

haslabelcolumn — If you set this option, the value must be a boolean that specifies whether the input data includes a label column. The default value is true.

maxthreads — If you set this option, the value specifies the maximum number of parallel threads to use while the model trains. This value must be a positive integer. The default value is 16.

nosnapshot — If you set this option to true, the data source must not change. In this case, the database does not create an intermediate table that stores the result of the specified SQL statement, which the model uses for training. (Random forest child decision trees always have this option set to true, so the database does not create a separate intermediate table for each decision tree.) The default value is false. Setting this option to true can speed up training when the training set is fixed.

preservedcolumnsforlevelone — If you set this option, the value specifies the columns from the original training data to pass as input columns to the level 1 model, in addition to the level 0 outputs. This value
should be a comma-separated list of integers starting at 1. By default, none of the columns are preserved.

skipdroptable — If you set this option to false, the database deletes any intermediate tables created during model training. If you set this option to true, the database preserves those intermediate tables. The default value is false.

suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips checking that the array length is the same for all rows in the input. The default value is false.

Support Vector Machine Model Options

Optional

featurearray — If you set this option to true, the model expects a single array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.

featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. By default, the system uses all indexes of the input array.

functionn — By default, SVM uses a linear kernel. To use a different kernel, you must provide a list of functions that are summed together, just as with linear combination regression. You must specify the first function using a key named 'function1'; subsequent functions must use keys named with subsequent values of n. You must specify functions in SQL syntax and use the variables x1, x2, …, xn to refer to the 1st, 2nd, and nth independent variables, respectively. You can specify the default linear kernel as 'function1' → 'x1', 'function2' → 'x2', and so on. The model always adds a constant term equivalent to 'functionn' → '1.0' that you do not need to specify explicitly.

loadbalance — If you set this option, the database appends the USING LOAD BALANCE SHUFFLE = <value> clause to all intermediate SQL queries the model executes
during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.

metrics — If you set this option to true, the model also calculates the percentage of samples that it classifies correctly and saves this information in a catalog table. The default value is false.

normalize — If you set this option to true, the model automatically computes the mean and standard deviation of each feature and uses them to normalize the data during training. The default value is true.

numepochs — If you set this option, the value must be a positive integer representing the maximum number of IRLS iterations during training. The default value is 20.

queryinternalparallelism — If you set this option, the database appends the USING PARALLELISM = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer. The default value is unspecified; in this case, the database does not add this clause.

regularizationcoefficient — If you set this option, the value must be a valid floating-point number. Use this option to control the balance in the loss function between finding a wide margin and minimizing incorrectly classified points. A larger (positive) value makes a wide margin around the hypersurface more important relative to the incorrectly classified points. Because of how the system implements SVM, the values for this option are likely different from the values used in other common SVM implementations. The default value is 1.0 / 1000000.0.

skipdroptable — If you set this option to false, the database deletes any intermediate tables created during model training. If you set this option to true, the database preserves those intermediate tables. The default value is false.

suppressarraylengthcheck — If you set this option, the featurearray option must be set to true. The system skips
checking that the array length is the same for all rows in the input. The default value is false.

Vector Autoregression Model Options

Required

numlags — Specify this option as a positive integer for the number of lags in the model.

numvariables — Specify this option as a positive integer for the number of variables in the model.

Optional

featurearray — If you set this option to true, the model expects a single array-type input column instead of multiple columns of training data. Each array row in the input column must be the same size. The default value is false.

featurearrayelements — If you set this option, the featurearray option must be set to true. The value must be a comma-separated list of integers representing indexes of the input array to use, starting at index 1. By default, the system uses all indexes of the input array.

loadbalance — If you set this option, the database appends the USING LOAD BALANCE SHUFFLE = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified option value (true or false). The default value is unspecified; in this case, the database does not add this clause.

metrics — If you set this option to true, the function collects the coefficient of determination (R-squared) metric. The default value is false.

normalize — If you set this option to true, the model uses auto-scaling: it computes the mean and standard deviation of each input feature to normalize the data during training, making training more numerically stable. The model then unscales the parameters so the persisted model operates in the original units. The default value is true.

queryinternalparallelism — If you set this option, the database appends the USING PARALLELISM = <value> clause to all intermediate SQL queries the model executes during training, where value is the specified positive integer. The default value is unspecified; in this case, the database does not add this clause.

suppressarraylengthcheck — If you set this
option, the featurearray option must be set to true. The system skips checking that the array length is the same for all rows in the input. The default value is false.

threshold — This option enables soft thresholding. If you specify this option, the value must be a positive number. After the model calculates the coefficients, if any of them are greater than the threshold value, the model subtracts the threshold value from them. If any coefficients are less than the negation of the threshold value, the model adds the threshold value to them. The model sets any coefficients between the negative and positive threshold values to zero.
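The soft-thresholding rule described for the threshold option can be sketched in Python. This is a conceptual illustration of the transformation only, not the database's internal implementation, and the function name is ours:

```python
def soft_threshold(coefficients, threshold):
    """Apply soft thresholding to a list of fitted coefficients.

    Coefficients above +threshold shrink by threshold, coefficients
    below -threshold grow by threshold, and coefficients inside the
    band [-threshold, +threshold] are set to zero.
    """
    result = []
    for c in coefficients:
        if c > threshold:
            result.append(c - threshold)
        elif c < -threshold:
            result.append(c + threshold)
        else:
            result.append(0.0)
    return result

# With threshold = 1.0, large coefficients shrink toward zero and
# small ones are zeroed out entirely.
print(soft_threshold([2.5, -3.0, 0.5, -0.75], 1.0))  # [1.5, -2.0, 0.0, 0.0]
```

Zeroing small coefficients this way produces a sparser model, which is why a positive threshold value can act as a simple form of feature selection.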