# Other Models

Model Type: FEEDFORWARD NETWORK

The feedforward network model is a neural network in which data moves from the inputs through hidden layers to the outputs. Feedforward neural networks are fully connected (sometimes called multilayer perceptrons). The number of inputs is determined by the number of columns in the input result set, and each input must be numeric. The last column in the input result set is the target variable. For models with one output, this column must also be numeric. For models with multiple outputs, it must be a 1xN matrix (a row vector).

Common uses of multiple output models are:

- Multi-class classification - Multiple outputs are one-hot encoded values that represent the class of the record. Apply argmax to the model's outputs to select the highest-probability class.
- Probability modeling - Multiple output values represent probabilities between 0 and 1 that sum to 1.
- Multiple numeric prediction - Multiple output values represent different numeric values to predict against.

For faster, lower quality models, reduce the popSize, initialIterations, and subsequentIterations options. Conversely, for slower, higher quality models, increase the values for these same options.

To create a neural network that performs multi-class classification for three possible classes, use the following SQL statement. y1, y2, and y3 are one-hot encoded outputs: exactly one of them is 1 and the rest are 0, and the output that contains the 1 denotes which of the three classes the training record belongs to.
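The statement below is a sketch only: the CREATE MLMODEL syntax, the model and table names, and the column names are illustrative assumptions, not confirmed syntax.

```sql
-- Hypothetical sketch: the CREATE MLMODEL syntax and all names are assumptions.
-- x1..x4 are numeric feature columns. The final column y is assumed to already
-- be a 1x3 row vector holding the one-hot values (y1, y2, y3); how to construct
-- such a matrix column is product-specific and not shown here.
CREATE MLMODEL class_model
TYPE FEEDFORWARD NETWORK
ON (SELECT x1, x2, x3, x4, y FROM training_data)
OPTIONS('hiddenLayers' -> '2',
        'hiddenLayerSize' -> '8',
        'outputs' -> '3',
        'lossFunction' -> 'cross_entropy_loss');
```

Because lossFunction is cross_entropy_loss, useSoftMax defaults to true, so the three outputs form a probability distribution over the classes.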

When you execute the model later, pass N - 1 input variables and the model returns the estimate of the target variable. In the case of multiple outputs, the result is a 1xN matrix (a row vector). If the model uses multiple outputs to perform multi-class classification, use argmax to get the integer that represents the class.
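As a sketch of execution, assuming a trained multi-class model named class_model and an argmax function as described above (the invocation syntax and names are assumptions):

```sql
-- Hypothetical sketch: the model-invocation and argmax syntax are assumptions.
-- class_model returns a 1x3 row vector of class probabilities;
-- argmax converts it to the integer index of the most likely class.
SELECT argmax(class_model(x1, x2, x3, x4)) AS predicted_class
FROM new_data;
```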

hiddenLayers - You must set this option to a positive integer that specifies how many hidden layers to use.

hiddenLayerSize - You must set this option to a positive integer that specifies the number of nodes in each hidden layer.

outputs - You must set this option to a positive integer that specifies the number of outputs.

lossFunction - This option specifies the loss function that all hidden layer nodes and all output layer nodes use. This function can be one of several predefined loss functions, or a user-defined loss function.

The predefined loss functions are:

- squared_error - regression
- vector_squared_error - vector-valued regression
- log_loss - binary classification with target values of 0 and 1
- logits_loss - binary classification with target values of 0 and 1
- hinge_loss - binary classification with target values of -1 and 1
- cross_entropy_loss - multi-class classification

If the value for this parameter is none of these strings, the model assumes a user-defined loss function. The user-defined loss function specifies the per-sample loss; the actual loss function is the sum of this function applied to all samples. In a user-defined loss function, use the variable y to refer to the dependent variable in the training data and the variable f to refer to the computed estimate for a given sample.
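As a sketch of a user-defined loss, only the variables y and f come from the description above; the CREATE MLMODEL syntax, the option-map form, and all names are assumptions:

```sql
-- Hypothetical sketch: syntax and names are assumptions.
-- The lossFunction value is a user-defined per-sample loss written in SQL
-- syntax: the squared error (y - f) * (y - f), spelled out explicitly
-- instead of using the predefined squared_error value.
CREATE MLMODEL custom_loss_model
TYPE FEEDFORWARD NETWORK
ON (SELECT x1, x2, y FROM training_data)
OPTIONS('hiddenLayers' -> '1',
        'hiddenLayerSize' -> '4',
        'outputs' -> '1',
        'lossFunction' -> '(y - f) * (y - f)');
```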

activationFunction - If you set this option, the values are linear, relu (rectified linear unit), leakyrelu (leaky rectified linear unit), tanh (hyperbolic tangent function), or sigmoid (fast sigmoid approximation). This option defaults to relu. This option affects all layers except the output layer.

outputActivationFunction - If you set this option, the values are linear, relu (rectified linear unit), leakyrelu (leaky rectified linear unit), tanh (hyperbolic tangent function), or sigmoid (fast sigmoid approximation). Different activation functions have different output ranges, so the chosen activation function should match the range of the dependent variable in your data. For example, if the dependent variable can take any value, choose linear. If the dependent variable is always positive, choose relu. If your outputs range from -1 to 1 or you perform hinge loss classification, tanh is a good option because the hyperbolic tangent function has the same range. If your outputs range from 0 to 1 or you perform log loss classification, sigmoid is a better choice for the same reason. This option defaults to linear. This option only sets the activation function for the output layer.

metrics - If you set this option to true, the model calculates the average value of the loss function.

useSoftMax - If you set this option to true, the model applies a softmax function to the output of the output layer before computing the loss function. This option defaults to true if lossFunction is set to cross_entropy_loss and false otherwise.

popSize - If you set this option, the value must be a positive integer. Sets the population size for the particle swarm optimization (PSO) part of the algorithm. This option defaults to 100.

minInitParamValue - If you set this option, the value must be a floating point number. Sets the minimum for initial parameter values in the optimization algorithm. This option defaults to -1.

maxInitParamValue - If you set this option, the value must be a floating point number. Sets the maximum for initial parameter values in the optimization algorithm. This option defaults to 1.

initialIterations - If you set this option, the value must be a positive integer. Sets the number of PSO iterations for the first PSO pass. This option defaults to 50.

subsequentIterations - If you set this option, the value must be a positive integer. Sets the number of PSO iterations for subsequent PSO iterations after the initial pass. This option defaults to 10.

momentum - If you set this option, the value must be a positive floating point number. This parameter controls how much PSO iterations move away from the local best value to explore new territory. This option defaults to 0.1.

gravity - If you set this option, the value must be a positive floating point number. This parameter controls how much PSO iterations are drawn back towards the local best value. This option defaults to 0.01.

lossFuncNumSamples - If you set this option, the value must be a positive integer. This parameter controls how many points the model samples when estimating the loss function. This option defaults to 1000.

numGAAttempts - If you set this option, the value must be a positive integer. This parameter controls how many GA crossover possibilities the model tries. This option defaults to 10,000.

maxLineSearchIterations - If you set this option, the value must be a positive integer. This parameter controls the maximum allowed number of iterations when running the line search part of the algorithm. This option defaults to 20.

minLineSearchStepSize - If you set this option, the value must be a positive floating point number. This parameter controls the minimum step size of the line search algorithm. This option defaults to 0.01.

samplesPerThread - If you set this option, the value must be a positive integer. This parameter controls the target number of samples that the model sends to each thread. Each thread independently computes a model, and the models are all combined at the end. This option defaults to 1 million.

Model Type: PRINCIPAL COMPONENT ANALYSIS

Principal Component Analysis (PCA) is typically not used as a model on its own. PCA is most commonly used on the inputs to other models. PCA serves two purposes.

- PCA normalizes all numeric feature data. Some types of models are sensitive to the scale of numeric features, and when different features have different scales, the results end up skewed. PCA normalizes all features to the same scale.
- PCA is used for dimensionality reduction. PCA computes linear combinations of the original features to put the most signal into a smaller number of new features.

The input result set when creating a PCA model is N numeric columns that are all features. There is no label or dependent variable. After you create a PCA model, the sys.principal_component_analysis_models catalog table contains information on the percentage of the signal that is in each PCA feature. You can use this information to figure out how many of the output features to keep.

You can also use PCA models as inputs to other models. For example, if you have three features and you want to use PCA to reduce the number to two features, you can execute the following SQL statements.
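The statements below are a sketch only: the CREATE MLMODEL syntax, the model and table names, and the column names are assumptions. The component-index argument (starting at 1) is described later in this section.

```sql
-- Hypothetical sketch: syntax and names are assumptions.
-- Three numeric feature columns, no label or dependent variable.
CREATE MLMODEL pca_model
TYPE PRINCIPAL COMPONENT ANALYSIS
ON (SELECT x1, x2, x3 FROM training_data);

-- Train a downstream model on only the first two PCA components,
-- reducing three features to two.
CREATE MLMODEL lr_model
TYPE LOGISTIC REGRESSION
ON (SELECT pca_model(x1, x2, x3, 1),
           pca_model(x1, x2, x3, 2),
           label
    FROM training_data);
```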

You can use this model as input for another model, for example, logistic regression.

To correctly use this model later, you must pass the original features through the PCA model when you execute the logistic regression model.
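As a sketch, assuming a trained PCA model pca_model over features x1, x2, x3 and a logistic regression model lr_model trained on its first two components (the invocation syntax and all names are assumptions):

```sql
-- Hypothetical sketch: invocation syntax and names are assumptions.
-- Pass the original features through the PCA model first, then feed the
-- resulting components to the logistic regression model.
SELECT lr_model(pca_model(x1, x2, x3, 1),
                pca_model(x1, x2, x3, 2)) AS predicted_label
FROM new_data;
```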

Similarly, to create a PCA analysis over four variables, execute this SQL statement:
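The statement below is a sketch only; the CREATE MLMODEL syntax, the model name, and the column names are assumptions:

```sql
-- Hypothetical sketch: syntax and names are assumptions.
-- Four numeric feature columns, no label or dependent variable.
CREATE MLMODEL pca4_model
TYPE PRINCIPAL COMPONENT ANALYSIS
ON (SELECT a, b, c, d FROM measurements);
```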

When you execute the trained PCA model, you must provide the same original input features in the same order, followed by a positive integer argument that specifies the PCA component to return. The PCA component index starts at 1.
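As a sketch of execution, assuming a trained four-feature PCA model named pca4_model (the invocation syntax and names are assumptions):

```sql
-- Hypothetical sketch: invocation syntax and names are assumptions.
-- The original features are passed in their original order; the final
-- argument selects the PCA component, with indexes starting at 1.
SELECT pca4_model(a, b, c, d, 1) AS component_1,
       pca4_model(a, b, c, d, 2) AS component_2
FROM measurements;
```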


Model Type: SUPPORT VECTOR MACHINE

Support Vector Machine (SVM) is a binary classification algorithm. SVM essentially finds a hypersurface (the hypersurface is a curve in 2-dimensional space) that correctly splits the data into two classes and maximizes the margin around the hypersurface. By default, SVM finds a hyperplane to split the data (the hyperplane is a straight line in 2-dimensional space). SVM uses a hinge loss function to balance the two objectives of finding a hyperplane with a wide margin while minimizing the number of incorrectly classified points.

The first N - 1 input columns are the features and must be numeric. The last column is the label and can be any arbitrary type.

For faster, lower quality models, reduce the popSize, initialIterations, and subsequentIterations options. Conversely, for slower, higher quality models, increase the values for these same options.

When you execute the model, the N - 1 features must be passed as parameters. The model returns the expected label.
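The statements below are a sketch only: the CREATE MLMODEL syntax, the model and table names, and the column names are assumptions.

```sql
-- Hypothetical sketch: syntax and names are assumptions.
-- x1 and x2 are numeric features; label may be any arbitrary type.
CREATE MLMODEL svm_model
TYPE SUPPORT VECTOR MACHINE
ON (SELECT x1, x2, label FROM training_data)
OPTIONS('metrics' -> 'true');

-- Execute the trained model by passing the N - 1 features;
-- the model returns the expected label.
SELECT svm_model(x1, x2) AS predicted_label
FROM new_data;
```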

metrics - If you set this option to true, the model also calculates the percentage of samples that are correctly classified by the model and saves this information in a catalog table. This option defaults to false.

regularizationCoefficient - If you specify this option, the value must be a valid floating point number. This option controls the balance between finding a wide margin and minimizing incorrectly classified points in the loss function. When this value is larger (and positive), it makes having a wide margin around the hypersurface more important relative to minimizing the number of incorrectly classified points. Because of how this implementation of SVM works, appropriate values for this parameter are likely different from the values used in other common SVM implementations. This option defaults to 1.0 / 1000000.0.

functionN - By default, SVM uses a linear kernel. To use a different kernel, you must provide a list of functions that are summed together, just like with linear combination regression. You must specify the first function using a key named 'function1'. Subsequent functions must use keys with names that use subsequent values of N. You must specify functions in SQL syntax, and should use the variables x1, x2, …, xn to refer to the 1st, 2nd, and nth independent variables respectively. You can specify the default linear kernel as: 'function1' → 'x1', 'function2' → 'x2', and so on. The model always adds a constant term equivalent to 'functionN' → '1.0' that you do not need to explicitly specify.
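As a sketch of a non-linear kernel, only the function1/function2/... option keys and the x1, x2 variables come from the description above; the CREATE MLMODEL syntax and all names are assumptions:

```sql
-- Hypothetical sketch: syntax and names are assumptions.
-- A quadratic kernel over two features: the raw features plus their
-- squares and cross term, summed together by the model.
CREATE MLMODEL svm_quadratic
TYPE SUPPORT VECTOR MACHINE
ON (SELECT x1, x2, label FROM training_data)
OPTIONS('function1' -> 'x1',
        'function2' -> 'x2',
        'function3' -> 'x1 * x1',
        'function4' -> 'x2 * x2',
        'function5' -> 'x1 * x2');
-- The constant term ('1.0') is added automatically and need not be specified.
```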

popSize - If you set this option, the value must be a positive integer. This value sets the population size for the particle swarm optimization (PSO) part of the algorithm. This option defaults to 100.

minInitParamValue - If you set this option, the value must be a floating point number. Sets the minimum for initial parameter values in the optimization algorithm. This option defaults to -10.

maxInitParamValue - If you set this option, the value must be a floating point number. Sets the maximum for initial parameter values in the optimization algorithm. This option defaults to 10.

initialIterations - If you set this option, the value must be a positive integer. Sets the number of PSO iterations for the first PSO pass. This option defaults to 500.

subsequentIterations - If you set this option, the value must be a positive integer. Sets the number of PSO iterations for subsequent PSO iterations after the initial pass. This option defaults to 100.

momentum - If you set this option, the value must be a positive floating point number. This parameter controls how much PSO iterations move away from the local best value to explore new territory. This option defaults to 0.1.

gravity - If you set this option, the value must be a positive floating point number. This parameter controls how much PSO iterations are drawn back towards the local best value. This option defaults to 0.01.

lossFuncNumSamples - If you set this option, the value must be a positive integer. This parameter controls how many points the model samples when estimating the loss function. This option defaults to 1000.

numGAAttempts - If you set this option, the value must be a positive integer. This parameter controls how many GA crossover possibilities the model tries. This option defaults to 10 million.

maxLineSearchIterations - If you set this option, the value must be a positive integer. This parameter controls the maximum allowed number of iterations when running the line search part of the algorithm. This option defaults to 200.

minLineSearchStepSize - If you set this option, the value must be a positive floating point number. This parameter controls the minimum step size that the line search algorithm ever takes. This option defaults to 1e-5.

samplesPerThread - If you set this option, the value must be a positive integer. This parameter controls the target number of samples that the model sends to each thread. Each thread independently computes a model, and the models are all combined at the end. This option defaults to 1 million.