Other Models

The database supports other machine learning models for advanced analysis, including association rules and feedforward neural networks.
To create a model, use the CREATE MLMODEL syntax. For details, see CREATE MLMODEL.
Model option names are case-sensitive.
Association Rules
Model Type: ASSOCIATION RULES
The association rules model trains over rows of arrays and suggests associated values. Given a set of values, the model returns values that commonly appear with them in the training arrays but are absent from the provided set. In practice, this model is often used in retail to suggest related items or services for purchase based on similar past transactions.
Model Options
loadBalance — If you set this option, the database appends the USING load_balance_shuffle = <value> clause to all intermediate SQL queries that the model executes during training, where <value> is the specified option value (true or false). By default, this option is unset and the database does not add this clause.
queryInternalParallelism — If you set this option, the database appends the USING parallelism = <value> clause to all intermediate SQL queries that the model executes during training, where <value> is the specified positive integer. By default, this option is unset and the database does not add this clause.
skipDropTable — If you set this option to true, the database prevents the deletion of any intermediate tables that it creates during model training. This option defaults to false.
Execute the Model
Create an association rules model. The input result set must contain a single column that is an array of any inner type.
SQL
CREATE MLMODEL my_model
TYPE ASSOCIATION RULES ON (
  SELECT
    array_agg(product) 
  FROM public.my_table
  GROUP BY customer_id
);
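To make the shape of the training data concrete, the following Python sketch mimics what the array_agg(product) ... GROUP BY customer_id query produces: one array of products per customer, which becomes one training row. The function name build_transactions and the sample rows are hypothetical, and unlike this sketch, SQL does not guarantee the order of groups or of elements inside each array.

```python
from collections import defaultdict


def build_transactions(rows):
    """Group (customer_id, product) rows into one product array per customer.

    Hypothetical in-memory analogue of:
    SELECT array_agg(product) FROM public.my_table GROUP BY customer_id
    """
    baskets = defaultdict(list)
    for customer_id, product in rows:
        baskets[customer_id].append(product)
    # Each value is one training row: a single array-valued input column.
    return list(baskets.values())


if __name__ == "__main__":
    rows = [(1, "sausage"), (2, "olives"), (1, "mushrooms"), (2, "sausage")]
    print(build_transactions(rows))
```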
After you create the model, you can view its details by querying the sys.machine_learning_models and sys.machine_learning_model_options system catalog tables.
When you execute the model, you must provide an array with the same inner type as the input column. The array represents the current state of the transaction. 
You must also provide a second argument, a positive integer that specifies the association rank of the item to return. A value of 1 returns the most associated item, a value of 2 returns the second most associated item, and so on.
The third and fourth arguments are optional constant Boolean values.
If the third argument is true, the model bases its suggestion only on training transactions that contain all of the items in the current transaction. If it is false, a training transaction needs to contain only some of those items. The third argument defaults to true.
If the fourth argument is true, the model counts duplicate items within a transaction only once. If it is false, the model counts every occurrence of a duplicate item. The fourth argument defaults to true. To specify the fourth argument, you must also specify the third argument.
SQL
SELECT my_model(array['sausage', 'mushrooms'], 1);
SELECT my_model(array['sausage', 'mushrooms'], 1, true, true);
The model returns a tuple (v, p, c), where:
v — The associated item that the model selects based on your input arguments.
p — The proportion of training arrays that contain both the items of the input array and the associated item v, out of all training arrays that contain the input items, whether or not they also contain v.
c — The number of occurrences of the associated item v in the training arrays, counted according to your input arguments.
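The following Python sketch is one plausible reading of the four arguments and the (v, p, c) output described above; it is not the database's implementation. The function name recommend, the rule that a partial match means "shares at least one item", and breaking ties by item name are all assumptions made for illustration.

```python
from collections import Counter


def recommend(transactions, basket, rank=1, match_all=True, distinct=True):
    """Hypothetical association-rules lookup; returns (v, p, c) or None."""
    wanted = set(basket)
    # Third argument: require every basket item, or (assumed) at least one.
    if match_all:
        matching = [t for t in transactions if wanted <= set(t)]
    else:
        matching = [t for t in transactions if wanted & set(t)]
    counts = Counter()      # c: how often each candidate item occurs
    containing = Counter()  # how many matching arrays contain each candidate
    for t in matching:
        # Only items absent from the current transaction can be suggested.
        candidates = [item for item in t if item not in wanted]
        # Fourth argument: count duplicates in one array once, or every time.
        counts.update(set(candidates) if distinct else candidates)
        containing.update(set(candidates))
    # Second argument: rank by count, breaking ties by name (assumed).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if not 1 <= rank <= len(ranked):
        return None
    v, c = ranked[rank - 1]
    # p: share of matching arrays that also contain v.
    p = containing[v] / len(matching)
    return (v, p, c)


if __name__ == "__main__":
    history = [
        ["sausage", "mushrooms", "onions"],
        ["sausage", "mushrooms", "peppers", "peppers"],
        ["sausage", "mushrooms", "onions"],
        ["sausage", "olives"],
    ]
    print(recommend(history, ["sausage", "mushrooms"], 1))
    # ('onions', 0.6666666666666666, 2)
```

With the default arguments, three of the four sample arrays contain both sausage and mushrooms, and two of those three also contain onions, so p is 2/3. Passing false as the fourth argument lets the duplicated peppers count twice, which ties peppers with onions on c.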
After you execute a model, you can find the details about the execution results in the sys.association_rules_models system catalog table. 
For details, see the description of the associated system catalog tables in the Machine Learning section in the System Catalog.
Feedforward Neural Network
Model Type: FEEDFORWARD NETWORK
The feedforward network model is a neural network model in which data moves from the inputs, through the hidden layers, to the outputs. Feedforward neural networks are fully connected and are sometimes called multilayer perceptrons. Every column of the input result set except the last is an input, so the number of inputs is the number of columns minus one, and each input must be numeric. The last column in the input result set is the target variable. For models with one output, this column is also numeric. For models with multiple outputs, this column must be a 1xN matrix (a row vector), where N is the number of outputs.
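As an illustration of "fully connected" and of data moving from the inputs through the hidden layers to the outputs, here is a minimal forward pass in plain Python. In this sketch, n_inputs, hidden_layers, hidden_layer_size, and outputs play the roles of the input column count and the hiddenLayers, hiddenLayerSize, and outputs options, and low/high mirror the minInitParamValue and maxInitParamValue defaults. The bias terms, the helper names, and the list-of-lists representation are assumptions; the sketch does not describe how the database stores or evaluates its networks.

```python
import random


def relu(z):
    """Rectified linear unit: negative inputs become 0."""
    return max(z, 0.0)


def linear(z):
    """Identity activation."""
    return z


def init_layers(n_inputs, hidden_layers, hidden_layer_size, outputs,
                low=-1.0, high=1.0, seed=0):
    """Random (weights, biases) per layer, drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    sizes = [n_inputs] + [hidden_layer_size] * hidden_layers + [outputs]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(low, high) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [rng.uniform(low, high) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def forward(x, layers, activation=relu, output_activation=linear):
    """Move one sample from the inputs through the hidden layers to the outputs."""
    values = list(x)
    last = len(layers) - 1
    for i, (weights, biases) in enumerate(layers):
        act = output_activation if i == last else activation
        # Fully connected: each node combines every value from the previous layer.
        values = [act(sum(w * v for w, v in zip(row, values)) + b)
                  for row, b in zip(weights, biases)]
    return values


if __name__ == "__main__":
    net = init_layers(n_inputs=2, hidden_layers=1, hidden_layer_size=8, outputs=3)
    print(forward([0.1, 0.2], net))  # three raw output values
```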
Common uses of multiple output models are:
Multi-class classification — The outputs are one-hot encoded values that represent the class of the record. Apply argmax to the results to select the class with the highest probability.
Probability modeling — Multiple output values represent probabilities between 0 and 1 that sum to 1.
Multiple numeric prediction — Multiple output values represent different numeric values to predict against.
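For the first two uses, the following Python sketch shows how raw outputs can be turned into probabilities between 0 and 1 that sum to 1 (softmax), and how argmax then picks the predicted class. The function names are illustrative only. This argmax returns a 0-based index; whether the database's argmax counts from 0 or 1 is not stated here, so check before relying on the numbering.

```python
import math


def softmax(values):
    """Turn raw outputs into probabilities between 0 and 1 that sum to 1."""
    m = max(values)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """0-based index of the largest value, i.e. the predicted class."""
    return max(range(len(values)), key=lambda i: values[i])


if __name__ == "__main__":
    probs = softmax([2.0, 1.0, 0.1])
    print(probs, argmax(probs))
```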
For faster, lower-quality models, reduce the popSize, initialIterations, and subsequentIterations options. Conversely, for slower, higher-quality models, increase the values for these same options.
Model Options
Required
hiddenLayers — You must set this option to a positive integer that specifies how many hidden layers to use.
hiddenLayerSize — You must set this option to a positive integer that specifies the number of nodes in each hidden layer.
outputs — You must set this option to a positive integer that specifies the number of outputs.
lossFunction — You must set this option to the loss function that the model minimizes during training. The value can be one of several predefined loss functions or a user-defined loss function.
The predefined loss functions are squared_error (regression), vector_squared_error (vector-valued regression), log_loss (binary classification with target values of 0 and 1), logits_loss (binary classification with target values of 0 and 1), hinge_loss (binary classification with target values of -1 and 1), and cross_entropy_loss (multi-class classification). If the value is not one of these strings, the model treats it as a user-defined loss function. A user-defined loss function specifies the per-sample loss, and the actual loss function is the sum of this function over all samples. In the function, use the variable y to refer to the dependent variable in the training data and the variable f to refer to the computed estimate for a given sample.
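To illustrate the per-sample convention with y and f, and how the total loss is the sum over samples, the following Python sketch writes out textbook forms of several predefined losses plus one example of a user-defined loss. These are standard definitions assumed to match the named options, not the database's source; logits_loss is omitted because its exact form is not specified here, and absolute_error is a hypothetical user-defined function.

```python
import math


# Per-sample losses: y is the target value, f is the model's estimate.
def squared_error(y, f):
    return (y - f) ** 2


def vector_squared_error(y, f):
    return sum((yi - fi) ** 2 for yi, fi in zip(y, f))


def log_loss(y, f):
    # y is 0 or 1; f is a probability strictly between 0 and 1.
    return -(y * math.log(f) + (1 - y) * math.log(1 - f))


def hinge_loss(y, f):
    # y is -1 or 1.
    return max(0.0, 1 - y * f)


def cross_entropy_loss(y, f):
    # y is one-hot; f is a probability vector (for example, after softmax).
    return -sum(yi * math.log(fi) for yi, fi in zip(y, f) if yi)


def absolute_error(y, f):
    """Hypothetical user-defined per-sample loss written in terms of y and f."""
    return abs(y - f)


def total_loss(per_sample, targets, estimates):
    """The loss that training minimizes: the per-sample loss summed over samples."""
    return sum(per_sample(y, f) for y, f in zip(targets, estimates))


if __name__ == "__main__":
    print(total_loss(squared_error, [1.0, 2.0], [0.0, 4.0]))
```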
Optional
activationFunction — Sets the activation function for all layers except the output layer. Valid values are linear, relu (rectified linear unit), leakyrelu (leaky rectified linear unit), tanh (hyperbolic tangent function), and sigmoid (fast sigmoid approximation). This option defaults to relu.
outputActivationFunction — Sets the activation function for the output layer only. Valid values are linear, relu (rectified linear unit), leakyrelu (leaky rectified linear unit), tanh (hyperbolic tangent function), and sigmoid (fast sigmoid approximation). This option defaults to linear. Because different activation functions have different output ranges, choose one that matches the range of the dependent variable. For example, if the dependent variable can take any value, choose linear. If it is always positive, choose relu. If your outputs range from -1 to 1 or you perform hinge loss classification, tanh is a good choice because the hyperbolic tangent function has the same range. If your outputs range from 0 to 1 or you perform log loss classification, sigmoid is a better choice for the same reason.
metrics — If you set this option to true, the model calculates the average value of the loss function.
useSoftMax — If you set this option to true, the model applies a softmax function to the output of the output layer before computing the loss function. This option defaults to true if the lossFunction is set to cross_entropy_loss and false otherwise.
popSize — If you set this option, the value must be a positive integer. This option sets the population size for the particle swarm optimization (PSO) part of the algorithm. This option defaults to 100.
minInitParamValue — If you set this option, the value must be a floating-point number. This option sets the minimum initial parameter value in the optimization algorithm. This option defaults to -1.
maxInitParamValue — If you set this option, the value must be a floating-point number. This option sets the maximum initial parameter value in the optimization algorithm. This option defaults to 1.
initialIterations — If you set this option, the value must be a positive integer. This option sets the number of PSO iterations for the first PSO pass. This option defaults to 50.
subsequentIterations — If you set this option, the value must be a positive integer. This option sets the number of PSO iterations for each pass after the initial pass. This option defaults to 10.
momentum — If you set this option, the value must be a positive floating-point number. This parameter controls how much PSO iterations move away from the local best value to explore new territory. This option defaults to 0.1.
gravity — If you set this option, the value must be a positive floating-point number. This parameter controls how much PSO iterations are drawn back towards the local best value. This option defaults to 0.01.
lossFuncNumSamples — If you set this option, the value must be a positive integer. This parameter controls how many points the model samples when estimating the loss function. This option defaults to 1000.
numGAAttempts — If you set this option, the value must be a positive integer. This parameter controls how many genetic algorithm (GA) crossover possibilities the model tries. This option defaults to 10,000.
maxLineSearchIterations — If you set this option, the value must be a positive integer. This parameter controls the maximum allowed number of iterations when running the line search part of the algorithm. This option defaults to 20.
minLineSearchStepSize — If you set this option, the value must be a positive floating-point number. This parameter controls the minimum step size of the line search algorithm. This option defaults to 0.01.
samplesPerThread — If you set this option, the value must be a positive integer. This parameter controls the target number of samples that the model sends to each thread. Each thread independently computes a model, and the per-thread models are combined at the end. This option defaults to 1 million.
loadBalance — If you set this option, the database appends the USING load_balance_shuffle = <value> clause to all intermediate SQL queries that the model executes during training, where <value> is the specified option value (true or false). By default, this option is unset and the database does not add this clause.
queryInternalParallelism — If you set this option, the database appends the USING parallelism = <value> clause to all intermediate SQL queries that the model executes during training, where <value> is the specified positive integer. By default, this option is unset and the database does not add this clause.
skipDropTable — If you set this option to true, the database prevents the deletion of any intermediate tables that it creates during model training. This option defaults to false.
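Several of these options tune a particle swarm optimization (PSO) search. The following Python sketch is a generic, textbook PSO, not the database's algorithm, which also uses GA crossover and a line search that this sketch omits. Here pop_size corresponds loosely to popSize, iterations to initialIterations, and low/high to minInitParamValue/maxInitParamValue. Mapping momentum to inertia and gravity to pull is only an analogy, and the sketch uses common textbook values (0.7 and 1.5) rather than the option defaults.

```python
import random


def pso_minimize(loss, dim, pop_size=100, iterations=50, inertia=0.7,
                 pull=1.5, low=-1.0, high=1.0, seed=0):
    """Generic particle swarm optimization; returns (best_position, best_loss)."""
    rng = random.Random(seed)
    # Initial parameter values are drawn uniformly from [low, high].
    pos = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    vel = [[0.0] * dim for _ in range(pop_size)]
    best_pos = [p[:] for p in pos]  # best position seen by each particle
    best_val = [loss(p) for p in pos]
    g = min(range(pop_size), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # best position in the swarm
    for _ in range(iterations):
        for i in range(pop_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia keeps some previous velocity (exploration); the pull
                # terms draw the particle back toward known good positions.
                vel[i][d] = (inertia * vel[i][d]
                             + pull * r1 * (best_pos[i][d] - pos[i][d])
                             + pull * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = loss(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


if __name__ == "__main__":
    target = [0.3, -0.2]
    sq = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
    print(pso_minimize(sq, dim=2, pop_size=30, iterations=100))
```

In this sketch the loss is evaluated pop_size × (iterations + 1) times, which shows why lowering popSize and the iteration counts makes training faster but searches the parameter space less thoroughly.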
Execute the Model
Create a neural network that performs multi-class classification for three possible classes. The columns y1, y2, and y3 are one-hot encoded outputs: in each training row, exactly one of them is 1 and the others are 0, and the column that holds the 1 denotes the class that the row belongs to.
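If the training labels start out as class numbers, they need to be one-hot encoded into y1, y2, and y3 before loading. The following Python sketch shows one way to do that outside the database; the helper names, the 0-based class numbering, and the (x1, x2, label) sample layout are assumptions for illustration.

```python
def one_hot(label, num_classes):
    """Encode a 0-based class index as a row such as [y1, y2, y3] with a single 1."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def to_training_rows(samples, num_classes):
    """Turn (x1, x2, label) samples into (x1, x2, [y1, y2, y3]) rows."""
    return [(x1, x2, one_hot(label, num_classes)) for x1, x2, label in samples]


if __name__ == "__main__":
    print(to_training_rows([(0.5, 1.5, 1), (2.0, -1.0, 2)], 3))
```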
SQL
CREATE MLMODEL my_model
TYPE FEEDFORWARD NETWORK
ON (
  SELECT
    x1,
    x2,
    {{y1,y2,y3}}
  FROM public.my_table
)
options(
  'hiddenLayers' -> '1',
  'hiddenLayerSize' -> '8',
  'outputs' -> '3',
  'lossFunction' -> 'cross_entropy_loss',
  'activationFunction' -> 'relu',
  'useSoftmax' -> 'true'
);
After you create the model, you can view its details by querying the sys.machine_learning_models and sys.machine_learning_model_options system catalog tables.
When you execute the model later, pass one value for each input column (every column of the training result set except the target column), and the model returns its estimate of the target variable. For models with multiple outputs, the result is a 1xN matrix (a row vector). If the model uses multiple outputs to perform multi-class classification, use argmax to get the integer that represents the class.
SQL
SELECT argmax(my_model(x1, x2)) FROM my_table;
After you execute a model, you can find the details about the execution results in the sys.feedforward_network_models system catalog table. 
For details, see the description of the associated system catalog tables in the Machine Learning section in the System Catalog.
Related Links
Machine Learning Model Functions
Machine Learning Models
Clustering and Dimension Reduction Models
Math Functions and Operators