SQL Reference
Machine Learning Model Functio...
Other Models
{{ocientml}} supports other machine learning models for advanced analysis that include association rules and feedforward neural network to create the model, use the create mlmodel syntax for details, see machine learning model functions docid\ jsgwuw5og56fzrve5h10g model option names are case sensitive association rules model type association rules the association rules model trains itself over rows of arrays to suggest other associated values the model returns values commonly associated in a set that are absent in the provided value set in practical use, this model is often used to help in retail transactions by suggesting related items or services for purchase based on similar past transactions model options loadbalance — if you set this option , the database appends the using load balance shuffle = \<value> clause to all intermediate sql queries the model executes during training where value is the specified option value ( true or false ) the default value is unspecified in this case, the database does not add this clause queryinternalparallelism — if you set this option , the database appends the using parallelism = \<value> clause to all intermediate sql queries the model executes during training where value is the specified positive integer value the default value is unspecified in this case, the database does not add this clause skipdroptable — if you set this option to true , the database prevents the deletion of any intermediate tables that it creates during model training this option defaults to false execute the model create an association rules model when you create the model, there should be a single input column, an array of any inner type create mlmodel my model type association rules on ( select array agg(product) from public my table group by customer id ); also, a fter you create the model, you can see its details by querying the sys machine learning models and sys machine learning model options system catalog tables when you execute the model, you must provide an array with the same inner type as the input column the array represents the current state of the transaction you must also provide a second argument that is a positive integer that represents the association ranking of the item to return the value of 1 indicates to add the first most associated item to the transaction the third and fourth arguments are optional constant boolean values if the third argument is set to true , the model adds the most associated item to the transaction based on other transactions that have the same items in the current transaction otherwise, only some of the items must be the same the third argument defaults to true if the fourth argument is set to true , the model counts duplicate items in a transaction only one time otherwise, the model counts duplicate items multiple times to specify the fourth argument, you must specify the third argument the default value for the fourth argument is true , which means that duplicate items are counted one time select my model(array\['sausage', 'mushrooms'], 1); select my model(array\['sausage', 'mushrooms'], 1, true, true); the output from the model is a tuple ( v, p, c ) the definitions of these tuple values are v — the associated item determined by the model based on your input arguments p — the proportion of arrays in the training data that contain the items of the input array and the associated item v , as compared to arrays in the training data that have the input items but not necessarily the associated item v c — the count of occurrences of the associated item v in the training data arrays based on your input arguments after you execute a model, you can find the details about the execution results in the sys association rules models system catalog table for details, see the description of the associated system catalog tables in the machine learning section in the system catalog feedforward neural network model type feedforward network the feedforward network model is a neural network model where data moves from the inputs through hidden layers to the outputs feedforward neural networks are fully connected (sometimes named multilayer perceptrons) the number of inputs is determined by the number of columns in the input result set each input must be numeric the last column in the input result set is the target variable for models with one output, the column is also numeric for models with multiple outputs, the result must be a 1xn matrix (a row vector) common uses of multiple output models are multi class classification — multiple outputs are one hot encoded values that represent the class of the record the model uses results with argmax to select the highest probability class probability modeling — multiple output values represent probabilities between 0 and 1 that sum to 1 multiple numeric prediction — multiple output values represent different numeric values to predict against for faster, lower quality models, reduce the popsize , initialiterations , and subsequentiterations options conversely, for slower, higher quality models, increase the values for these same options model options required hiddenlayers — you must set this option to a positive integer that specifies how many hidden layers to use hiddenlayersize — you must set this option to a positive integer that specifies the number of nodes in each hidden layer outputs — you must set this option to a positive integer that specifies the number of outputs lossfunction — this option specifies the loss function that all hidden layer nodes and all output layer nodes use this function can be one of several predefined loss functions or a user defined loss function the predefined loss functions are squared error (regression), vector squared error (vector valued regression), log loss (binary classification with target values of 0 and 1), logits loss (binary classification with target values of 0 and 1), hinge loss (binary classification with target values of 1 and 1), and cross entropy loss (multi class classification) if the value for this required parameter is none of these strings, the model assumes a user defined loss function the user defined loss function specifies the per sample loss then, the actual loss function is the sum of this function applied to all samples the model should use the variable y to refer to the dependent variable in the training data, and the model should use the variable f to refer to the computed estimate for a given sample optional activationfunction — if you set this option, the values are linear , relu (rectified linear unit), leakyrelu (leaky rectified linear unit), tanh (hyperbolic tangent function), or sigmoid (fast sigmoid approximation) this option defaults to relu this option affects all layers except the output layer outputactivationfunction — if you set this option, the values are linear , relu (rectified linear unit), leakyrelu (leaky rectified linear unit), tanh (hyperbolic tangent function), or sigmoid (fast sigmoid approximation) different activation functions have different output ranges the chosen activation function should match the dependent variable of your data for example, if the dependent variable can be anything, then choose the linear value if the dependent variable is always positive, then choose the relu value if your outputs range from 1 to 1 or you perform hinge loss classification, tanh is a good option because the hyperbolic tangent function has the same range but, if your outputs range from 0 to 1 or you perform log loss classification, sigmoid is a better choice for the same reason this option defaults to linear the option only sets the activation function for the output layer metrics — if you set this option to true , the model calculates the average value of the loss function usesoftmax — if you set this option to true, the model applies a softmax function to the output of the output layer before computing the loss function this option defaults to true if the lossfunction is set to cross entropy loss and false otherwise popsize — if you set this option, the value must be a positive integer sets the population size for the particle swarm optimization (pso) part of the algorithm this option defaults to 100 mininitparamvalue — if you set this option, the value must be a floating point number sets the minimum for initial parameter values in the optimization algorithm this option defaults to 1 maxinitparamvalue — if you set this option, the value must be a floating point number sets the maximum for initial parameter values in the optimization algorithm this option defaults to 1 initialiterations — if you set this option, the value must be a positive integer sets the number of pso iterations for the first pso pass this option defaults to 50 subsequentiterations — if you set this option, the value must be a positive integer sets the number of pso iterations for subsequent pso iterations after the initial pass this option defaults to 10 momentum — if you set this option, the value must be a positive floating point number this parameter controls how much pso iterations move away from the local best value to explore new territory this option defaults to 0 1 gravity — if you set this option, the value must be a positive floating point number this parameter controls how much pso iterations are drawn back towards the local best value this option defaults to 0 01 lossfuncnumsamples — if you set this option, the value must be a positive integer this parameter controls how many points the model samples when estimating the loss function this option defaults to 1000 numgaattempts — if you set this option, the value must be a positive integer this parameter controls how many ga crossover possibilities the model tries this option defaults to 10,000 maxlinesearchiterations — if you set this option, the value must be a positive integer this parameter controls the maximum allowed number of iterations when running the line search part of the algorithm this option defaults to 20 minlinesearchstepsize — if you set this option, the value must be a positive floating point number this parameter controls the minimum step size of the line search algorithm this option defaults to 0 01 samplesperthread — if you set this option, the value must be a positive integer number this parameter controls the target number of samples the model sends to each thread each thread independently computes a logistic regression model, and the models are all combined at the end this option defaults to 1 million loadbalance — if you set this option , the database appends the using load balance shuffle = \<value> clause to all intermediate sql queries the model executes during training where value is the specified option value ( true or false ) the default value is unspecified in this case, the database does not add this clause queryinternalparallelism — if you set this option , the database appends the using parallelism = \<value> clause to all intermediate sql queries the model executes during training where value is the specified positive integer value the default value is unspecified in this case, the database does not add this clause skipdroptable — if you set this option to true , the database prevents the deletion of any intermediate tables that it creates during model training this option defaults to false execute the model c reate a neural network to perform multi class classification for three possible classes y1 , y2 , y3 are one hot encoded outputs if the values are 1 and the rest are 0, the value 1 denotes the class that the training data belongs to from the three classes create mlmodel my model type feedforward network on ( select x1, x2, {{y1,y2,y3}} from public my table ) options( 'hiddenlayers' > '1', 'hiddenlayersize' > '8', 'outputs' > '3', 'lossfunction' > 'cross entropy loss', 'activationfunction' > 'relu', 'usesoftmax' > 'true' ); also, a fter you create the model, you can see its details by querying the sys machine learning models and sys machine learning model options system catalog tables when you execute the model later, pass n 1 input variables and the model returns the estimate of the target variable in the case of multiple outputs, the result is a 1xn matrix (a row vector) if the model uses multiple outputs to perform multi class classification, use argmax to get the integer that represents the class select argmax(my model(x1, x2)) from my table; after you execute a model, you can find the details about the execution results in the sys feedforward network models system catalog table for details, see the description of the associated system catalog tables in the machine learning section in the system catalog related links machine learning model functions docid\ jsgwuw5og56fzrve5h10g machine learning models docid\ cq7pmtrfsxiaz5dp2d9bs