The Ocient System enables regression analysis, which is the process of finding a best-fit function for a data set.
Simple Linear Regression
The simplest case is simple linear regression, which finds a best-fit linear relationship between one variable and another. Suppose you have this data with a bit of noise added. Use the CREATE TABLE AS SELECT SQL statement to create the mldemo.slr table. Generate 100 rows of random data.
SQL
SQL
You can see the details of the model in the machine_learning_models and simple_linear_regression_models system catalog tables.
SQL
SQL
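As an illustration of what a simple linear regression fit computes, here is a minimal Python sketch of the closed-form least-squares solution. It is not the database's implementation. The function name fit_simple_linear and the generated data (y = 2x + 1 with small uniform noise) are assumptions made for this example only.

```python
import random


def fit_simple_linear(xs, ys):
    """Return (slope, intercept) that minimize the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: 100 rows of y = 2x + 1 with a bit of noise added.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 * x + 1 + rng.uniform(-0.1, 0.1) for x in xs]

slope, intercept = fit_simple_linear(xs, ys)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

The recovered slope and intercept should be close to the values used to generate the data, with the small difference coming from the noise.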
Machine Learning Model Options
You can provide options for most model types. Provide options by pairing the string for an option with its corresponding value. For example, simple linear regression models have a metrics option that you can use to ask the database to calculate some quality-of-fit metrics. Use the same model and enable metrics collection, which is off by default.
SQL
Query the machine_learning_models and simple_linear_regression_models system catalog tables for more fields.
SQL
You can also fit a model where the value of f(0) is fixed. To do so, use the yIntercept option. Create a model on the same data, but force the y-intercept to be zero. The metrics show that the model is not as good a fit as when the intercept is free to vary.
SQL
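To see why fixing the intercept can reduce the quality of fit, the following Python sketch compares a free-intercept fit with a fit whose intercept is forced to a given value. The helper names (fit_free, fit_fixed_intercept, rmse) and the small data set are hypothetical, and the root-mean-square error here stands in for whatever metrics the database reports.

```python
def fit_free(xs, ys):
    """Least-squares (slope, intercept) with the intercept free to vary."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_fixed_intercept(xs, ys, b):
    """Least-squares slope for y = a*x + b when b is fixed in advance."""
    return sum(x * (y - b) for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def rmse(xs, ys, a, b):
    """Root-mean-square error of the line y = a*x + b on the data."""
    return (sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


# Hypothetical data that follows y = 2x + 3 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

a_free, b_free = fit_free(xs, ys)
a_zero = fit_fixed_intercept(xs, ys, 0.0)

print("free intercept:", a_free, b_free, rmse(xs, ys, a_free, b_free))
print("zero intercept:", a_zero, 0.0, rmse(xs, ys, a_zero, 0.0))
```

Because the data has a true intercept of 3, the zero-intercept fit has to tilt the line to compensate, and its error is larger than that of the free fit.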
Multiple Linear Regression
In multiple linear regression, the model still predicts a single output value, but instead of handling one input variable, it can handle an arbitrary number of input variables. Multiple linear regression finds the best-fit model of the form:
y = a1*x1 + a2*x2 + ... + an*xn + b
SQL
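A least-squares fit with several input variables can be sketched by solving the normal equations. The Python code below is only an illustration under stated assumptions: it fits y = a1*x1 + ... + an*xn + b, and the names solve and fit_multiple_linear and the sample rows are hypothetical. It does not reflect how the database computes the model.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_multiple_linear(rows, ys):
    """Return [a1, ..., an, b] for the least-squares fit of y = a1*x1 + ... + an*xn + b."""
    design = [list(r) + [1.0] for r in rows]  # trailing 1.0 carries the intercept b
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical noise-free data generated from y = 3*x1 - 2*x2 + 4.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 2)]
ys = [3 * x1 - 2 * x2 + 4 for x1, x2 in rows]

coeffs = fit_multiple_linear(rows, ys)
print([round(c, 6) for c in coeffs])
```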
Polynomial Regression
With polynomial regression, the database finds a least-squares best-fit polynomial of whatever degree you specify. As with multiple linear regression, you determine the number of independent variables. The training data has a 1/(x1 * x2) term that the model does not match, but that term quickly tends towards zero, so the model should provide a good fit with a quadratic polynomial of x1 and x2. Some of the coefficients match the data, but some do not because the model is compensating for not being able to fit the form of the data exactly. However, the metrics indicate that this model is still a great fit.
SQL
The negativePowers option enables you to fit Laurent polynomials. If you set this option to true, the model includes independent variables raised to negative powers. Sums of such terms are called Laurent polynomials. The model generates all possible terms such that, in each product, the sum of the absolute values of the powers is less than or equal to the order.
For example, with two independent variables and the order set to 2, the model is:
y = a1*x1^2 + a2*x1^-2 + a3*x2^2 + a4*x2^-2 + a5*x1*x2 + a6*x1^-1*x2 + a7*x1*x2^-1 + a8*x1^-1*x2^-1 + a9*x1 + a10*x1^-1 + a11*x2 + a12*x2^-1 + b
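The term-generation rule can be checked with a short enumeration. This Python sketch lists every exponent combination whose absolute values sum to at most the order, excluding the constant term. The function name laurent_exponents is hypothetical, and the ordering of the terms is arbitrary, so it does not necessarily match the coefficient numbering above.

```python
from itertools import product


def laurent_exponents(num_vars, order):
    """All non-constant exponent tuples whose absolute values sum to at most `order`."""
    powers = range(-order, order + 1)
    return [
        exps
        for exps in product(powers, repeat=num_vars)
        if 0 < sum(abs(p) for p in exps) <= order
    ]


terms = laurent_exponents(2, 2)
for exps in terms:
    # Print each term in the x1^p * x2^q notation, skipping zero powers.
    print("*".join(f"x{i + 1}^{p}" for i, p in enumerate(exps) if p != 0))
print(len(terms))  # 12 terms, matching a1 through a12
```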
SQL
pr2 model.
SQL
Linear Combination Regression
The most complex type of linear regression that Ocient supports is linear combination regression. This regression is a generalization of polynomial regression. Polynomial regression tries to find the best linear combination of polynomial terms (x1, x2, x1^2, x2^2, x1*x2, and so on). Linear combination regression finds the best-fit linear combination, where you can specify all the terms.
Assume this training data. Use the CREATE TABLE AS SELECT SQL statement to create the mldemo.lcr table. Generate 1000 rows of data.
SQL
SQL
SQL
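The idea of fitting a linear combination of terms that you choose can be sketched for the two-term case. In this Python example, the terms sin(x) and x^2, the function name fit_two_terms, and the generated data are all assumptions for illustration. The model here has no constant term, which may differ from what the database fits.

```python
import math


def fit_two_terms(xs, ys, f1, f2):
    """Least-squares a1, a2 for y = a1*f1(x) + a2*f2(x), via the 2x2 normal equations."""
    u = [f1(x) for x in xs]
    v = [f2(x) for x in xs]
    suu = sum(p * p for p in u)
    svv = sum(q * q for q in v)
    suv = sum(p * q for p, q in zip(u, v))
    suy = sum(p * y for p, y in zip(u, ys))
    svy = sum(q * y for q, y in zip(v, ys))
    det = suu * svv - suv * suv
    # Cramer's rule on [[suu, suv], [suv, svv]] @ [a1, a2] = [suy, svy].
    a1 = (suy * svv - suv * svy) / det
    a2 = (suu * svy - suv * suy) / det
    return a1, a2


# Hypothetical noise-free data generated from y = 1.5*sin(x) + 0.25*x^2.
xs = [0.1 * i for i in range(1, 51)]
ys = [1.5 * math.sin(x) + 0.25 * x * x for x in xs]

a1, a2 = fit_two_terms(xs, ys, math.sin, lambda x: x * x)
print(round(a1, 6), round(a2, 6))
```

The fit stays linear in the coefficients even though the terms themselves are not linear in x, which is why the normal equations still apply.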
Non-Linear Regression
The non-linear regression model is similar to linear combination regression in that you define the form of the model. However, non-linear regression can be more than a linear combination of terms. Assume this training data. Use the CREATE TABLE AS SELECT SQL statement to create the mldemo.nlo_input1 table. Generate 100 rows of data. Then, create a non-linear regression model using five parameters.
SQL
Query the machine_learning_models and nonlinear_regression_models system catalog tables to see the metrics and the values that the model chose.
SQL
Create the model again using the abs(f-y) loss function.
SQL
The variable f represents the value of the model, and the variable y represents the actual value in the training data.
The catalog indicates the model is nearly the same as the previous model. In other words, changing the loss function here did not substantially change anything. But there are cases where other loss functions are appropriate and they could substantially change the model.
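A case where the loss function does change the result can be shown with the simplest possible model, a single constant. In this Python sketch, the data (which includes one outlier), the candidate grid, and the helper names are all hypothetical. The squared loss is minimized by the mean, while the abs(f-y) loss is minimized by the median, which is far less sensitive to the outlier.

```python
from statistics import mean, median


def squared(f, y):
    """Squared-error loss for one training value."""
    return (f - y) ** 2


def absolute(f, y):
    """Absolute-error loss, abs(f - y), for one training value."""
    return abs(f - y)


def total_loss(f, ys, loss):
    """Total loss of the constant model f over the training values."""
    return sum(loss(f, y) for y in ys)


# Hypothetical training values with one large outlier.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]

# Brute-force search over constant models 0.0, 0.1, ..., 100.0.
candidates = [i / 10 for i in range(1001)]
best_squared = min(candidates, key=lambda f: total_loss(f, ys, squared))
best_absolute = min(candidates, key=lambda f: total_loss(f, ys, absolute))

print(best_squared, mean(ys))
print(best_absolute, median(ys))
```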
Non-Linear Regression with Neural Networks
In all of the non-linear cases, you have to make a good guess for the form of the target function. However, with neural networks, you can leave it up to the network to try to figure out that form. The downside is that aspects of the model are a black box. This example uses the same data as the non-linear regression example, but instead of specifying the form of the function, it relies on what the neural net can do with little information. The size of the network layers and the number of layers greatly influence how well a neural net model fits. This example uses a simple network.
SQL
You can see the details of the model in the machine_learning_models and feedforward_network_models system catalog tables.
SQL
SQL
Vector-Valued Regression
Vector-valued regression is the case where the dependent variable has multiple components. You can use any type of model to learn each component separately, or you can use a feedforward network to learn all components in one model. The example creates 100 rows of sample data.
SQL
x1, which is vector-valued. Each component can fit a linear model, which can then predict vector-valued results.
SQL
SQL
SQL
The model uses the outputs option, which is set to the size of the vector, and the lossFunction option, which is set to use vector_squared_error.
This example shows how well the model can predict the output.
SQL
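The first approach, learning each component separately, can be sketched as one simple linear fit per component of the output vector. This Python example is an illustration only. The function names and the two-component data are hypothetical, and it does not cover the single feedforward network approach.

```python
def fit_line(xs, ys):
    """Least-squares (slope, intercept) for a single output component."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_vector_valued(xs, vectors):
    """Fit one (slope, intercept) pair per component of the output vectors."""
    return [fit_line(xs, list(component)) for component in zip(*vectors)]


def predict(models, x):
    """Combine the per-component predictions into one output vector."""
    return [a * x + b for a, b in models]


# Hypothetical data: each output vector is (2x + 1, -x + 3).
xs = [0.0, 1.0, 2.0, 3.0]
vectors = [[2 * x + 1, -x + 3] for x in xs]

models = fit_vector_valued(xs, vectors)
print(predict(models, 4.0))
```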
Autoregression
Autoregression is linear regression where the independent variables are lags of the dependent variable. In other words, it is a series where the next value depends linearly on the previous N values. These examples do not use a new model type. Instead, they use multiple linear regression models.
An important aspect of autoregression is that some variable must indicate the order of the data. This variable can be a time variable, but it does not need to be. The variable is necessary to set up the model correctly, but the model does not use it directly. Instead, the variable helps build the correct lags.
This example builds a model to predict the Fibonacci sequence. This statement creates a table with the first several Fibonacci terms.
SQL
SQL
55.
SQL
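The lag construction and the resulting fit can be sketched in Python. This example assumes the series is already in order, uses two lags, and fits s[t] = a1*s[t-1] + a2*s[t-2] without an intercept, which is a simplification compared with a general multiple linear regression model. The names make_lags and fit_ar2 are hypothetical.

```python
def make_lags(series, n_lags):
    """Turn an ordered series into (previous n_lags values, next value) rows."""
    return [
        (series[i - n_lags:i], series[i])
        for i in range(n_lags, len(series))
    ]


def fit_ar2(series):
    """Least-squares a1, a2 for s[t] = a1*s[t-1] + a2*s[t-2], with no intercept."""
    rows = make_lags(series, 2)
    u = [lags[1] for lags, _ in rows]  # s[t-1]
    v = [lags[0] for lags, _ in rows]  # s[t-2]
    ys = [y for _, y in rows]
    suu = sum(p * p for p in u)
    svv = sum(q * q for q in v)
    suv = sum(p * q for p, q in zip(u, v))
    suy = sum(p * y for p, y in zip(u, ys))
    svy = sum(q * y for q, y in zip(v, ys))
    det = suu * svv - suv * suv
    a1 = (suy * svv - suv * svy) / det
    a2 = (suu * svy - suv * suy) / det
    return a1, a2


fib = [1, 1, 2, 3, 5, 8, 13, 21, 34]
a1, a2 = fit_ar2(fib)
next_value = a1 * fib[-1] + a2 * fib[-2]
print(a1, a2, next_value)
```

Because each Fibonacci term is exactly the sum of the previous two, both coefficients come out as 1 and the predicted next term is 55.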
Vector Autoregression
Vector autoregression is a more complex version of autoregression. This model uses multiple time series, where the next value for each time series depends linearly on the lags of all of the time series. This statement sets up some data where the next value for two time series depends on the previous two values of both time series.
SQL
SQL
SQL
You can see the details of the model in the machine_learning_models and vector_autoregression_models system catalog tables.
SQL
SQL
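The shape of the training data for vector autoregression can be sketched as follows. Each row pairs the lags of all of the series with the next value of every series, so each output component can then be fit as a linear function of the same inputs. The function name var_rows and the two short series are hypothetical, and the column ordering is an arbitrary choice for this illustration.

```python
def var_rows(series_a, series_b, n_lags):
    """Build (inputs, targets) rows for two ordered, equal-length series.

    inputs  = [a[t-1], b[t-1], a[t-2], b[t-2], ...] up to n_lags
    targets = [a[t], b[t]]
    """
    rows = []
    for t in range(n_lags, len(series_a)):
        inputs = []
        for lag in range(1, n_lags + 1):
            inputs += [series_a[t - lag], series_b[t - lag]]
        rows.append((inputs, [series_a[t], series_b[t]]))
    return rows


# Hypothetical series, already sorted by their ordering variable.
a = [1, 2, 3, 5, 8]
b = [10, 20, 30, 50, 80]

for inputs, targets in var_rows(a, b, 2):
    print(inputs, "->", targets)
```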

