Machine Learning in Ocient

Whether you are training a model or using machine learning functionality to do scoring or prediction, 
﻿ enables machine learning in the SQL syntax directly.
You can use the CREATE MLMODEL command to create models. This step is for model training and is referred to as model creation. Similarly, you can use the model on new input data to generate new predictions by executing the scalar function that has the same name as the model. This action is typically referred to as executing the model.
Machine learning in 
﻿ depends upon linear algebra functionality that is built into the system. OcientML provides machine learning capabilities that you can invoke in SQL statements with some application logic. 
Linear Algebra in Ocient
Matrices are a first class data type in the Ocient system. Create a matrix using a simple SQL SELECT statement.
SQL
SELECT {{1, 2}, {3, 4}};
{{1,2},{3,4}}
--------------------------------------------------------------------------------
[[1.0, 2.0], [3.0, 4.0]]

Fetched 1 row

SELECT {{c1*1, c1*2}, {c1*3, c1*4}} FROM sys.dummy2;
make_matrix_2x2((2), (2), ((1))*(c1), ((2))*(c1), ((3))*(c1), ((4))*(c1))
--------------------------------------------------------------------------------
[[1.0, 2.0], [3.0, 4.0]]
[[2.0, 4.0], [6.0, 8.0]]

Fetched 2 rows
SELECT {{1, 2}, {3, 4}};
{{1,2},{3,4}}
--------------------------------------------------------------------------------
[[1.0, 2.0], [3.0, 4.0]]

Fetched 1 row

SELECT {{c1*1, c1*2}, {c1*3, c1*4}} FROM sys.dummy2;
make_matrix_2x2((2), (2), ((1))*(c1), ((2))*(c1), ((3))*(c1), ((4))*(c1))
--------------------------------------------------------------------------------
[[1.0, 2.0], [3.0, 4.0]]
[[2.0, 4.0], [6.0, 8.0]]

Fetched 2 rows
﻿
You can also use shorthand notations to create row   _r or column vectors _c. For example, _r{1,2,3} creates a row vector with values 1.0, 2.0, and 3.0.
SQL
SELECT _r{1,2,3};
_r{1,2,3}
--------------------------------------------------------------------------------
[[1.0, 2.0, 3.0]]

Fetched 1 row

SELECT _c{1,2,3};
_c{1,2,3}
--------------------------------------------------------------------------------
[[1.0], [2.0], [3.0]]

Fetched 1 row
SELECT _r{1,2,3};
_r{1,2,3}
--------------------------------------------------------------------------------
[[1.0, 2.0, 3.0]]

Fetched 1 row

SELECT _c{1,2,3};
_c{1,2,3}
--------------------------------------------------------------------------------
[[1.0], [2.0], [3.0]]

Fetched 1 row
﻿
You can use functions on values in the vectors. This query does some matrix math, finds the inverse matrix, and then returns the two eigenvalues and eigenvectors of the inverse.
SQL
SELECT EIGEN(INVERSE(2 * {{1,2},{3,4}} + {{5,6},{7,8}} / 2));
EIGEN(INVERSE((((2))*({{1,2},{3,4}}))+(({{5,6},{7,8}})/((2)))))
--------------------------------------------------------------------------------
[<<-1.3780529228406495, [[0.8013353799887076, -0.5982153531783962]]>>, <<0.05805292284064968, [[0.48196559267508465, 0.8761901434491]]>>]

Fetched 1 row
SELECT EIGEN(INVERSE(2 * {{1,2},{3,4}} + {{5,6},{7,8}} / 2));
EIGEN(INVERSE((((2))*({{1,2},{3,4}}))+(({{5,6},{7,8}})/((2)))))
--------------------------------------------------------------------------------
[<<-1.3780529228406495, [[0.8013353799887076, -0.5982153531783962]]>>, <<0.05805292284064968, [[0.48196559267508465, 0.8761901434491]]>>]

Fetched 1 row
﻿
Regression in Ocient
Regression is the process of finding a best fit function for a set of data. The simplest case is simple linear regression. Simple linear regression finds a best fit linear relationship between one variable and another.
Suppose you have this data with a bit of noise added.
SQL
CREATE TABLE mldemo.slr AS 
(SELECT c1 AS x, c1 * 3.14 - 1.41 + (RAND() - 0.5) * 0.01 AS y 
FROM sys.dummy100);
Modified 100 rows
CREATE TABLE mldemo.slr AS 
(SELECT c1 AS x, c1 * 3.14 - 1.41 + (RAND() - 0.5) * 0.01 AS y 
FROM sys.dummy100);
Modified 100 rows
﻿
Create a simple linear regression model against this data.
SQL
CREATE MLMODEL my_first_model TYPE SIMPLE LINEAR REGRESSION 
ON (SELECT x, y FROM mldemo.slr);
CREATE MLMODEL my_first_model TYPE SIMPLE LINEAR REGRESSION 
ON (SELECT x, y FROM mldemo.slr);
﻿
Explore the created model by viewing the system catalog.
SQL
SELECT name, on_select, slope, y_intercept 
FROM sys.machine_learning_models a, sys.simple_linear_regression_models b 
WHERE a.id = b.machine_learning_model_id AND name = 'my_first_model';
name                                         on_select                                    slope                 y_intercept
--------------------------------------------------------------------------------------------------------------------------------------
my_first_model                               select x, y from mldemo.slr                  3.139995450114637     -1.4102410466743238

Fetched 1 row
SELECT name, on_select, slope, y_intercept 
FROM sys.machine_learning_models a, sys.simple_linear_regression_models b 
WHERE a.id = b.machine_learning_model_id AND name = 'my_first_model';
name                                         on_select                                    slope                 y_intercept
--------------------------------------------------------------------------------------------------------------------------------------
my_first_model                               select x, y from mldemo.slr                  3.139995450114637     -1.4102410466743238

Fetched 1 row
﻿
You can see the calculated slope and the y-intercept from the model, and see that these values are very close to the values used to generate the data. While the input to the model creation in this case is very simple, the SQL statement in the ON clause of the model creation can be an arbitrarily complex SQL statement as long as it meets the requirements of the input for the specific model type being created.
Execute the created model by executing the SQL scalar function with the same name as the model you created. The input arguments of the function depend on the specific model type. For regression, the values are the independent variables and then the function returns the dependent variables.
SQL
SELECT x, y AS actual, my_first_model(x) AS predicted FROM mldemo.slr LIMIT 10;
x          actual                predicted
-------------------------------------------------------
51         158.730137908257      158.72952690917217
52         161.87052128169267    161.8695223592868
53         165.0119331158625     165.00951780940144
54         168.1478263920115     168.14951325951608
55         171.28758882396247    171.28950870963072
56         174.43476312079238    174.42950415974536
57         177.57385971080322    177.56949960986
58         180.70933136639127    180.70949505997464
59         183.84856961481378    183.84949051008925
60         186.99044288540566    186.9894859602039

Fetched 10 rows
SELECT x, y AS actual, my_first_model(x) AS predicted FROM mldemo.slr LIMIT 10;
x          actual                predicted
-------------------------------------------------------
51         158.730137908257      158.72952690917217
52         161.87052128169267    161.8695223592868
53         165.0119331158625     165.00951780940144
54         168.1478263920115     168.14951325951608
55         171.28758882396247    171.28950870963072
56         174.43476312079238    174.42950415974536
57         177.57385971080322    177.56949960986
58         180.70933136639127    180.70949505997464
59         183.84856961481378    183.84949051008925
60         186.99044288540566    186.9894859602039

Fetched 10 rows
﻿
The model makes predictions that are very close to the actual training data. Then, you can use this data to generate predictions for data that is unknown.
Machine Learning Model Options
You can provide options to most model types. Provide options by pairing the string for an option to its corresponding value. For example, simple linear regression models have a metrics option that you can use to ask the database to calculate some quality of fit metrics. Use the same model and enable metrics collection, which is off by default.
SQL
DROP MLMODEL my_first_model;
Modified 0 rows

CREATE MLMODEL my_first_model TYPE SIMPLE LINEAR REGRESSION 
ON (SELECT x, y FROM mldemo.slr) options('metrics' -> 'true');
Modified 0 rows
DROP MLMODEL my_first_model;
Modified 0 rows

CREATE MLMODEL my_first_model TYPE SIMPLE LINEAR REGRESSION 
ON (SELECT x, y FROM mldemo.slr) options('metrics' -> 'true');
Modified 0 rows
﻿
Examine the system catalog tables for more fields.
SQL
SELECT name, on_select, slope, y_intercept, coefficient_of_determination, 
adjusted_r2, rmse FROM sys.machine_learning_models a, sys.simple_linear_regression_models b 
WHERE a.id = b.machine_learning_model_id AND name = 'my_first_model';
name                                         on_select                                    slope                 y_intercept           coefficient_of_determination adjusted_r2           rmse
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
my_first_model                               select x, y from mldemo.slr                  3.139995450114637     -1.4102410466743238   0.9999999990764926           0.999999999067069     2.7544625001417E-4

Fetched 1 row
SELECT name, on_select, slope, y_intercept, coefficient_of_determination, 
adjusted_r2, rmse FROM sys.machine_learning_models a, sys.simple_linear_regression_models b 
WHERE a.id = b.machine_learning_model_id AND name = 'my_first_model';
name                                         on_select                                    slope                 y_intercept           coefficient_of_determination adjusted_r2           rmse
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
my_first_model                               select x, y from mldemo.slr                  3.139995450114637     -1.4102410466743238   0.9999999990764926           0.999999999067069     2.7544625001417E-4

Fetched 1 row
﻿
The database also calculates the coefficient of determination (frequently called r2), the adjusted r2 value, and the RMSE (root mean squared error). All of these values indicate that this model has an extremely good fit.
In addition, you can force a specific y-intercept, such that the value of f(0) is fixed. Use the yIntercept option. Create a model on the same data but force the y-intercept to be zero. The metrics show that the model is not a good fit.
SQL
CREATE MLMODEL force_zero TYPE SIMPLE LINEAR REGRESSION 
ON (SELECT x, y FROM mldemo.slr) options('metrics' -> 'true', 'yIntercept' -> '0');
Modified 0 rows

SELECT name, on_select, slope, y_intercept, coefficient_of_determination, 
adjusted_r2, rmse FROM sys.machine_learning_models a, sys.simple_linear_regression_models b 
WHERE a.id = b.machine_learning_model_id AND name = 'force_zero';
name                                         on_select                                    slope                 y_intercept           coefficient_of_determination adjusted_r2           rmse
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
force_zero                                   select x, y from mldemo.slr                  3.1189470762836766    0.0                   0.9999403831244921           0.9999397747890277    0.06998440678511778

Fetched 1 row
CREATE MLMODEL force_zero TYPE SIMPLE LINEAR REGRESSION 
ON (SELECT x, y FROM mldemo.slr) options('metrics' -> 'true', 'yIntercept' -> '0');
Modified 0 rows

SELECT name, on_select, slope, y_intercept, coefficient_of_determination, 
adjusted_r2, rmse FROM sys.machine_learning_models a, sys.simple_linear_regression_models b 
WHERE a.id = b.machine_learning_model_id AND name = 'force_zero';
name                                         on_select                                    slope                 y_intercept           coefficient_of_determination adjusted_r2           rmse
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
force_zero                                   select x, y from mldemo.slr                  3.1189470762836766    0.0                   0.9999403831244921           0.9999397747890277    0.06998440678511778

Fetched 1 row
﻿
Multiple Linear Regression
In multiple linear regression, the model is still predicting an output value. But instead of handling one input variable, the model can handle an arbitrary number of input variables. Multiple linear regression finds the best fit model of the form:
y = a1x1 + a2x2 + a3x3 + … + a0
Simple linear regression and multiple linear regression (as well as polynomial regression and linear combination regression) all use a least-squares loss function. Some of the more advanced models allow you to define best fit yourself if the default definition is not satisfactory. Least-squares best fit means that the model minimizes the sum (over all training data rows) of the square of the model error.
Create a training data set with some noise and show that the model can reconstruct the data. Multiple linear regression models also support a metrics option (the metrics are the same as simple linear regression). 
SQL
CREATE TABLE mldemo.mlr AS (SELECT a.c1 AS x1, b.c1 AS x2, 1 + 2*a.c1 + 3*b.c1 AS y 
FROM sys.dummy10 a, sys.dummy10 b);
Modified 100 rows

CREATE MLMODEL mlr_model TYPE MULTIPLE LINEAR REGRESSION ON (SELECT * FROM mldemo.mlr) 
options('metrics' -> 'true');
Modified 0 rows

SELECT x1, x2, y AS actual, mlr_model(x1, x2) AS predicted FROM mldemo.mlr LIMIT 10;
x1         x2         actual              predicted
----------------------------------------------------------------
6          1          16                  15.999999999999975
6          2          19                  18.999999999999975
6          3          22                  21.999999999999975
6          4          25                  24.999999999999975
6          5          28                  27.999999999999975
6          6          31                  30.999999999999975
6          7          34                  33.99999999999997
6          8          37                  36.99999999999997
6          9          40                  39.99999999999997
6          10         43                  42.99999999999997

Fetched 10 rows
CREATE TABLE mldemo.mlr AS (SELECT a.c1 AS x1, b.c1 AS x2, 1 + 2*a.c1 + 3*b.c1 AS y 
FROM sys.dummy10 a, sys.dummy10 b);
Modified 100 rows

CREATE MLMODEL mlr_model TYPE MULTIPLE LINEAR REGRESSION ON (SELECT * FROM mldemo.mlr) 
options('metrics' -> 'true');
Modified 0 rows

SELECT x1, x2, y AS actual, mlr_model(x1, x2) AS predicted FROM mldemo.mlr LIMIT 10;
x1         x2         actual              predicted
----------------------------------------------------------------
6          1          16                  15.999999999999975
6          2          19                  18.999999999999975
6          3          22                  21.999999999999975
6          4          25                  24.999999999999975
6          5          28                  27.999999999999975
6          6          31                  30.999999999999975
6          7          34                  33.99999999999997
6          8          37                  36.99999999999997
6          9          40                  39.99999999999997
6          10         43                  42.99999999999997

Fetched 10 rows
﻿
Polynomial Regression
With polynomial regression, the database finds a least-squares best fit polynomial of whatever degree you specify. Same as multiple linear regression, you determine the number of independent variables. 
The training data has a 1/(x1+x2) term that the model does not match, but that term quickly tends towards zero, so the model should provide a good fit with a quadratic polynomial of x1 and x2. Some of the coefficients match the data but some do not because the model is compensating for not being able to fit the form of the data exactly. However, the metrics indicate that this model is still a great fit.
SQL
CREATE TABLE mldemo.pr AS 
(SELECT x1, x2, 1.0 / (x1 * x2) - 2*x1 - 0.5*x1*x2 + 2.3*x1^2 + 1.1*x2^2 AS y 
FROM (SELECT a.c1 AS x1, b.c1 AS x2 FROM sys.dummy10 a, sys.dummy10 b));
Modified 100 rows

CREATE MLMODEL pr1 TYPE POLYNOMIAL REGRESSION 
ON (SELECT * FROM mldemo.pr) options('order' -> '2', 'metrics' -> 'true');
Modified 0 rows

SELECT name, on_select, num_arguments, coefficient_of_determination, 
adjusted_r2, rmse, y_intercept, coefficients 
FROM sys.machine_learning_models a, sys.polynomial_regression_models b 
WHERE a.id = b.machine_learning_model_id AND name = 'pr1';
name                                         on_select                                    num_arguments coefficient_of_determination adjusted_r2           rmse                  y_intercept           coefficients
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pr1                                          select * from mldemo.pr                      2             0.9999991319879137           0.9999991140907573    0.06288797037552916   0.720605599825376     [-0.10943699031755831, 1.105235143324365, -2.1094369903178176, -0.4945162377513963, 2.3052351433243388]

Fetched 1 row
CREATE TABLE mldemo.pr AS 
(SELECT x1, x2, 1.0 / (x1 * x2) - 2*x1 - 0.5*x1*x2 + 2.3*x1^2 + 1.1*x2^2 AS y 
FROM (SELECT a.c1 AS x1, b.c1 AS x2 FROM sys.dummy10 a, sys.dummy10 b));
Modified 100 rows

CREATE MLMODEL pr1 TYPE POLYNOMIAL REGRESSION 
ON (SELECT * FROM mldemo.pr) options('order' -> '2', 'metrics' -> 'true');
Modified 0 rows

SELECT name, on_select, num_arguments, coefficient_of_determination, 
adjusted_r2, rmse, y_intercept, coefficients 
FROM sys.machine_learning_models a, sys.polynomial_regression_models b 
WHERE a.id = b.machine_learning_model_id AND name = 'pr1';
name                                         on_select                                    num_arguments coefficient_of_determination adjusted_r2           rmse                  y_intercept           coefficients
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pr1                                          select * from mldemo.pr                      2             0.9999991319879137           0.9999991140907573    0.06288797037552916   0.720605599825376     [-0.10943699031755831, 1.105235143324365, -2.1094369903178176, -0.4945162377513963, 2.3052351433243388]

Fetched 1 row
﻿
Polynomial regression models have a lot of available options, including everything that is available for multiple linear regression. The negativePowers option enables you to fit Laurent polynomials. If you set this option to true, the model includes independent variables raised to negative powers. These variables are called Laurent polynomials. The model generates all possible terms such that the sum of the absolute value of the power of each term in each product is less than or equal to the order.
For example, with two independent variables and the order set to 2, the model is: 
y = a1*x1^2 + a2*x1^-2 + a3*x2^2 + a4*x2^-2 + a5*x1*x2 + a6*x1^-1*x2 + a7*x1*x2^-1 + a8*x1-1*x2-1 + a9*x1 + a10*x1^-1 + a11*x2 + a12*x2^-1 + b
SQL
CREATE TABLE mldemo.pr_negativepowers AS 
(SELECT x1, x2, 1.0 + 12*x2^-1 - 11*x2 + 10*x1^-1 + 9*x1 + 8*(x1 - 1)*(x2 - 1) + 
7*x1*x2^-1 + 6*x1^-1*x2 + 5*x1*x2 + 4*x2^-2 + 3*x2^2 + 2*x1^-2 + x1^2 AS y 
FROM (SELECT a.c1 AS x1, b.c1 AS x2 FROM sys.dummy10 a, sys.dummy10 b));
Modified 100 rows

CREATE MLMODEL pr2 TYPE POLYNOMIAL REGRESSION 
ON (SELECT * FROM mldemo.pr_negativepowers) 
options('order' -> '2', 'metrics' -> 'true', 'negativePowers' -> 'true');
Modified 0 rows

SELECT name, on_select, num_arguments, coefficient_of_determination, 
adjusted_r2, rmse, y_intercept, coefficients 
FROM sys.machine_learning_models a, sys.polynomial_regression_models b 
WHERE a.id = b.machine_learning_model_id AND name = 'pr2';
name                                         on_select                                    num_arguments coefficient_of_determination adjusted_r2           rmse                  y_intercept           coefficients
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pr2                                          select * from mldemo.pr                      2             1.0                          1.0                   1.4203541595255201E-10 -3.2741809263825417E-10 [1.378090447893218E-11, 7.594146281547169E-11, 1.0999999999996046, -5.800996726877696E-11, -1.9999999999236024, -0.5000000000004785, -5.027050118995952E-12, 6.257323548197746E-10, -3.953341155821705E-12, 0.9999999999510328, 2.2999999999966576, -3.637978807091713E-10]
CREATE TABLE mldemo.pr_negativepowers AS 
(SELECT x1, x2, 1.0 + 12*x2^-1 - 11*x2 + 10*x1^-1 + 9*x1 + 8*(x1 - 1)*(x2 - 1) + 
7*x1*x2^-1 + 6*x1^-1*x2 + 5*x1*x2 + 4*x2^-2 + 3*x2^2 + 2*x1^-2 + x1^2 AS y 
FROM (SELECT a.c1 AS x1, b.c1 AS x2 FROM sys.dummy10 a, sys.dummy10 b));
Modified 100 rows

CREATE MLMODEL pr2 TYPE POLYNOMIAL REGRESSION 
ON (SELECT * FROM mldemo.pr_negativepowers) 
options('order' -> '2', 'metrics' -> 'true', 'negativePowers' -> 'true');
Modified 0 rows

SELECT name, on_select, num_arguments, coefficient_of_determination, 
adjusted_r2, rmse, y_intercept, coefficients 
FROM sys.machine_learning_models a, sys.polynomial_regression_models b 
WHERE a.id = b.machine_learning_model_id AND name = 'pr2';
name                                         on_select                                    num_arguments coefficient_of_determination adjusted_r2           rmse                  y_intercept           coefficients
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pr2                                          select * from mldemo.pr                      2             1.0                          1.0                   1.4203541595255201E-10 -3.2741809263825417E-10 [1.378090447893218E-11, 7.594146281547169E-11, 1.0999999999996046, -5.800996726877696E-11, -1.9999999999236024, -0.5000000000004785, -5.027050118995952E-12, 6.257323548197746E-10, -3.953341155821705E-12, 0.9999999999510328, 2.2999999999966576, -3.637978807091713E-10]
﻿
You can see that now the model almost perfectly recovers the coefficients, with the rest being nearly zero.
Execute the pr2 model.
SQL
SELECT x1, x2, y AS actual, pr2(x1, x2) AS predicted FROM mldemo.pr LIMIT 10;
x1                    x2                    actual                predicted
----------------------------------------------------------------------------------------
6.0                   1.0                   69.06666666666666     69.06666666676095
6.0                   2.0                   69.28333333333333     69.28333333346139
6.0                   3.0                   71.75555555555556     71.75555555569368
6.0                   4.0                   76.44166666666666     76.44166666681195
6.0                   5.0                   83.33333333333333     83.33333333348473
6.0                   6.0                   92.42777777777778     92.42777777793454
6.0                   7.0                   103.72380952380954    103.72380952397093
6.0                   8.0                   117.22083333333333    117.22083333349865
6.0                   9.0                   132.9185185185185     132.91851851868702
6.0                   10.0                  150.81666666666666    150.81666666683765

Fetched 10 rows
SELECT x1, x2, y AS actual, pr2(x1, x2) AS predicted FROM mldemo.pr LIMIT 10;
x1                    x2                    actual                predicted
----------------------------------------------------------------------------------------
6.0                   1.0                   69.06666666666666     69.06666666676095
6.0                   2.0                   69.28333333333333     69.28333333346139
6.0                   3.0                   71.75555555555556     71.75555555569368
6.0                   4.0                   76.44166666666666     76.44166666681195
6.0                   5.0                   83.33333333333333     83.33333333348473
6.0                   6.0                   92.42777777777778     92.42777777793454
6.0                   7.0                   103.72380952380954    103.72380952397093
6.0                   8.0                   117.22083333333333    117.22083333349865
6.0                   9.0                   132.9185185185185     132.91851851868702
6.0                   10.0                  150.81666666666666    150.81666666683765

Fetched 10 rows
﻿
Linear Combination Regression
The most complex type of linear regression that Ocient supports is linear combination regression. This regression is a generalization of polynomial regression. Polynomial regression tries to find the best linear combination of polynomial terms (x1, x2,x1^2,x2^2,x1*x2, and so on). Linear combination regression finds the best fit linear combination where you can specify all the terms. 
Assume this training data.
SQL
CREATE TABLE mldemo.lcr AS 
(SELECT x1, x2, x3, 10.0 * sin(x1) + 2.0 * cos(x2) - 3.0 * MOD(INT(x1)*INT(x2)*INT(x3), 4) 
AS y FROM (SELECT a.c1 AS x1, b.c1 AS x2, c.c1 AS x3 
FROM sys.dummy10 a, sys.dummy10 b, sys.dummy10 c));
Modified 1000 rows
CREATE TABLE mldemo.lcr AS 
(SELECT x1, x2, x3, 10.0 * sin(x1) + 2.0 * cos(x2) - 3.0 * MOD(INT(x1)*INT(x2)*INT(x3), 4) 
AS y FROM (SELECT a.c1 AS x1, b.c1 AS x2, c.c1 AS x3 
FROM sys.dummy10 a, sys.dummy10 b, sys.dummy10 c));
Modified 1000 rows
﻿
Fit this data with some kind of higher degree polynomial model.
SQL
CREATE MLMODEL pr_attempting_lcr TYPE POLYNOMIAL REGRESSION 
ON (SELECT * FROM mldemo.lcr) options('order' -> '5', 'metrics' -> 'true');
Modified 0 rows

SELECT name, on_select, num_arguments, coefficient_of_determination, 
adjusted_r2, rmse, y_intercept, coefficients 
FROM sys.machine_learning_models a, sys.polynomial_regression_models b 
WHERE a.id = b.machine_learning_model_id AND name = 'pr_attempting_lcr';
name                                         on_select                                    num_arguments coefficient_of_determination adjusted_r2           rmse                  y_intercept           coefficients
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pr_attempting_lcr                            select * from mldemo.lcr                     3             0.8319917524743956           0.8314857035360654    3.096446300059235     -55.14103736166726    [8.606630821262804, -3.232482517584192, 0.5805449365661026, -0.049741258743219, 0.0016307692308343673, 8.399340340347107, -0.5287664524114927, 0.08856961220855254, -0.0064693555605100815, 1.7800381437636605E-4, -5.4034476625452825, 0.08856961220520951, -0.008490499399775638, 2.684184502433214E-4, 1.4579535930358838, -0.006469355560037162, 2.68418450241128E-4, -0.16482119055274744, 1.780038143557162E-4, 0.006512838626194554, 84.78854869028874, -0.5287664524188649, 0.08856961220814323, -0.006469355560487793, 1.780038143772283E-4, -0.528766452427062, -0.01649034528388861, 0.0012223956025726305, -5.6680387259322435E-5, 0.08856961220929233, 0.0012223956025593338, -1.6695884464630135E-5, -0.006469355560584848, -5.6680387258156464E-5, 1.78003814376761E-4, -44.36009386495971, 0.08856961220998283, -0.008490499399632612, 2.6841845023805184E-4, 0.08856961221025506, 0.0012223956025763133, -1.669588446505692E-5, -0.008490499399797105, -1.6695884465316113E-5, 2.6841845024373886E-4, 9.206023264675196, -0.006469355560753055, 2.6841845023840556E-4, -0.006469355560705054, -5.6680387258737325E-5, 2.6841845024255106E-4, -0.8194647667934305, 1.7800381438661845E-4, 1.7800381438500682E-4, 0.026149389153476754]

Fetched 1 row
CREATE MLMODEL pr_attempting_lcr TYPE POLYNOMIAL REGRESSION 
ON (SELECT * FROM mldemo.lcr) options('order' -> '5', 'metrics' -> 'true');
Modified 0 rows

SELECT name, on_select, num_arguments, coefficient_of_determination, 
adjusted_r2, rmse, y_intercept, coefficients 
FROM sys.machine_learning_models a, sys.polynomial_regression_models b 
WHERE a.id = b.machine_learning_model_id AND name = 'pr_attempting_lcr';
name                                         on_select                                    num_arguments coefficient_of_determination adjusted_r2           rmse                  y_intercept           coefficients
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pr_attempting_lcr                            select * from mldemo.lcr                     3             0.8319917524743956           0.8314857035360654    3.096446300059235     -55.14103736166726    [8.606630821262804, -3.232482517584192, 0.5805449365661026, -0.049741258743219, 0.0016307692308343673, 8.399340340347107, -0.5287664524114927, 0.08856961220855254, -0.0064693555605100815, 1.7800381437636605E-4, -5.4034476625452825, 0.08856961220520951, -0.008490499399775638, 2.684184502433214E-4, 1.4579535930358838, -0.006469355560037162, 2.68418450241128E-4, -0.16482119055274744, 1.780038143557162E-4, 0.006512838626194554, 84.78854869028874, -0.5287664524188649, 0.08856961220814323, -0.006469355560487793, 1.780038143772283E-4, -0.528766452427062, -0.01649034528388861, 0.0012223956025726305, -5.6680387259322435E-5, 0.08856961220929233, 0.0012223956025593338, -1.6695884464630135E-5, -0.006469355560584848, -5.6680387258156464E-5, 1.78003814376761E-4, -44.36009386495971, 0.08856961220998283, -0.008490499399632612, 2.6841845023805184E-4, 0.08856961221025506, 0.0012223956025763133, -1.669588446505692E-5, -0.008490499399797105, -1.6695884465316113E-5, 2.6841845024373886E-4, 9.206023264675196, -0.006469355560753055, 2.6841845023840556E-4, -0.006469355560705054, -5.6680387258737325E-5, 2.6841845024255106E-4, -0.8194647667934305, 1.7800381438661845E-4, 1.7800381438500682E-4, 0.026149389153476754]

Fetched 1 row
﻿
The coefficient of determination shows that the model is not a bad fit, but the fit is not great due to the structure of the data. However, given that the model is still a linear combination of functions, linear combination regression can recover the fit if you specify the functions to use.
SQL
CREATE MLMODEL lcr1 TYPE LINEAR COMBINATION REGRESSION 
ON (SELECT x1, x2, x3, y FROM mldemo.lcr) options('function1' -> 'sin(x1)', 
'function2' -> 'cos(x2)', 'function3' -> 'mod(int(x1)*int(x2)*int(x3),4)', 
'metrics' -> 'true');
Modified 0 rows

SELECT name, on_select, num_arguments, coefficient_of_determination, 
adjusted_r2, rmse, y_intercept, coefficients 
FROM sys.machine_learning_models a, sys.linear_combination_regression_models b 
WHERE a.id = b.machine_learning_model_id AND name = 'lcr1';
name                                         on_select                                    num_arguments coefficient_of_determination adjusted_r2           rmse                  y_intercept           coefficients
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
lcr1                                         select x1, x2, x3, y from mldemo.lcr         3             1.0                          1.0                   3.597211844847171E-14 3.9968028886505635E-15 [9.99999999999995, 2.000000000000012, -2.999999999999997]

Fetched 1 row
CREATE MLMODEL lcr1 TYPE LINEAR COMBINATION REGRESSION 
ON (SELECT x1, x2, x3, y FROM mldemo.lcr) options('function1' -> 'sin(x1)', 
'function2' -> 'cos(x2)', 'function3' -> 'mod(int(x1)*int(x2)*int(x3),4)', 
'metrics' -> 'true');
Modified 0 rows

SELECT name, on_select, num_arguments, coefficient_of_determination, 
adjusted_r2, rmse, y_intercept, coefficients 
FROM sys.machine_learning_models a, sys.linear_combination_regression_models b 
WHERE a.id = b.machine_learning_model_id AND name = 'lcr1';
name                                         on_select                                    num_arguments coefficient_of_determination adjusted_r2           rmse                  y_intercept           coefficients
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
lcr1                                         select x1, x2, x3, y from mldemo.lcr         3             1.0                          1.0                   3.597211844847171E-14 3.9968028886505635E-15 [9.99999999999995, 2.000000000000012, -2.999999999999997]

Fetched 1 row
﻿
You can see that the model finds the coefficients perfectly and has a coefficient of determination of 1.0.
The coefficient of determination represents the amount of variance in the data that is explained by the model. A coefficient of determination of 1.0 indicates that all the variance is explained, and that the model is a perfect fit. A coefficient of determination of 0.0 indicates that none of the variance is explained. A negative coefficient of determination is possible, but you do not find it in linear regression models because these models always find a fit that is at least as good as the constant function (the average of all the training data). That model has a coefficient of determination of 0.0.
Related Links
﻿Machine Learning Model Functions﻿ 
﻿Regression Models﻿ 
Updated 11 Jan 2024
Did this page help you?
Machine Learning Model Functions
Regression Models