# Regression Analysis
{{ocientml}} enables regression analysis, which is the process of finding a best-fit function for a data set, in the {{ocient}} system.

## Simple Linear Regression

The simplest case is simple linear regression, which finds a best-fit linear relationship between one variable and another.

Suppose you have this data with a bit of noise added. Use the CREATE TABLE AS SELECT SQL statement to create the mldemo.slr table and generate 100 rows of random data.

```sql
CREATE TABLE mldemo.slr AS
  (SELECT c1 AS x, c1 * 3.14 - 1.41 + (rand() - 0.5) * 0.01 AS y FROM sys.dummy100);
```

```
modified 100 rows
```

Create a simple linear regression model using this data.

```sql
CREATE MLMODEL my_first_model TYPE simple_linear_regression ON (SELECT x, y FROM mldemo.slr);
```

```
modified 0 rows
```

Explore the created model by viewing the data in the machine_learning_models and simple_linear_regression_models system catalog tables.

```sql
SELECT name, on_select, slope, y_intercept
FROM sys.machine_learning_models a, sys.simple_linear_regression_models b
WHERE a.id = b.machine_learning_model_id AND name = 'my_first_model';
```

```
name            on_select                    slope              y_intercept
--------------  ---------------------------  -----------------  -------------------
my_first_model  SELECT x, y FROM mldemo.slr  3.139995450114637  -1.4102410466743238

fetched 1 row
```

You can see the calculated slope and y-intercept of the model, and that these values are very close to the values used to generate the data (3.14 and -1.41). While the input to the model creation is very simple in this case, the SQL statement in the ON clause of the model creation can be arbitrarily complex, as long as it meets the input requirements of the specific model type being created.

Execute the created model by executing the SQL scalar function with the same name as the model you created. The input arguments of the function depend on the specific model type. For regression, the arguments are the independent variables, and the function returns the dependent variable.

```sql
SELECT x, y AS actual, my_first_model(x) AS predicted FROM mldemo.slr LIMIT 10;
```

```
x   actual              predicted
--  ------------------  ------------------
51  158.730137908257    158.72952690917217
52  161.87052128169267  161.8695223592868
53  165.0119331158625   165.00951780940144
54  168.1478263920115   168.14951325951608
55  171.28758882396247  171.28950870963072
56  174.43476312079238  174.42950415974536
57  177.57385971080322  177.56949960986
58  180.70933136639127  180.70949505997464
59  183.84856961481378  183.84949051008925
60  186.99044288540566  186.9894859602039

fetched 10 rows
```

The model makes predictions that are very close to the actual training data. You can then use the model to generate predictions for data that is unknown.
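Because the model is an ordinary scalar function, nothing restricts it to the training rows. As a minimal sketch (assuming, as in the examples above, that sys.dummy100 enumerates the integers 1 through 100), this query extrapolates beyond the training range; each prediction should stay close to 3.14 * x - 1.41:

```sql
-- Extrapolate: evaluate the model at x values 101-105, outside the training data.
SELECT c1 + 100 AS x, my_first_model(c1 + 100) AS predicted
FROM sys.dummy100
LIMIT 5;
```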
## Machine Learning Model Options

You can provide options for most model types. Provide options by pairing the string for an option with its corresponding value. For example, simple linear regression models have a metrics option that you can use to ask the database to calculate some quality-of-fit metrics. Use the same model and enable metrics collection, which is off by default.

```sql
DROP MLMODEL my_first_model;
```

```
modified 0 rows
```

```sql
CREATE MLMODEL my_first_model TYPE simple_linear_regression
  ON (SELECT x, y FROM mldemo.slr)
  OPTIONS('metrics' -> 'true');
```

```
modified 0 rows
```

Examine the machine_learning_models and simple_linear_regression_models system catalog tables for more fields.

```sql
SELECT name, on_select, slope, y_intercept, coefficient_of_determination, adjusted_r2, rmse
FROM sys.machine_learning_models a, sys.simple_linear_regression_models b
WHERE a.id = b.machine_learning_model_id AND name = 'my_first_model';
```

```
name            on_select                    slope              y_intercept          coefficient_of_determination  adjusted_r2        rmse
--------------  ---------------------------  -----------------  -------------------  ----------------------------  -----------------  ------------------
my_first_model  SELECT x, y FROM mldemo.slr  3.139995450114637  -1.4102410466743238  0.9999999990764926            0.999999999067069  2.7544625001417e-4

fetched 1 row
```

The database also calculates the coefficient of determination (frequently called R²), the adjusted R² value, and the RMSE (root mean squared error). All of these values indicate that this model has an extremely good fit.
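You can also recompute a quality-of-fit measure yourself, because the model function can appear inside ordinary aggregate expressions. This sketch computes the conventional root mean squared error (the square root of the mean squared residual); note that this definition is an assumption, and the catalog rmse column may use a different normalization, so the two values can differ by a constant factor:

```sql
-- Conventional RMSE: square root of the mean squared residual.
-- The catalog rmse column may be normalized differently.
SELECT sqrt(avg((my_first_model(x) - y) ^ 2)) AS manual_rmse
FROM mldemo.slr;
```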
In addition, you can force a specific y-intercept such that the value of f(0) is fixed. Use the yintercept option. Create a model on the same data, but force the y-intercept to be zero. The metrics show that this model is not as good a fit as when the intercept is free to vary.

```sql
CREATE MLMODEL force_zero TYPE simple_linear_regression
  ON (SELECT x, y FROM mldemo.slr)
  OPTIONS('metrics' -> 'true', 'yintercept' -> '0');
```

```
modified 0 rows
```

```sql
SELECT name, on_select, slope, y_intercept, coefficient_of_determination, adjusted_r2, rmse
FROM sys.machine_learning_models a, sys.simple_linear_regression_models b
WHERE a.id = b.machine_learning_model_id AND name = 'force_zero';
```

```
name        on_select                    slope               y_intercept  coefficient_of_determination  adjusted_r2         rmse
----------  ---------------------------  ------------------  -----------  ----------------------------  ------------------  -------------------
force_zero  SELECT x, y FROM mldemo.slr  3.1189470762836766  0.0          0.9999403831244921            0.9999397747890277  0.06998440678511778

fetched 1 row
```

## Multiple Linear Regression

In multiple linear regression, the model still predicts a single output value, but instead of handling one input variable, the model can handle an arbitrary number of input variables. Multiple linear regression finds the best-fit model of the form y = a1*x1 + a2*x2 + a3*x3 + … + a0.

Simple linear regression and multiple linear regression (as well as polynomial regression and linear combination regression) all use a least squares loss function. Least squares best fit means that the model minimizes the sum, over all training data rows, of the square of the model error. Some of the more advanced models allow you to define best fit yourself if the default definition is not satisfactory.

Create a training data set and show that the model can reconstruct the data. Multiple linear regression models also support a metrics option (the metrics are the same as for simple linear regression).

```sql
CREATE TABLE mldemo.mlr AS
  (SELECT a.c1 AS x1, b.c1 AS x2, 1 + 2 * a.c1 + 3 * b.c1 AS y
   FROM sys.dummy10 a, sys.dummy10 b);
```

```
modified 100 rows
```

```sql
CREATE MLMODEL mlr_model TYPE multiple_linear_regression
  ON (SELECT * FROM mldemo.mlr)
  OPTIONS('metrics' -> 'true');
```

```
modified 0 rows
```

```sql
SELECT x1, x2, y AS actual, mlr_model(x1, x2) AS predicted FROM mldemo.mlr LIMIT 10;
```

```
x1  x2  actual  predicted
--  --  ------  ------------------
6   1   16      15.999999999999975
6   2   19      18.999999999999975
6   3   22      21.999999999999975
6   4   25      24.999999999999975
6   5   28      27.999999999999975
6   6   31      30.999999999999975
6   7   34      33.99999999999997
6   8   37      36.99999999999997
6   9   40      39.99999999999997
6   10  43      42.99999999999997

fetched 10 rows
```
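As with simple linear regression, you can evaluate the model at points that are not in the training grid. The arguments 11 and 12 here are arbitrary values outside the sys.dummy10 range; the generating formula gives 1 + 2 * 11 + 3 * 12 = 59, so the prediction should land very close to that value:

```sql
-- Predict y for a point outside the training grid; expect a value near 59.
SELECT mlr_model(11, 12) AS predicted;
```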
## Polynomial Regression

With polynomial regression, the database finds a least squares best-fit polynomial of whatever degree you specify. As with multiple linear regression, you determine the number of independent variables.

The training data here has a 1/(x1 * x2) term that the model cannot match, but that term quickly tends toward zero, so the model should provide a good fit with a quadratic polynomial of x1 and x2. Some of the coefficients match the data and some do not, because the model is compensating for not being able to fit the form of the data exactly. However, the metrics indicate that this model is still a great fit.

```sql
CREATE TABLE mldemo.pr AS
  (SELECT x1, x2,
          1.0 / (x1 * x2) - 2 * x1 - 0.5 * x1 * x2 + 2.3 * x1^2 + 1.1 * x2^2 AS y
   FROM (SELECT a.c1 AS x1, b.c1 AS x2 FROM sys.dummy10 a, sys.dummy10 b));
```

```
modified 100 rows
```

```sql
CREATE MLMODEL pr1 TYPE polynomial_regression
  ON (SELECT * FROM mldemo.pr)
  OPTIONS('order' -> '2', 'metrics' -> 'true');
```

```
modified 0 rows
```

```sql
SELECT name, on_select, num_arguments, coefficient_of_determination, adjusted_r2, rmse, y_intercept, coefficients
FROM sys.machine_learning_models a, sys.polynomial_regression_models b
WHERE a.id = b.machine_learning_model_id AND name = 'pr1';
```

```
name  on_select                num_arguments  coefficient_of_determination  adjusted_r2         rmse                 y_intercept        coefficients
----  -----------------------  -------------  ----------------------------  ------------------  -------------------  -----------------  ------------
pr1   SELECT * FROM mldemo.pr  2              0.9999991319879137            0.9999991140907573  0.06288797037552916  0.720605599825376  [-0.10943699031755831, 1.105235143324365, -2.1094369903178176, -0.4945162377513963, 2.3052351433243388]

fetched 1 row
```

Polynomial regression models have many available options, including everything available for multiple linear regression. The negativepowers option enables you to fit Laurent polynomials. If you set this option to true, the model also includes independent variables raised to negative powers; sums of such terms are named Laurent polynomials. The model generates all possible terms such that the sum of the absolute values of the powers in each product is less than or equal to the order. For example, with two independent variables and the order set to 2, the model is:

y = a1*x1^2 + a2*x1^-2 + a3*x2^2 + a4*x2^-2 + a5*x1*x2 + a6*x1^-1*x2 + a7*x1*x2^-1 + a8*x1^-1*x2^-1 + a9*x1 + a10*x1^-1 + a11*x2 + a12*x2^-1 + b

This statement generates a data set of exactly that form, with each coefficient a_k set to k (a11 negated) and b set to 1.0. Because the data is exactly an order-2 Laurent polynomial, a polynomial regression model with the negativepowers option enabled can fit it exactly.

```sql
CREATE TABLE mldemo.pr_negativepowers AS
  (SELECT x1, x2,
          1.0 + 12 * x2^-1 - 11 * x2 + 10 * x1^-1 + 9 * x1 + 8 * (x1^-1) * (x2^-1)
              + 7 * x1 * x2^-1 + 6 * x1^-1 * x2 + 5 * x1 * x2
              + 4 * x2^-2 + 3 * x2^2 + 2 * x1^-2 + x1^2 AS y
   FROM (SELECT a.c1 AS x1, b.c1 AS x2 FROM sys.dummy10 a, sys.dummy10 b));
```

```
modified 100 rows
```

The mldemo.pr data from the previous example is also a Laurent polynomial: its 1/(x1 * x2) term is the Laurent term x1^-1 * x2^-1, which an ordinary polynomial cannot represent. Create a model on that data with negative powers enabled.

```sql
CREATE MLMODEL pr2 TYPE polynomial_regression
  ON (SELECT * FROM mldemo.pr)
  OPTIONS('order' -> '2', 'metrics' -> 'true', 'negativepowers' -> 'true');
```

```
modified 0 rows
```

```sql
SELECT name, on_select, num_arguments, coefficient_of_determination, adjusted_r2, rmse, y_intercept, coefficients
FROM sys.machine_learning_models a, sys.polynomial_regression_models b
WHERE a.id = b.machine_learning_model_id AND name = 'pr2';
```

```
name  on_select                num_arguments  coefficient_of_determination  adjusted_r2  rmse                    y_intercept             coefficients
----  -----------------------  -------------  ----------------------------  -----------  ----------------------  ----------------------  ------------
pr2   SELECT * FROM mldemo.pr  2              1.0                           1.0          1.4203541595255201e-10  3.2741809263825417e-10  [1.378090447893218e-11, 7.594146281547169e-11, 1.0999999999996046, 5.800996726877696e-11, -1.9999999999236024, -0.5000000000004785, 5.027050118995952e-12, 6.257323548197746e-10, 3.953341155821705e-12, 0.9999999999510328, 2.2999999999966576, 3.637978807091713e-10]

fetched 1 row
```

You can see that the model now almost perfectly recovers the coefficients of the generating formula, with the rest being nearly zero. Execute the pr2 model.

```sql
SELECT x1, x2, y AS actual, pr2(x1, x2) AS predicted FROM mldemo.pr LIMIT 10;
```

```
x1   x2    actual              predicted
---  ----  ------------------  ------------------
6.0  1.0   69.06666666666666   69.06666666676095
6.0  2.0   69.28333333333333   69.28333333346139
6.0  3.0   71.75555555555556   71.75555555569368
6.0  4.0   76.44166666666666   76.44166666681195
6.0  5.0   83.33333333333333   83.33333333348473
6.0  6.0   92.42777777777778   92.42777777793454
6.0  7.0   103.72380952380954  103.72380952397093
6.0  8.0   117.22083333333333  117.22083333349865
6.0  9.0   132.9185185185185   132.91851851868702
6.0  10.0  150.81666666666666  150.81666666683765

fetched 10 rows
```
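One way to make the difference between the two fits concrete is to compare their worst training errors in a single query. This is a sketch using the abs and max functions; based on the metrics above, the pr1 error should be largest where x1 * x2 is small (where the unmodeled 1/(x1 * x2) term matters most), while the pr2 error should be near zero everywhere:

```sql
-- Compare the worst absolute training error of the plain quadratic fit (pr1)
-- with the Laurent polynomial fit (pr2) on the same data.
SELECT max(abs(y - pr1(x1, x2))) AS pr1_max_error,
       max(abs(y - pr2(x1, x2))) AS pr2_max_error
FROM mldemo.pr;
```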
## Linear Combination Regression

The most complex type of linear regression that Ocient supports is linear combination regression. This regression is a generalization of polynomial regression: polynomial regression finds the best linear combination of polynomial terms (x1, x2, x1^2, x2^2, x1*x2, and so on), while linear combination regression finds the best-fit linear combination where you specify all of the terms.

Assume this training data. Use the CREATE TABLE AS SELECT SQL statement to create the mldemo.lcr table and generate 1,000 rows of data.

```sql
CREATE TABLE mldemo.lcr AS
  (SELECT x1, x2, x3,
          10.0 * sin(x1) + 2.0 * cos(x2) - 3.0 * mod(int(x1) * int(x2) * int(x3), 4) AS y
   FROM (SELECT a.c1 AS x1, b.c1 AS x2, c.c1 AS x3
         FROM sys.dummy10 a, sys.dummy10 b, sys.dummy10 c));
```

```
modified 1000 rows
```

Fit this data using a fifth-degree polynomial model.

```sql
CREATE MLMODEL pr_attempting_lcr TYPE polynomial_regression
  ON (SELECT * FROM mldemo.lcr)
  OPTIONS('order' -> '5', 'metrics' -> 'true');
```

```
modified 0 rows
```

```sql
SELECT name, on_select, num_arguments, coefficient_of_determination, adjusted_r2, rmse, y_intercept, coefficients
FROM sys.machine_learning_models a, sys.polynomial_regression_models b
WHERE a.id = b.machine_learning_model_id AND name = 'pr_attempting_lcr';
```

```
name               on_select                 num_arguments  coefficient_of_determination  adjusted_r2         rmse               y_intercept
-----------------  ------------------------  -------------  ----------------------------  ------------------  -----------------  -----------------
pr_attempting_lcr  SELECT * FROM mldemo.lcr  3              0.8319917524743956            0.8314857035360654  3.096446300059235  55.14103736166726

coefficients
[8.606630821262804, 3.232482517584192, 0.5805449365661026, 0.049741258743219, 0.0016307692308343673, 8.399340340347107, 0.5287664524114927, 0.08856961220855254, 0.0064693555605100815, 1.7800381437636605e-4, 5.4034476625452825, 0.08856961220520951, 0.008490499399775638, 2.684184502433214e-4, 1.4579535930358838, 0.006469355560037162, 2.68418450241128e-4, 0.16482119055274744, 1.780038143557162e-4, 0.006512838626194554, 84.78854869028874, 0.5287664524188649, 0.08856961220814323, 0.006469355560487793, 1.780038143772283e-4, 0.528766452427062, 0.01649034528388861, 0.0012223956025726305, 5.6680387259322435e-5, 0.08856961220929233, 0.0012223956025593338, 1.6695884464630135e-5, 0.006469355560584848, 5.6680387258156464e-5, 1.78003814376761e-4, 44.36009386495971, 0.08856961220998283, 0.008490499399632612, 2.6841845023805184e-4, 0.08856961221025506, 0.0012223956025763133, 1.669588446505692e-5, 0.008490499399797105, 1.6695884465316113e-5, 2.6841845024373886e-4, 9.206023264675196, 0.006469355560753055, 2.6841845023840556e-4, 0.006469355560705054, 5.6680387258737325e-5, 2.6841845024255106e-4, 0.8194647667934305, 1.7800381438661845e-4, 1.7800381438500682e-4, 0.026149389153476754]

fetched 1 row
```

The coefficient of determination shows that the model is not a bad fit, but the fit is not great, due to the structure of the data. However, because the target is still a linear combination of functions, linear combination regression can recover the fit if you specify the functions to use.

```sql
CREATE MLMODEL lcr1 TYPE linear_combination_regression
  ON (SELECT x1, x2, x3, y FROM mldemo.lcr)
  OPTIONS('function1' -> 'sin(x1)',
          'function2' -> 'cos(x2)',
          'function3' -> 'mod(int(x1) * int(x2) * int(x3), 4)',
          'metrics' -> 'true');
```

```
modified 0 rows
```

```sql
SELECT name, on_select, num_arguments, coefficient_of_determination, adjusted_r2, rmse, y_intercept, coefficients
FROM sys.machine_learning_models a, sys.linear_combination_regression_models b
WHERE a.id = b.machine_learning_model_id AND name = 'lcr1';
```

```
name  on_select                             num_arguments  coefficient_of_determination  adjusted_r2  rmse                   y_intercept             coefficients
----  ------------------------------------  -------------  ----------------------------  -----------  ---------------------  ----------------------  ------------
lcr1  SELECT x1, x2, x3, y FROM mldemo.lcr  3              1.0                           1.0          3.597211844847171e-14  3.9968028886505635e-15  [9.99999999999995, 2.000000000000012, -2.999999999999997]

fetched 1 row
```

You can see that the model finds the coefficients perfectly and has a coefficient of determination of 1.0.

The coefficient of determination represents the amount of variance in the data explained by the model. A coefficient of determination of 1.0 indicates that all of the variance is explained and that the model is a perfect fit. A coefficient of determination of 0.0 indicates that none of the variance is explained. A negative coefficient of determination is possible, but you do not encounter it in linear regression models, because these models always find a fit that is at least as good as the constant function equal to the average of all the training data, and that constant model has a coefficient of determination of 0.0.
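Like the other regression models, lcr1 executes as a scalar function whose arguments are the independent variables in the order they appear in the ON clause. A quick sketch against the training data; given the metrics above, the predictions should match y to within floating-point error:

```sql
-- Execute the linear combination regression model against its training data.
SELECT x1, x2, x3, y AS actual, lcr1(x1, x2, x3) AS predicted
FROM mldemo.lcr
LIMIT 5;
```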
## Non-Linear Regression

The non-linear regression model is similar to linear combination regression in that you define the form of the model. However, non-linear regression can be more than a linear combination of terms.

Assume this training data. Use the CREATE TABLE AS SELECT SQL statement to create the mldemo.nlo_input1 table and generate 100 rows of data. Then, create a non-linear regression model using five parameters.

```sql
CREATE TABLE mldemo.nlo_input1 AS
  (SELECT x1, x2, 2.0 * sin(3.0 * x1 - 1.0) - 4.0 + 5.0 * x2 AS y
   FROM (SELECT a.c1 / 4.0 AS x1, b.c1 / 4.0 AS x2 FROM sys.dummy10 a, sys.dummy10 b));
```

```
modified 100 rows
```

```sql
CREATE MLMODEL nlo1 TYPE nonlinear_regression
  ON (SELECT x1, x2, y FROM mldemo.nlo_input1)
  OPTIONS('numparameters' -> '5',
          'function' -> 'a1 * sin(a2 * x1 + a3) + a4 + a5 * x2',
          'metrics' -> 'true');
```

```
modified 0 rows
```

In this case, you specify only one function that covers the whole model. Use the machine_learning_models and nonlinear_regression_models system catalog tables to see the metrics and the parameter values that the model chose.

```sql
SELECT name, on_select, coefficients, coefficient_of_determination, num_arguments, rmse, adjusted_r2
FROM sys.machine_learning_models a, sys.nonlinear_regression_models b
WHERE a.id = b.machine_learning_model_id AND name = 'nlo1';
```

```
name  on_select                                coefficients                                                                                       coefficient_of_determination  num_arguments  rmse                 adjusted_r2
----  ---------------------------------------  -------------------------------------------------------------------------------------------------  ----------------------------  -------------  -------------------  ------------------
nlo1  SELECT x1, x2, y FROM mldemo.nlo_input1  [-2.085086077121484, 2.999196008218594, 2.14015015138094, -3.9822034572268925, 4.9800866766427205]  0.9985484616734742            2              0.06623956527655686  0.9985483156287689

fetched 1 row
```

If you look at the coefficients, you can see that the amplitude (2.0) is negative in the model, but that is offset by shifting the phase by π (the fitted phase, 2.14, is approximately -1.0 + π). The model still almost exactly matches how the data was generated; it just found a slightly different, but equivalent, way of expressing it.

Non-linear regression models (and all of the other models that use the same infrastructure) have many options that control how the search works. These options affect how long it takes to train the model, but they also affect the likelihood of getting stuck in a local minimum. The default values are chosen so that they should be sufficient for a wide range of use cases.
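Non-linear regression models execute the same way as the other regression models. A short sketch against the training data; given the 0.9985 coefficient of determination above, the predictions should track the actual values closely:

```sql
-- Execute the non-linear model; the arguments are the independent variables.
SELECT x1, x2, y AS actual, nlo1(x1, x2) AS predicted
FROM mldemo.nlo_input1
LIMIT 5;
```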
With non-linear regression models, you can also specify a loss function other than least squares. In this example, specify the abs(f - y) loss function.

```sql
CREATE MLMODEL nlo2 TYPE nonlinear_regression
  ON (SELECT x1, x2, y FROM mldemo.nlo_input1)
  OPTIONS('numparameters' -> '5',
          'function' -> 'a1 * sin(a2 * x1 + a3) + a4 + a5 * x2',
          'lossfunction' -> 'abs(f - y)');
```

```
modified 0 rows
```

In this case, the search finds the best fit that minimizes the sum of the absolute values of the model errors instead of the sum of the squares of the model errors. The variable f represents the value of the model, and the variable y represents the actual value in the training data. The catalog indicates that this model is nearly the same as the previous model; in other words, changing the loss function here did not substantially change anything. But there are cases where other loss functions are appropriate, and they can substantially change the model.
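To check the claim that the two models are nearly the same, you can compare their fitted parameters side by side using the same catalog tables as before. A minimal sketch:

```sql
-- Compare the parameters chosen under the least squares loss (nlo1)
-- and the absolute error loss (nlo2).
SELECT name, coefficients
FROM sys.machine_learning_models a, sys.nonlinear_regression_models b
WHERE a.id = b.machine_learning_model_id AND name IN ('nlo1', 'nlo2');
```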
## Non-Linear Regression with Neural Networks

In all of the non-linear cases so far, you have to make a good guess at the form of the target function. With neural networks, however, you can leave it to the network to figure out that form. The downside is that aspects of the model are a black box.

This example uses the same data as the non-linear regression example, but instead of specifying the form of the function, it relies on what the neural network can do with little information. The size of the network layers and the number of layers greatly influence how good a fit a neural network model achieves. This example uses a simple network.

```sql
CREATE MLMODEL nn1 TYPE feedforward_network
  ON (SELECT x1, x2, y FROM mldemo.nlo_input1)
  OPTIONS('metrics' -> 'true',
          'hiddenlayers' -> '2',
          'hiddenlayersize' -> '2',
          'outputs' -> '1',
          'lossfunction' -> 'squared_error');
```

```
modified 0 rows
```

For neural network models, the metric reported is an average loss, because RMSE and the coefficient of determination are generally only appropriate for least squares models, and neural networks might use other loss functions. You can see the average loss by querying the machine_learning_models and feedforward_network_models system catalog tables.

```sql
SELECT name, on_select, num_arguments, average_loss
FROM sys.machine_learning_models a, sys.feedforward_network_models b
WHERE a.id = b.machine_learning_model_id AND name = 'nn1';
```

```
name  on_select                                num_arguments  average_loss
----  ---------------------------------------  -------------  --------------------
nn1   SELECT x1, x2, y FROM mldemo.nlo_input1  2              0.026744549902417786

fetched 1 row
```

You can compute the coefficient of determination manually.

```sql
SELECT avg(y) FROM mldemo.nlo_input1;
```

```
(sum(y))/(nullif(double(count(y)),(0)))
---------------------------------------
2.8698217335180987

fetched 1 row
```

```sql
SELECT 1.0 - sum((nn1(x1, x2) - y) ^ 2) / sum((y - 2.8698217335180987) ^ 2) AS r2
FROM mldemo.nlo_input1;
```

```
r2
------------------
0.9981701280250409

fetched 1 row
```

This model fits the data very well.

## Vector-Valued Regression

Vector-valued regression is the case where the dependent variable has multiple components. You can use any type of model to learn each component separately, or you can use a feedforward network to learn all components in one model. This example creates 100 rows of sample data.

```sql
CREATE TABLE mldemo.vector_valued AS
  (SELECT x1, {{2 * x1 + 3, 3 * x1 - 2}} AS y
   FROM (SELECT c1 AS x1 FROM sys.dummy100));
```

```
modified 100 rows
```

The dependent variable y, which depends on x1, is vector-valued. You can fit a linear model to each component, and the pair of models can then predict vector-valued results.

```sql
CREATE MLMODEL vv1 TYPE simple_linear_regression
  ON (SELECT x1, y[1,1] FROM mldemo.vector_valued);
```

```
modified 0 rows
```

```sql
CREATE MLMODEL vv2 TYPE simple_linear_regression
  ON (SELECT x1, y[1,2] FROM mldemo.vector_valued);
```

```
modified 0 rows
```

This query verifies that the pair of models produces reasonable results.

```sql
SELECT x1, y AS actual, {{vv1(x1), vv2(x1)}} AS predicted
FROM mldemo.vector_valued
LIMIT 10;
```

```
x1  actual            predicted
--  ----------------  ----------------
51  [[105.0, 151.0]]  [[105.0, 151.0]]
52  [[107.0, 154.0]]  [[107.0, 154.0]]
53  [[109.0, 157.0]]  [[109.0, 157.0]]
54  [[111.0, 160.0]]  [[111.0, 160.0]]
55  [[113.0, 163.0]]  [[113.0, 163.0]]
56  [[115.0, 166.0]]  [[115.0, 166.0]]
57  [[117.0, 169.0]]  [[117.0, 169.0]]
58  [[119.0, 172.0]]  [[119.0, 172.0]]
59  [[121.0, 175.0]]  [[121.0, 175.0]]
60  [[123.0, 178.0]]  [[123.0, 178.0]]

fetched 10 rows
```

This example shows how a single model of the feedforward_network type can perform the same task.

```sql
CREATE MLMODEL vv3 TYPE feedforward_network
  ON (SELECT x1, y FROM mldemo.vector_valued)
  OPTIONS('hiddenlayers' -> '2',
          'hiddenlayersize' -> '2',
          'outputs' -> '2',
          'lossfunction' -> 'vector_squared_error');
```

```
modified 0 rows
```

The outputs option is set to the size of the vector, and the lossfunction option is set to vector_squared_error. This example shows how well the model predicts the output.

```sql
SELECT x1, y AS actual, vv3(x1) AS predicted FROM mldemo.vector_valued LIMIT 10;
```

```
x1  actual            predicted
--  ----------------  -------------------------------------------
51  [[105.0, 151.0]]  [[104.99939226614688, 150.99924962001512]]
52  [[107.0, 154.0]]  [[106.99940272332195, 153.99914250568781]]
53  [[109.0, 157.0]]  [[108.999413180497, 156.99903539136054]]
54  [[111.0, 160.0]]  [[110.99942363767205, 159.99892827703323]]
55  [[113.0, 163.0]]  [[112.9994340948471, 162.99882116270598]]
56  [[115.0, 166.0]]  [[114.99944455202213, 165.99871404837862]]
57  [[117.0, 169.0]]  [[116.99945500919716, 168.99860693405134]]
58  [[119.0, 172.0]]  [[118.99946546637223, 171.99849981972403]]
59  [[121.0, 175.0]]  [[120.99947592354728, 174.99839270539675]]
60  [[123.0, 178.0]]  [[122.99948638072233, 177.99828559106948]]

fetched 10 rows
```

## Autoregression

Autoregression is linear regression where the independent variables are lags of the dependent variable. In other words, it models a series where the next value depends linearly on the previous n values. These examples do not use a new model type; instead, they use multiple linear regression models.

One important aspect of autoregression is that some variable must indicate the order of the data. This variable can be a time variable, but it does not need to be. The variable is necessary to correctly set up the model, but the model does not use it directly; instead, the variable helps build the correct lags.

This example builds a model to predict the Fibonacci sequence. This statement creates a table with the first several Fibonacci terms.

```sql
CREATE TABLE mldemo.fib AS
  (SELECT 0 AS t, 0 AS f
   UNION ALL SELECT 1, 1
   UNION ALL SELECT 2, 1
   UNION ALL SELECT 3, 2
   UNION ALL SELECT 4, 3
   UNION ALL SELECT 5, 5
   UNION ALL SELECT 6, 8
   UNION ALL SELECT 7, 13
   UNION ALL SELECT 8, 21
   UNION ALL SELECT 9, 34);
```

```
modified 10 rows
```

This example builds a model where the dependent variable depends on two lags.

```sql
CREATE MLMODEL fib TYPE multiple_linear_regression
  ON (SELECT lag(f, 1) OVER (ORDER BY t),
             lag(f, 2) OVER (ORDER BY t),
             f
      FROM mldemo.fib);
```

```
modified 0 rows
```

When you execute the model and provide it with two lags, the model predicts the next value of the sequence, 55, which is not in the data.

```sql
SELECT fib(21, 34);
```

```
((((1.0000000000001137))*((21)))+((1.4210854715202004e-14)))+(((1))*((34)))
---------------------------------------------------------------------------
55.0000000000024

fetched 1 row
```
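You can roll the model forward by feeding each prediction back in as the newest value. Following the argument convention of the call above, this sketch predicts one more step ahead; the result should be close to the next Fibonacci number, 89:

```sql
-- 34 and 55 are now the two most recent terms, so the model should return ~89.
SELECT fib(34, 55);
```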
## Vector Autoregression

Vector autoregression is a more complex version of autoregression. This model uses multiple time series, where the next value of each time series depends linearly on the lags of all of the time series.

These statements set up data where the next value of each of two time series depends on the previous two values of both time series.

```sql
CREATE TABLE mldemo.var_input AS
  (SELECT 3.1415 AS s1, 1.4142 AS s2, 0 AS t
   UNION ALL
   SELECT 2.7182 AS s1, 1.618 AS s2, 1 AS t);

INSERT INTO mldemo.var_input
  SELECT 0.9 * s1l1 - 0.8 * s2l1 + 0.7 * s1l2 - 0.6 * s2l2 AS s1,
         0.5 * s1l1 - 0.4 * s2l1 + 0.3 * s1l2 - 0.2 * s2l2 AS s2,
         2 AS t
  FROM (SELECT s1 AS s1l1, s2 AS s2l1 FROM mldemo.var_input WHERE t = 1),
       (SELECT s1 AS s1l2, s2 AS s2l2 FROM mldemo.var_input WHERE t = 0);

INSERT INTO mldemo.var_input
  SELECT 0.9 * s1l1 - 0.8 * s2l1 + 0.7 * s1l2 - 0.6 * s2l2 AS s1,
         0.5 * s1l1 - 0.4 * s2l1 + 0.3 * s1l2 - 0.2 * s2l2 AS s2,
         3 AS t
  FROM (SELECT s1 AS s1l1, s2 AS s2l1 FROM mldemo.var_input WHERE t = 2),
       (SELECT s1 AS s1l2, s2 AS s2l2 FROM mldemo.var_input WHERE t = 1);

INSERT INTO mldemo.var_input
  SELECT 0.9 * s1l1 - 0.8 * s2l1 + 0.7 * s1l2 - 0.6 * s2l2 AS s1,
         0.5 * s1l1 - 0.4 * s2l1 + 0.3 * s1l2 - 0.2 * s2l2 AS s2,
         4 AS t
  FROM (SELECT s1 AS s1l1, s2 AS s2l1 FROM mldemo.var_input WHERE t = 3),
       (SELECT s1 AS s1l2, s2 AS s2l2 FROM mldemo.var_input WHERE t = 2);

INSERT INTO mldemo.var_input
  SELECT 0.9 * s1l1 - 0.8 * s2l1 + 0.7 * s1l2 - 0.6 * s2l2 AS s1,
         0.5 * s1l1 - 0.4 * s2l1 + 0.3 * s1l2 - 0.2 * s2l2 AS s2,
         5 AS t
  FROM (SELECT s1 AS s1l1, s2 AS s2l1 FROM mldemo.var_input WHERE t = 4),
       (SELECT s1 AS s1l2, s2 AS s2l2 FROM mldemo.var_input WHERE t = 3);

INSERT INTO mldemo.var_input
  SELECT 0.9 * s1l1 - 0.8 * s2l1 + 0.7 * s1l2 - 0.6 * s2l2 AS s1,
         0.5 * s1l1 - 0.4 * s2l1 + 0.3 * s1l2 - 0.2 * s2l2 AS s2,
         6 AS t
  FROM (SELECT s1 AS s1l1, s2 AS s2l1 FROM mldemo.var_input WHERE t = 5),
       (SELECT s1 AS s1l2, s2 AS s2l2 FROM mldemo.var_input WHERE t = 4);

INSERT INTO mldemo.var_input
  SELECT 0.9 * s1l1 - 0.8 * s2l1 + 0.7 * s1l2 - 0.6 * s2l2 AS s1,
         0.5 * s1l1 - 0.4 * s2l1 + 0.3 * s1l2 - 0.2 * s2l2 AS s2,
         7 AS t
  FROM (SELECT s1 AS s1l1, s2 AS s2l1 FROM mldemo.var_input WHERE t = 6),
       (SELECT s1 AS s1l2, s2 AS s2l2 FROM mldemo.var_input WHERE t = 5);

INSERT INTO mldemo.var_input
  SELECT 0.9 * s1l1 - 0.8 * s2l1 + 0.7 * s1l2 - 0.6 * s2l2 AS s1,
         0.5 * s1l1 - 0.4 * s2l1 + 0.3 * s1l2 - 0.2 * s2l2 AS s2,
         8 AS t
  FROM (SELECT s1 AS s1l1, s2 AS s2l1 FROM mldemo.var_input WHERE t = 7),
       (SELECT s1 AS s1l2, s2 AS s2l2 FROM mldemo.var_input WHERE t = 6);
```

This query inspects the data.

```sql
SELECT * FROM mldemo.var_input ORDER BY t;
```

```
s1                  s2                  t
------------------  ------------------  -
3.1415              1.4142              0
2.7182              1.618               1
2.50251             1.37151             2
2.086991            1.1945109999999999  3
1.8515340999999994  1.0421421           4
1.5768541099999998  0.8961053099999997  5
1.3730730609999995  0.7770167409999997  6
1.1802870530999996  0.6695650050999997  7
1.0215474418099997  0.5788360946099997  8

fetched 9 rows
```

This example fits the vector autoregression model to the data.

```sql
CREATE MLMODEL var1 TYPE vector_autoregression
  ON (SELECT {{s1, s2}}, {{s1_lag1, s2_lag1}}, {{s1_lag2, s2_lag2}}
      FROM (SELECT s1, s2,
                   lag(s1, 1) OVER (ORDER BY t) AS s1_lag1,
                   lag(s1, 2) OVER (ORDER BY t) AS s1_lag2,
                   lag(s2, 1) OVER (ORDER BY t) AS s2_lag1,
                   lag(s2, 2) OVER (ORDER BY t) AS s2_lag2
            FROM mldemo.var_input))
  OPTIONS('numvariables' -> '2', 'numlags' -> '2');
```

```
modified 0 rows
```

See the details of the model in the machine_learning_models and vector_autoregression_models system catalog tables.

```sql
SELECT name, coefficients, coefficient_of_determination
FROM sys.machine_learning_models a, sys.vector_autoregression_models b
WHERE a.id = b.machine_learning_model_id AND name = 'var1';
```

```
name  coefficients                                                                                                                                                                                                          coefficient_of_determination
----  --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  ----------------------------
var1  [-2.2737367544323206e-12, 0.8999999999432475, -0.799999999901047, 0.6999999999643478, -0.599999999973079, 5.684341886080801e-13, 0.4999999999781721, -0.39999999997235136, 0.29999999999017746, -0.1999999999825377]  null

fetched 1 row
```

The model recovered the expected coefficients. The coefficient of determination is null here because the model options did not enable metrics collection.

This example checks that the model matches the data reasonably well.

```sql
SELECT var1({{s1_lag1, s2_lag1}}, {{s1_lag2, s2_lag2}}) AS predicted,
       {{s1, s2}} AS actual
FROM (SELECT s1, s2, s1_lag1, s2_lag1, s1_lag2, s2_lag2
      FROM (SELECT s1, s2,
                   lag(s1, 1) OVER (ORDER BY t) AS s1_lag1,
                   lag(s1, 2) OVER (ORDER BY t) AS s1_lag2,
                   lag(s2, 1) OVER (ORDER BY t) AS s2_lag1,
                   lag(s2, 2) OVER (ORDER BY t) AS s2_lag2
            FROM mldemo.var_input));
```

```
predicted                                    actual
-------------------------------------------  ------------------------------------------
null                                         [[3.1415, 1.4142]]
null                                         [[2.7182, 1.618]]
[[2.5025099999296385, 1.3715099999786722]]   [[2.50251, 1.37151]]
[[2.086990999938066, 1.1945109999842818]]    [[2.086991, 1.1945109999999999]]
[[1.8515340999451868, 1.0421420999862723]]   [[1.8515340999999994, 1.0421421]]
[[1.5768541099535214, 0.8961053099881894]]   [[1.5768541099999998, 0.8961053099999997]]
[[1.3730730609589523, 0.7770167409897997]]   [[1.3730730609999995, 0.7770167409999997]]
[[1.1802870530645944, 0.6695650050911028]]   [[1.1802870530999996, 0.6695650050999997]]
[[1.0215474417789623, 0.5788360946022622]]   [[1.0215474418099997, 0.5788360946099997]]

fetched 9 rows
```

The first two rows have no predictions because the lags necessary to drive the predictions do not exist.

For details, see docid\pfnkdj1fwsevxoondal9k.

Related Links

docid\dsqzzzggetdfwh5qasvhg
docid\pfnkdj1fwsevxoondal9k