SQL Reference
Machine Learning Model Functio...
Data Preparation
lag generator and vector assembler sql functions simplify generating and organizing lagged features for time series data each function expands into standard sql during query processing, so developers can reduce boilerplate and create feature sets with minimal code these functions fall into two main types docid\ vrwcjklivsomyas4ude8b — create individual lag feature columns docid\ vrwcjklivsomyas4ude8b — group generated lag columns into vectors developers can apply lag features across many ocient supported models, including simple linear regression (univariate forecasting with lag inputs) multiple linear regression (multivariate regression with lagged predictors) polynomial regression (regression with higher order lagged terms) linear combination regression (custom functions of lagged inputs) nonlinear regression (arbitrary functions that include lag variables) feedforward neural networks (lagged features as input vectors) vector autoregression (var) (multivariate time series with structured lag vectors) autoregression (univariate time series modeled through lagged values) vector valued regression (multivariate dependent variables with lagged inputs) the functions provide general purpose utilities for time series feature engineering in particular, var models benefit the most from lag functions because the models require large numbers of structured lag vectors lag generator functions lag generator functions provide options for creating lagged columns for one or more variables in a single statement these functions follow the syntax rules of window aggregate functions as they define the window for the lag computation with an over and order by clause for details, see docid\ knh6ndwnm hzuab9dwzrz the window parameters of lag generator functions are represented in these examples by \<window specification> lags generates a series of lagged columns for a single variable in one statement this function simplifies univariate time series feature creation syntax lags(expression, start lag, end lag \[, step]) over (\<window specification>) argument type description expression any the column or expression to lag start lag integer first lag to generate this value must be less than the end lag value end lag integer last lag to generate this value must be greater than the start lag value step integer optional defines the gap between successive lags must be a positive integer if unspecified, this value defaults to 1 example this query produces three lagged columns ( sales lag1 , sales lag2 , sales lag3 ) from the sales column select lags(sales, 1, 3) over (order by day) from store sales; output sales lag1 | sales lag2 | sales lag3 \ + + null | null | null 100 | null | null 120 | 100 | null 130 | 120 | 100 lags zerofill generates lagged columns for a single variable and replaces null values with 0 syntax lags zerofill(expression, start lag, end lag \[, step]) over (\<window specification>) argument type description expression any the column or expression to lag start lag integer first lag to generate this value must be less than the end lag value end lag integer last lag to generate this value must be greater than the start lag value step integer optional defines the gap between successive lags must be a positive integer if unspecified, this value defaults to 1 example this query produces three lagged sales columns and replaces null values with 0 select lags zerofill(sales, 1, 3) over (order by day) from store sales; output sales lag1 | sales lag2 | sales lag3 \ + + 0 | 0 | 0 100 | 0 | 0 120 | 100 | 0 130 | 120 | 100 multi lags generates lagged columns for multiple variables at once this function is intended primarily for creating features for multivariate time series models syntax multi lags(expr1, expr2, \[ , ] start lag, end lag \[, step]) over (\<window specification>) argument type description expr1, expr2, \[ , ] any one or more columns or expressions to lag applies the same lag range and step to each input start lag integer first lag to generate this value must be less than the end lag value end lag integer last lag to generate this value must be greater than the start lag value step integer optional defines the gap between successive lags must be a positive integer if unspecified, this value defaults to 1 example this query creates four lagged columns for both interest rate and gdp growth select multi lags(interest rate, gdp growth, 1, 4) over (order by quarter) from economic data; output interest rate lag1 | interest rate lag2 | interest rate lag3 | interest rate lag4 | gdp growth lag1 | gdp growth lag2 | gdp growth lag3 | gdp growth lag4 \ + + + + + + + null | null | null | null | null | null | null | null 2 5 | null | null | null | 1 1 | null | null | null 2 7 | 2 5 | null | null | 1 2 | 1 1 | null | null 2 8 | 2 7 | 2 5 | null | 1 3 | 1 2 | 1 1 | null multi lags zerofill generates lagged columns for multiple variables and replaces null values with 0 syntax multi lags zerofill(expr1, expr2, \[ , ] start lag, end lag \[, step]) over (\<window specification>) argument type description expr1, expr2, \[ , ] any one or more columns or expressions to lag applies the same lag range and step to each input start lag integer first lag to generate this value must be less than the end lag value end lag integer last lag to generate this value must be greater than the start lag value step integer optional defines the gap between successive lags must be a positive integer if unspecified, this value defaults to 1 example this query generates four lagged columns each for x1 , x2 , and x3 , replacing missing values with 0 select multi lags zerofill(x1, x2, x3, 1, 4) over (order by time col) from public my time series table; output x1 lag1 | x1 lag2 | x1 lag3 | x1 lag4 | x2 lag1 | x2 lag2 | x2 lag3 | x2 lag4 | x3 lag1 | x3 lag2 | x3 lag3 | x3 lag4 \ + + + + + + + + + + + 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 10 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 30 | 0 | 0 | 0 11 | 10 | 0 | 0 | 21 | 20 | 0 | 0 | 31 | 30 | 0 | 0 12 | 11 | 10 | 0 | 22 | 21 | 20 | 0 | 32 | 31 | 30 | 0 vector assembler function a vector assembler function provides options for grouping multiple lagged columns into structured vectors in a single statement this function makes preparing input data for time series models easier, especially those that require organized lag structures, such as var lag vectors groups lagged columns generated by the multi lags or multi lags zerofill functions into vector columns models such as var require these structured lag vectors the resulting vectors are named lag vector syntax lag vectors(expr1, expr2, \[ , ] start lag, end lag \[, step]) argument type description expr1, expr2, \[ , ] numeric two or more lagged columns to group applies the same lag range and step to each input start lag integer first lag to assemble this value must be less than the end lag value end lag integer last lag to assemble this value must be greater than the start lag value step integer optional defines the gap between successive lags must be a positive integer if unspecified, this value defaults to 1 examples assemble lagged columns into vectors this query uses multi lags zerofill to generate lagged columns for x1 , x2 , and x3 , then uses lag vectors to group them into lag vector1 through lag vector4 select {x1, x2, x3} as current vector, lag vectors(x1, x2, x3, 1, 4) from ( select x1, x2, x3, multi lags zerofill(x1, x2, x3, 1, 4) over (order by time col) from public my time series table ) s; e; output current vector | lag vector1 | lag vector2 | lag vector3 | lag vector4 \ + + + + {10, 20, 30} | {0, 0, 0} | {0, 0, 0} | {0, 0, 0} | {0, 0, 0} {11, 21, 31} | {10, 20, 30}| {0, 0, 0} | {0, 0, 0} | {0, 0, 0} {12, 22, 32} | {11, 21, 31}| {10, 20, 30}| {0, 0, 0} | {0, 0, 0} {13, 23, 33} | {12, 22, 32}| {11, 21, 31}| {10, 20, 30}| {0, 0, 0} prepare data for a var model this example shows how to prepare training data for a var model with three variables ( x1 , x2 , x3 ) and four lags the example demonstrates how the lag generator functions and vector assembler can simplify feature creation compared to writing out every lag expression manually multi lags zerofill generates all individual lagged columns ( x1 lag1 … x3 lag4 ) in a single call, replacing null values with zero lag vectors automatically groups these lagged columns into vector inputs ( lag vector1 lag vector4 ), which the model requires create mlmodel my var model type vector autoregression on ( select {x1, x2, x3}, lag vectors(x1, x2, x3, 1, 4) from ( select x1, x2, x3, multi lags zerofill(x1, x2, x3, 1, 4) over (order by t) from public my time series table ) ) options ( 'numvariables' > '3', 'numlags' > '4' ); prepare data for a var model with manual vector assembly this version of the univariate regression example uses repeated lag invocations instead of the lags function although it produces the exact same set of lagged features ( sales lag1 , sales lag2 , sales lag3 ), the query requires writing each lag expression manually this makes the sql longer, more repetitive, and harder to maintain compared to the concise single line lags version create mlmodel my var model manual type vector autoregression on ( select {x1, x2, x3}, {x1 lag1, x2 lag1, x3 lag1}, {x1 lag2, x2 lag2, x3 lag2}, {x1 lag3, x2 lag3, x3 lag3}, {x1 lag4, x2 lag4, x3 lag4} from ( select x1, x2, x3, coalesce(lag(x1, 1) over (order by t), 0) as x1 lag1, coalesce(lag(x1, 2) over (order by t), 0) as x1 lag2, coalesce(lag(x1, 3) over (order by t), 0) as x1 lag3, coalesce(lag(x1, 4) over (order by t), 0) as x1 lag4, coalesce(lag(x2, 1) over (order by t), 0) as x2 lag1, coalesce(lag(x2, 2) over (order by t), 0) as x2 lag2, coalesce(lag(x2, 3) over (order by t), 0) as x2 lag3, coalesce(lag(x2, 4) over (order by t), 0) as x2 lag4, coalesce(lag(x3, 1) over (order by t), 0) as x3 lag1, coalesce(lag(x3, 2) over (order by t), 0) as x3 lag2, coalesce(lag(x3, 3) over (order by t), 0) as x3 lag3, coalesce(lag(x3, 4) over (order by t), 0) as x3 lag4 from public my time series table ) ) options ( 'numvariables' > '3', 'numlags' > '4' ); related links docid\ knh6ndwnm hzuab9dwzrz docid\ pstuxvbgmg7vzwjjvyxpd docid\ jyvggzczktiksnbvb4msz