Skip to main content
Lag generator and vector assembler SQL functions simplify generating and organizing lagged features for time series data. Each function expands into standard SQL during query processing, so developers can reduce boilerplate and create feature sets with minimal code. These functions fall into two main types: Developers can apply lag features across many -supported models, including:
  • Simple Linear Regression (univariate forecasting with lag inputs)
  • Multiple Linear Regression (multivariate regression with lagged predictors)
  • Polynomial Regression (regression with higher-order lagged terms)
  • Linear Combination Regression (custom functions of lagged inputs)
  • Nonlinear Regression (arbitrary functions that include lag variables)
  • Feedforward Neural Networks (lagged features as input vectors)
  • Vector Autoregression (VAR) (multivariate time series with structured lag vectors)
  • Autoregression (univariate time series modeled through lagged values)
  • Vector Valued Regression (multivariate dependent variables with lagged inputs)
The functions provide general-purpose utilities for time series feature engineering. In particular, VAR models benefit the most from lag functions because the models require large numbers of structured lag vectors.

Lag Generator Functions

Lag generator functions provide options for creating lagged columns for one or more variables in a single statement. These functions follow the syntax rules of window aggregate functions as they define the window for the lag computation with an OVER and ORDER BY clause. For details, see Window Aggregate Functions. The window parameters of lag generator functions are represented in these examples by <window_specification>.

LAGS

Generates a series of lagged columns for a single variable in one statement. This function simplifies univariate time-series feature creation. Syntax
SQL
LAGS(expression, start_lag, end_lag [, step])
    OVER (<window_specification>)
ArgumentTypeDescription
expressionANYThe column or expression to lag.
start_lagINTEGERFirst lag to generate. This value must be less than the end_lag value.
end_lagINTEGERLast lag to generate. This value must be greater than the start_lag value.
stepINTEGEROptional.
Defines the gap between successive lags. Must be a positive integer.
If unspecified, this value defaults to 1.
Example This query produces three lagged columns (sales_lag1, sales_lag2, sales_lag3) from the sales column.
SQL
SELECT
    LAGS(sales, 1, 3) OVER (ORDER BY day)
FROM store_sales;
Output
SQL
sales_lag1 | sales_lag2 | sales_lag3
-----------+------------+------------
NULL       | NULL       | NULL
100        | NULL       | NULL
120        | 100        | NULL
130        | 120        | 100

LAGS_ZEROFILL

Generates lagged columns for a single variable and replaces NULL values with 0. Syntax
SQL
LAGS_ZEROFILL(expression, start_lag, end_lag [, step])
    OVER (<window_specification>)
ArgumentTypeDescription
expressionANYThe column or expression to lag.
start_lagINTEGERFirst lag to generate. This value must be less than the end_lag value.
end_lagINTEGERLast lag to generate. This value must be greater than the start_lag value.
stepINTEGEROptional.
Defines the gap between successive lags. Must be a positive integer.
If unspecified, this value defaults to 1.
Example This query produces three lagged sales columns and replaces NULL values with 0.
SQL
SELECT
    LAGS_ZEROFILL(sales, 1, 3) OVER (ORDER BY day)
FROM store_sales;
Output
SQL
sales_lag1 | sales_lag2 | sales_lag3
-----------+------------+------------
0          | 0          | 0
100        | 0          | 0
120        | 100        | 0
130        | 120        | 100

MULTI_LAGS

Generates lagged columns for multiple variables at once. This function is intended primarily for creating features for multivariate time-series models. Syntax
SQL
MULTI_LAGS(expr1, expr2, [ ,... ] start_lag, end_lag [, step])
    OVER (<window_specification>)
ArgumentTypeDescription
expr1, expr2, [ ,... ] ANYOne or more columns or expressions to lag. Applies the same lag range and step to each input.
start_lagINTEGERFirst lag to generate. This value must be less than the end_lag value.
end_lagINTEGERLast lag to generate. This value must be greater than the start_lag value.
stepINTEGEROptional.
Defines the gap between successive lags. Must be a positive integer.
If unspecified, this value defaults to 1.
Example This query creates four lagged columns for both interest_rate and gdp_growth.
SQL
SELECT
    MULTI_LAGS(interest_rate, gdp_growth, 1, 4) OVER (ORDER BY quarter)
FROM economic_data;
Output
SQL
interest_rate_lag1 | interest_rate_lag2 | interest_rate_lag3 | interest_rate_lag4 | gdp_growth_lag1 | gdp_growth_lag2 | gdp_growth_lag3 | gdp_growth_lag4
-------------------+--------------------+--------------------+--------------------+-----------------+-----------------+-----------------+-----------------
NULL               | NULL               | NULL               | NULL               | NULL            | NULL            | NULL            | NULL
2.5                | NULL               | NULL               | NULL               | 1.1             | NULL            | NULL            | NULL
2.7                | 2.5                | NULL               | NULL               | 1.2             | 1.1             | NULL            | NULL
2.8                | 2.7                | 2.5                | NULL               | 1.3             | 1.2             | 1.1             | NULL

MULTI_LAGS_ZEROFILL

Generates lagged columns for multiple variables and replaces NULL values with 0. Syntax
SQL
MULTI_LAGS_ZEROFILL(expr1, expr2, [ ,... ] start_lag, end_lag [, step])
    OVER (<window_specification>)
ArgumentTypeDescription
expr1, expr2, [ ,... ] ANYOne or more columns or expressions to lag. Applies the same lag range and step to each input.
start_lagINTEGERFirst lag to generate. This value must be less than the end_lag value.
end_lagINTEGERLast lag to generate. This value must be greater than the start_lag value.
stepINTEGEROptional.
Defines the gap between successive lags. Must be a positive integer.
If unspecified, this value defaults to 1.
Example This query generates four lagged columns each for x1, x2, and x3, replacing missing values with 0.
SQL
SELECT
    MULTI_LAGS_ZEROFILL(x1, x2, x3, 1, 4) OVER (ORDER BY time_col)
FROM public.my_time_series_table;
Output
SQL
x1_lag1 | x1_lag2 | x1_lag3 | x1_lag4 | x2_lag1 | x2_lag2 | x2_lag3 | x2_lag4 | x3_lag1 | x3_lag2 | x3_lag3 | x3_lag4
--------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------
0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0       | 0
10      | 0       | 0       | 0       | 20      | 0       | 0       | 0       | 30      | 0       | 0       | 0
11      | 10      | 0       | 0       | 21      | 20      | 0       | 0       | 31      | 30      | 0       | 0
12      | 11      | 10      | 0       | 22      | 21      | 20      | 0       | 32      | 31      | 30      | 0

Vector Assembler Function

A vector assembler function provides options for grouping multiple lagged columns into structured vectors in a single statement. This function makes preparing input data for time series models easier, especially those that require organized lag structures, such as VAR.

LAG_VECTORS

Groups lagged columns generated by the MULTI_LAGS or MULTI_LAGS_ZEROFILL functions into vector columns. Models such as VAR require these structured lag vectors. The resulting vectors are named lag_vector_. Syntax
SQL
LAG_VECTORS(expr1, expr2, [ ..., ] start_lag, end_lag [, step])
ArgumentTypeDescription
expr1, expr2, [ ,... ] NUMERICTwo or more lagged columns to group. Applies the same lag range and step to each input.
start_lagINTEGERFirst lag to assemble. This value must be less than the end_lag value.
end_lagINTEGERLast lag to assemble. This value must be greater than the start_lag value.
stepINTEGEROptional.
Defines the gap between successive lags. Must be a positive integer.
If unspecified, this value defaults to 1.
Examples Assemble Lagged Columns into Vectors This query uses MULTI_LAGS_ZEROFILL to generate lagged columns for x1, x2, and x3, then uses LAG_VECTORS to group them into lag_vector1 through lag_vector4.
SQL
SELECT
    {x1, x2, x3} AS current_vector,
    LAG_VECTORS(x1, x2, x3, 1, 4)
FROM (
    SELECT
        x1, x2, x3,
        MULTI_LAGS_ZEROFILL(x1, x2, x3, 1, 4) OVER (ORDER BY time_col)
    FROM public.my_time_series_table
) s;
e;
Output
SQL
current_vector | lag_vector1 | lag_vector2 | lag_vector3 | lag_vector4
---------------+-------------+-------------+-------------+-------------
{10, 20, 30}  | {0, 0, 0}   | {0, 0, 0}   | {0, 0, 0}   | {0, 0, 0}
{11, 21, 31}  | {10, 20, 30}| {0, 0, 0}   | {0, 0, 0}   | {0, 0, 0}
{12, 22, 32}  | {11, 21, 31}| {10, 20, 30}| {0, 0, 0}   | {0, 0, 0}
{13, 23, 33}  | {12, 22, 32}| {11, 21, 31}| {10, 20, 30}| {0, 0, 0}

Prepare Data for a VAR Model This example shows how to prepare training data for a VAR model with three variables (x1, x2, x3) and four lags. The example demonstrates how the lag generator functions and vector assembler can simplify feature creation compared to writing out every LAG expression manually.
  • MULTI_LAGS_ZEROFILL generates all individual lagged columns (x1_lag1x3_lag4) in a single call, replacing NULL values with zero.
  • LAG_VECTORS automatically groups these lagged columns into vector inputs (lag_vector1lag_vector4), which the model requires.
SQL
CREATE MLMODEL my_var_model
TYPE VECTOR_AUTOREGRESSION ON (
    SELECT
        {x1, x2, x3},
        LAG_VECTORS(x1, x2, x3, 1, 4)
    FROM (
        SELECT
            x1, x2, x3,
            MULTI_LAGS_ZEROFILL(x1, x2, x3, 1, 4) OVER (ORDER BY t)
        FROM public.my_time_series_table
    )
)
OPTIONS (
    'numVariables' -> '3',
    'numLags'      -> '4'
);
Prepare Data for a VAR Model with Manual Vector Assembly This version of the univariate regression example uses repeated LAG invocations instead of the LAGS function. Although it produces the exact same set of lagged features (sales_lag1, sales_lag2, sales_lag3), the query requires writing each LAG expression manually. This makes the SQL longer, more repetitive, and harder to maintain compared to the concise single-line LAGS version.
SQL
CREATE MLMODEL my_var_model_manual
TYPE VECTOR_AUTOREGRESSION ON (
    SELECT
        {x1, x2, x3},
        {x1_lag1, x2_lag1, x3_lag1},
        {x1_lag2, x2_lag2, x3_lag2},
        {x1_lag3, x2_lag3, x3_lag3},
        {x1_lag4, x2_lag4, x3_lag4}
    FROM (
        SELECT
            x1, x2, x3,
            COALESCE(LAG(x1, 1) OVER (ORDER BY t), 0) AS x1_lag1,
            COALESCE(LAG(x1, 2) OVER (ORDER BY t), 0) AS x1_lag2,
            COALESCE(LAG(x1, 3) OVER (ORDER BY t), 0) AS x1_lag3,
            COALESCE(LAG(x1, 4) OVER (ORDER BY t), 0) AS x1_lag4,
            COALESCE(LAG(x2, 1) OVER (ORDER BY t), 0) AS x2_lag1,
            COALESCE(LAG(x2, 2) OVER (ORDER BY t), 0) AS x2_lag2,
            COALESCE(LAG(x2, 3) OVER (ORDER BY t), 0) AS x2_lag3,
            COALESCE(LAG(x2, 4) OVER (ORDER BY t), 0) AS x2_lag4,
            COALESCE(LAG(x3, 1) OVER (ORDER BY t), 0) AS x3_lag1,
            COALESCE(LAG(x3, 2) OVER (ORDER BY t), 0) AS x3_lag2,
            COALESCE(LAG(x3, 3) OVER (ORDER BY t), 0) AS x3_lag3,
            COALESCE(LAG(x3, 4) OVER (ORDER BY t), 0) AS x3_lag4
        FROM public.my_time_series_table
    )
)
OPTIONS (
    'numVariables' -> '3',
    'numLags'      -> '4'
);
Window Aggregate Functions Machine Learning in Ocient Machine Learning Models
Last modified on May 27, 2026