# OCGraph Python Library

export const Spark = "Apache® Spark™";

export const Python = "Python®";

export const Ocient = "Ocient®";

export const Java = "Java®";

The `ocient_graph` module brings a programming model (similar to {Spark} GraphX) to the {Ocient} System directly from {Python} using the `pyocient` driver. The module treats a graph as two relational tables, one for vertices (nodes) and one for edges (directed links). This module provides a composable API for graph transformations, neighborhood analytics, and iterative algorithms (e.g., Pregel, PageRank).

The API validates inputs, avoids destructive changes by materializing results into new tables, supports optional indexing for performance, and follows Ocient SQL conventions. The package installs separately from `pyocient` and exposes a Python-native interface that mirrors the {Java} library. For details, see [OCGraph Java Library](/ocgraph-java-library).

## Installation

Use `pyocient` for connectivity and `ocient_graph` for graph APIs. The graph library is a separate package that depends on `pyocient`. For a tutorial about installing and using `pyocient`, see [Ocient Python Module: pyocient](/ocient-python-module-pyocient).

**Install and Import**

Install the `ocient_graph` module.

```shell Shell theme={null}
pip install ocient_graph
```

Import the module.

```python Python theme={null}
from pyocient import connect
from ocient_graph import (
    subgraph,
    collect_neighbors,
    EdgeDirection,
)
```

### Data Model Requirements

Database tables that use the OCGraph Python library must adhere to this structure. In addition to the listed requirements, tables can include other columns.

| **Table**      | **Description**                                                                                                                                                                 | **Requirements**                                                                                             |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ |
| Vertices table | A table with one row per vertex (node). This table typically represents the anchor for graph algorithms and transforms. Many methods join edges to vertices by the `id` column. | The table must contain the<br />`id BIGINT NOT NULL` column definition as the unique vertex identifier.      |
| Edges table    | A table with one row per directed edge (relationship). Each row is a directed edge from a source vertex to a destination vertex.                                                | The table must contain these column definitions:<br />`srcid BIGINT NOT NULL`,<br />`destid BIGINT NOT NULL` |
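
The following is a minimal sketch of table definitions that satisfy these requirements. It runs against an in-memory SQLite database from the Python standard library purely for illustration. The table names (`customers`, `purchases`), the extra attribute columns, and the `has_required_columns` helper are hypothetical, and DDL on an Ocient System may need additional clauses.

```python
import sqlite3

# Hypothetical table names and attribute columns; only id, srcid, and destid
# come from the requirements table above.
VERTICES_DDL = """
CREATE TABLE customers (
    id BIGINT NOT NULL,
    name VARCHAR(255),
    region VARCHAR(255)
)
"""

EDGES_DDL = """
CREATE TABLE purchases (
    srcid BIGINT NOT NULL,
    destid BIGINT NOT NULL,
    amount DOUBLE
)
"""


def has_required_columns(conn, table, required):
    """Return True if every required column name exists in the table."""
    # PRAGMA table_info is SQLite-specific; row[1] is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return set(required) <= existing


conn = sqlite3.connect(":memory:")
conn.execute(VERTICES_DDL)
conn.execute(EDGES_DDL)

vertices_ok = has_required_columns(conn, "customers", ["id"])
edges_ok = has_required_columns(conn, "purchases", ["srcid", "destid"])
```

A check like this before calling graph functions can surface a missing `id`, `srcid`, or `destid` column early.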

## Subgraph and Filtering

Use a subgraph or various filters to restrict a graph to relevant vertices and edges. These functions create filtered copies or masked intersections, preserving schema and optional indexes for performance.

### subgraph

Creates filtered vertex and edge tables using vertex and triplet predicates, retaining only edges with endpoints that remain after vertex filtering. The function creates the requested indexes and performs best-effort cleanup in the event of failure.

**Syntax**

```python Python theme={null}
subgraph(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    result_schema,
    result_vertices_table,
    result_edges_table,
    vertex_filter,
    edge_filter,
    [ result_vertices_indexes [ , ... ] ],
    [ result_edges_indexes [ , ... ] ]
)
```

| **Argument**              | **Data Type**       | **Description**                                                                                                                                                                                                                                         |
| ------------------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `connection`              | pyocient.Connection | An active database connection using the `pyocient` module.                                                                                                                                                                                              |
| `input_schema`            | str                 | A non-empty schema containing the input tables.                                                                                                                                                                                                         |
| `input_vertices_table`    | str                 | Input vertices table (must have an `id` column).                                                                                                                                                                                                        |
| `input_edges_table`       | str                 | Input edges table (must have `srcid` and `destid` columns).                                                                                                                                                                                             |
| `result_schema`           | str                 | A writable schema to create the result tables.                                                                                                                                                                                                          |
| `result_vertices_table`   | str                 | Name of the filtered vertices table to create.                                                                                                                                                                                                          |
| `result_edges_table`      | str                 | Name of the filtered edges table to create.                                                                                                                                                                                                             |
| `vertex_filter`           | str                 | SQL predicate to filter the vertices (without the `WHERE` keyword). Example: `status = 'ACTIVE' AND score > 0`                                                                                                                                          |
| `edge_filter`             | str                 | SQL predicate that the system evaluates in a triplet context using the aliases `a` (source vertex), `b` (edge), and `c` (destination vertex). This predicate does not require a `WHERE` keyword. <br />Example: `b.amount > 50 AND a.region = c.region` |
| `result_vertices_indexes` | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none.                                                                                                                                                   |
| `result_edges_indexes`    | List\[str]          | Optional. Columns to index in the result edges table (e.g., `srcid` and `destid` columns). Specify an empty list for none.                                                                                                                              |

**Example**

Create an active customer subgraph that includes only purchases exceeding \$50 where the source and destination share a region.

```python Python theme={null}
subgraph(
    connection,
    "sales",
    "customers",
    "purchases",
    "sales",
    "customers_active",
    "purchases_active",
    "status = 'ACTIVE' AND score > 0",
    "b.amount > 50 AND a.region = c.region",
    ["id","region"],
    ["srcid","destid"],
)
```
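
To make the filtering semantics concrete, the sketch below reproduces the described behavior with plain SQL on an in-memory SQLite database. It is an illustration under assumptions, not the SQL that the library generates: the sample rows are invented, and the join shape is inferred from the description of the `a`, `b`, and `c` aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id BIGINT NOT NULL, status TEXT, score INT, region TEXT);
CREATE TABLE purchases (srcid BIGINT NOT NULL, destid BIGINT NOT NULL, amount REAL);
INSERT INTO customers VALUES
    (1, 'ACTIVE',   10, 'EU'),
    (2, 'ACTIVE',    5, 'EU'),
    (3, 'INACTIVE',  7, 'EU'),
    (4, 'ACTIVE',    3, 'US');
INSERT INTO purchases VALUES
    (1, 2, 80.0),   -- kept
    (1, 3, 90.0),   -- dropped: vertex 3 fails the vertex filter
    (1, 4, 75.0),   -- dropped: regions differ
    (2, 1, 20.0);   -- dropped: amount is not above 50
""")

vertex_filter = "status = 'ACTIVE' AND score > 0"
edge_filter = "b.amount > 50 AND a.region = c.region"

# Step 1: keep only vertices that satisfy the vertex predicate.
conn.execute(
    f"CREATE TABLE customers_active AS SELECT * FROM customers WHERE {vertex_filter}"
)

# Step 2: keep edges whose endpoints both survived, then apply the triplet
# predicate with a = source vertex, b = edge, c = destination vertex.
conn.execute(f"""
    CREATE TABLE purchases_active AS
    SELECT b.*
    FROM purchases b
    JOIN customers_active a ON a.id = b.srcid
    JOIN customers_active c ON c.id = b.destid
    WHERE {edge_filter}
""")

kept_vertices = [r[0] for r in conn.execute("SELECT id FROM customers_active ORDER BY id")]
kept_edges = conn.execute(
    "SELECT srcid, destid FROM purchases_active ORDER BY srcid, destid"
).fetchall()
```

In this sample data, vertices 1, 2, and 4 pass the vertex filter, and only the edge from 1 to 2 passes both the endpoint check and the triplet predicate.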

### filter\_vertices

Creates a filtered subgraph by selecting vertices that match a predicate while retaining only edges with endpoints that are in the filtered vertex set.

**Syntax**

```python Python theme={null}
filter_vertices(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    result_schema,
    result_vertices_table,
    result_edges_table,
    vertex_filter,
    [ result_vertices_indexes [ , ... ] ],
    [ result_edges_indexes [ , ... ] ]
)
```

| **Argument**              | **Data Type**       | **Description**                                                                                                            |
| ------------------------- | ------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| `connection`              | pyocient.Connection | An active database connection using the `pyocient` module.                                                                 |
| `input_schema`            | str                 | A non-empty schema containing the input tables.                                                                            |
| `input_vertices_table`    | str                 | Input vertices table (must have an `id` column).                                                                           |
| `input_edges_table`       | str                 | Input edges table (must have `srcid` and `destid` columns).                                                                |
| `result_schema`           | str                 | A writable schema to create the result tables.                                                                             |
| `result_vertices_table`   | str                 | Name of the result vertices table. The name must not conflict with the names of input tables.                              |
| `result_edges_table`      | str                 | Name of the edges table to create. The name must not conflict with the names of input tables.                              |
| `vertex_filter`           | str                 | SQL predicate to filter the vertices (without the `WHERE` keyword). Example: `status = 'ACTIVE' AND score > 0`             |
| `result_vertices_indexes` | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none.                      |
| `result_edges_indexes`    | List\[str]          | Optional. Columns to index in the result edges table (e.g., `srcid` and `destid` columns). Specify an empty list for none. |

**Example**

Filter US customers and retain edges with endpoints that remain in the filtered vertex set.

```python Python theme={null}
filter_vertices(
    connection,
    "sales", "customers", "purchases",
    "sales", "customers_us", "purchases_us",
    "country = 'US'",
    ["id"],
    ["srcid","destid"],
)
```

### filter\_edges

Creates a filtered edges table by selecting edges that match a predicate.

**Syntax**

```python Python theme={null}
filter_edges(
    connection,
    input_schema,
    input_edges_table,
    result_schema,
    result_edges_table,
    edge_filter,
    [ result_edges_indexes [, ... ] ]
)
```

| **Argument**           | **Data Type**       | **Description**                                                                                                            |
| ---------------------- | ------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| `connection`           | pyocient.Connection | An active database connection using the `pyocient` module.                                                                 |
| `input_schema`         | str                 | A non-empty schema containing the input edges table.                                                                       |
| `input_edges_table`    | str                 | Input edges table (must have `srcid` and `destid` columns).                                                                |
| `result_schema`        | str                 | A writable schema to create the result tables.                                                                             |
| `result_edges_table`   | str                 | Result edges table name. The name must not conflict with the names of input tables.                                        |
| `edge_filter`          | str                 | SQL predicate on edges (without the `WHERE` keyword). Example: `weight > 0.5 AND type = 'ACTIVE'`                          |
| `result_edges_indexes` | List\[str]          | Optional. Columns to index in the result edges table (e.g., `srcid` and `destid` columns). Specify an empty list for none. |

**Example**

Create a filtered edges table from the existing `purchases` edge set by keeping only edges that satisfy a business rule (`weight > 0.5` and `type = 'ACTIVE'`). The `filter_edges` function then indexes the result on the `srcid` and `destid` columns for faster lookups.

```python Python theme={null}
filter_edges(
    connection,
    "sales", "purchases",
    "sales", "purchases_filtered",
    "weight > 0.5 AND type = 'ACTIVE'",
    ["srcid", "destid"],
)
```

### mask

Creates a masked subgraph by intersecting two graphs. Vertices intersect if the vertex identifier is present in both graphs. Edges intersect when the `srcid` and `destid` values are present in both graphs.

The function creates a masked subgraph from rows that intersect with each other. The function copies rows that intersect from the graph defined by the arguments `input_vertices_table` and `input_edges_table`, including any attributes.

You can optionally create indexes on the result subgraph tables.

**Syntax**

```python Python theme={null}
mask(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    other_schema,
    other_vertices_table,
    other_edges_table,
    result_schema,
    result_vertices_table,
    result_edges_table,
    [ result_vertices_indexes [ , ... ] ],
    [ result_edges_indexes [ , ... ] ]
)
```

| **Argument**              | **Data Type**       | **Description**                                                                                                                                                                    |
| ------------------------- | ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `connection`              | pyocient.Connection | An active database connection using the `pyocient` module.                                                                                                                         |
| `input_schema`            | str                 | A non-empty schema containing the input tables.                                                                                                                                    |
| `input_vertices_table`    | str                 | The primary vertices table to intersect (must have an `id` column).<br /><br />The masked subgraph created by this function copies rows that intersect from this table.            |
| `input_edges_table`       | str                 | The primary edges table to intersect (must have `srcid` and `destid` columns).<br /><br />The masked subgraph created by this function copies rows that intersect from this table. |
| `other_schema`            | str                 | Schema containing the second graph.                                                                                                                                                |
| `other_vertices_table`    | str                 | The second vertices table to intersect.                                                                                                                                            |
| `other_edges_table`       | str                 | The second edges table to intersect.                                                                                                                                               |
| `result_schema`           | str                 | A writable schema to create the result tables.                                                                                                                                     |
| `result_vertices_table`   | str                 | Name of the vertices table to create.                                                                                                                                              |
| `result_edges_table`      | str                 | Name of the edges table to create.                                                                                                                                                 |
| `result_vertices_indexes` | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none.                                                                              |
| `result_edges_indexes`    | List\[str]          | Optional. Columns to index in the result edges table (e.g., `srcid` and `destid` columns). Specify an empty list for none.                                                         |

**Example**

Create a masked subgraph by intersecting two graphs. The example copies vertices and edges that are present in both graphs, along with the remaining endpoints.

```python Python theme={null}
mask(
    connection,
    "sales", "customers", "purchases",
    "ref",   "customers_ref", "purchases_ref",
    "sales", "customers_masked", "purchases_masked",
    ["id"],
    ["srcid", "destid"],
)
```
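
The intersection rules can be sketched with plain SQL on an in-memory SQLite database. This is an illustration under assumptions rather than the library's implementation: the sample rows are invented, and edges are assumed to match on the (`srcid`, `destid`) pair. Attributes come from the primary graph, as described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id BIGINT NOT NULL, name TEXT);
CREATE TABLE purchases (srcid BIGINT NOT NULL, destid BIGINT NOT NULL, amount REAL);
CREATE TABLE customers_ref (id BIGINT NOT NULL);
CREATE TABLE purchases_ref (srcid BIGINT NOT NULL, destid BIGINT NOT NULL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
INSERT INTO purchases VALUES (1, 2, 10.0), (2, 3, 20.0), (3, 1, 30.0);
INSERT INTO customers_ref VALUES (1), (2), (4);
INSERT INTO purchases_ref VALUES (1, 2), (2, 4);
""")

# Vertices: keep primary rows whose id also appears in the other graph.
conn.execute("""
    CREATE TABLE customers_masked AS
    SELECT v.*
    FROM customers v
    WHERE EXISTS (SELECT 1 FROM customers_ref o WHERE o.id = v.id)
""")

# Edges: keep primary rows whose (srcid, destid) pair also appears in the
# other graph. Matching on the pair is an assumption of this sketch.
conn.execute("""
    CREATE TABLE purchases_masked AS
    SELECT e.*
    FROM purchases e
    WHERE EXISTS (
        SELECT 1 FROM purchases_ref o
        WHERE o.srcid = e.srcid AND o.destid = e.destid
    )
""")

masked_vertices = conn.execute("SELECT id, name FROM customers_masked ORDER BY id").fetchall()
masked_edges = conn.execute(
    "SELECT srcid, destid, amount FROM purchases_masked ORDER BY srcid, destid"
).fetchall()
```

The `name` and `amount` attributes in the result come from the primary tables, because the reference tables in this sample carry only identifiers.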

## Transformations

Construct new vertex or edge tables by computing derived columns, reversing direction, or aggregating duplicates. These functions do not change the original inputs. Instead, the functions materialize new results.

### map\_vertices

Creates a new vertices table with the identifier `id` and computed columns. Use the `result_column_expressions` argument to calculate additional columns. This function can also add indexes before inserting data.

**Syntax**

```python Python theme={null}
map_vertices(
    connection,
    input_schema,
    input_vertices_table,
    result_schema,
    result_vertices_table,
    result_column_expressions,
    [ result_vertices_indexes [, ... ] ]
)
```

| **Argument**                | **Data Type**       | **Description**                                                                                                    |
| --------------------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------ |
| `connection`                | pyocient.Connection | An active database connection using the `pyocient` module.                                                         |
| `input_schema`              | str                 | A non-empty schema containing the input vertices table.                                                            |
| `input_vertices_table`      | str                 | Input vertices table (must have an `id` column).                                                                   |
| `result_schema`             | str                 | A schema to create the result vertices table.                                                                      |
| `result_vertices_table`     | str                 | Name of the result vertices table. The name must not conflict with the names of input tables.                      |
| `result_column_expressions` | List\[str]          | One or more SQL expressions defining result columns beyond `id`. Use the `AS alias_name` keyword for stable names. |
| `result_vertices_indexes`   | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none.              |

**Example**

Create a new vertices table with two new columns, `name_upper` and `is_vip`, and generate indexes for the `id` and `name_upper` columns.

```python Python theme={null}
map_vertices(
    connection,
    "sales", "customers",
    "sales", "customers_enriched",
    [
      "UPPER(name) AS name_upper",
      "CASE WHEN score > 1000 THEN true ELSE false END AS is_vip",
    ],
    ["id","name_upper"],
)
```

### map\_edges

Creates a new edges table with `srcid`, `destid`, and any additional computed columns. Expressions should refer to input edge columns by their original names, and each computed expression should include an `AS alias` keyword.

**Syntax**

```python Python theme={null}
map_edges(
    connection,
    input_schema,
    input_edges_table,
    result_schema,
    result_edges_table,
    [ result_column_expressions [, ... ] ],
    [ result_edges_indexes [, ... ] ]
)
```

| **Argument**                | **Data Type**       | **Description**                                                                                                                                     |
| --------------------------- | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `connection`                | pyocient.Connection | An active database connection using the `pyocient` module.                                                                                          |
| `input_schema`              | str                 | A non-empty schema containing the input edges table.                                                                                                |
| `input_edges_table`         | str                 | Input edges table (must have `srcid` and `destid` columns).                                                                                         |
| `result_schema`             | str                 | A schema to create the result edges table.                                                                                                          |
| `result_edges_table`        | str                 | Name of the result edges table. The name must not conflict with the names of input tables.                                                          |
| `result_column_expressions` | List\[str]          | Optional. SQL expressions for additional edge columns (besides the `srcid` and `destid` columns). Use the `AS alias_name` keyword for stable names. |
| `result_edges_indexes`      | List\[str]          | Optional. Columns to index in the result edges table (e.g., `srcid` and `destid` columns). Specify an empty list for none.                          |

**Example**

Create a new edges table with two columns, `discounted_amount` and `big_txn`, and generate indexes for the `srcid` and `destid` columns.

```python Python theme={null}
map_edges(
    connection,
    "sales", "purchases",
    "sales", "purchases_enriched",
    [
      "amount * 0.9 AS discounted_amount",
      "CASE WHEN amount > 100 THEN 1 ELSE 0 END AS big_txn",
    ],
    ["srcid","destid"],
)
```

### map\_triplets

Creates a new edges table with computed columns that reference `a` (source vertex), `b` (edge), and `c` (destination vertex). The output automatically includes `b.srcid` and `b.destid` columns.

**Syntax**

```python Python theme={null}
map_triplets(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    result_schema,
    result_edges_table,
    [ result_column_expressions [, ... ] ],
    [ result_edges_indexes [, ... ] ]
)
```

| **Argument**                | **Data Type**       | **Description**                                                                                                                                               |
| --------------------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `connection`                | pyocient.Connection | An active database connection using the `pyocient` module.                                                                                                    |
| `input_schema`              | str                 | A non-empty schema containing the input tables.                                                                                                               |
| `input_vertices_table`      | str                 | Input vertices table (must have an `id` column). Used as the `a` and `c` vertices.                                                                            |
| `input_edges_table`         | str                 | Input edges table (must have the `srcid` and `destid` columns). This function uses this argument as the `b` edge.                                             |
| `result_schema`             | str                 | A writable schema to create the result edges table.                                                                                                           |
| `result_edges_table`        | str                 | Name of the result edges table. The name must not conflict with the names of input tables.                                                                    |
| `result_column_expressions` | List\[str]          | Optional. SQL expressions that can reference `a` (source vertex), `b` (edge), and `c` (destination vertex). Use the `AS alias_name` keyword for stable names. |
| `result_edges_indexes`      | List\[str]          | Optional. Columns to index in the result edges table (e.g., `srcid` and `destid` columns). Specify an empty list for none.                                    |

**Example**

Create a new triplet table from the vertices and edges tables with the `amount` and `same_country` columns, and generate indexes for the `srcid` and `destid` columns.

```python Python theme={null}
map_triplets(
    connection,
    "sales", "customers", "purchases",
    "sales", "purchases_triplets",
    [
      "b.amount AS amount",
      "CASE WHEN a.country = c.country THEN 1 ELSE 0 END AS same_country",
    ],
    ["srcid","destid"],
)
```
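
As a rough illustration of the triplet context, the sketch below builds an equivalent query by hand on an in-memory SQLite database. The sample rows are invented and the inner-join shape is an assumption. The library's generated SQL may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id BIGINT NOT NULL, country TEXT);
CREATE TABLE purchases (srcid BIGINT NOT NULL, destid BIGINT NOT NULL, amount REAL);
INSERT INTO customers VALUES (1, 'US'), (2, 'US'), (3, 'DE');
INSERT INTO purchases VALUES (1, 2, 40.0), (1, 3, 60.0);
""")

result_column_expressions = [
    "b.amount AS amount",
    "CASE WHEN a.country = c.country THEN 1 ELSE 0 END AS same_country",
]

# The edge endpoints are always part of the output, followed by the
# caller-supplied expressions.
select_list = ", ".join(
    ["b.srcid AS srcid", "b.destid AS destid", *result_column_expressions]
)

# a = source vertex, b = edge, c = destination vertex.
conn.execute(f"""
    CREATE TABLE purchases_triplets AS
    SELECT {select_list}
    FROM purchases b
    JOIN customers a ON a.id = b.srcid
    JOIN customers c ON c.id = b.destid
""")

triplet_rows = conn.execute(
    "SELECT srcid, destid, amount, same_country "
    "FROM purchases_triplets ORDER BY srcid, destid"
).fetchall()
```

Each expression carries an `AS` alias, so the result column names stay stable regardless of how the database would name an unaliased expression.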

### reverse\_edges

Creates a new edges table with the `srcid` and `destid` columns reversed, preserving other columns. Use this function to traverse a graph in the opposite direction.

**Syntax**

```python Python theme={null}
reverse_edges(
    connection,
    input_schema,
    input_edges_table,
    result_schema,
    result_edges_table,
    [ result_edges_indexes [, ... ] ]
)
```

| **Argument**           | **Data Type**       | **Description**                                                                                                            |
| ---------------------- | ------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| `connection`           | pyocient.Connection | An active database connection using the `pyocient` module.                                                                 |
| `input_schema`         | str                 | A non-empty schema containing the input edges table.                                                                       |
| `input_edges_table`    | str                 | Input edges table (must have the `srcid` and `destid` columns).                                                            |
| `result_schema`        | str                 | A writable schema to create the reversed edges table.                                                                      |
| `result_edges_table`   | str                 | Name of the reversed edges table to create. The name must not conflict with the names of input tables.                     |
| `result_edges_indexes` | List\[str]          | Optional. Columns to index in the result edges table (e.g., `srcid` and `destid` columns). Specify an empty list for none. |

**Example**

Transform edge direction by reversing the `srcid` and `destid` columns. The example also creates indexes for these columns.

```python Python theme={null}
reverse_edges(
    connection,
    "sales", "purchases",
    "sales", "purchases_reversed",
    ["srcid","destid"],
)
```
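
The effect of the reversal can be sketched as a simple column swap. The example below uses an in-memory SQLite database with invented rows and lists the extra `amount` column explicitly. How the library enumerates the preserved columns is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (srcid BIGINT NOT NULL, destid BIGINT NOT NULL, amount REAL);
INSERT INTO purchases VALUES (1, 2, 10.0), (2, 3, 20.0);
""")

# Swap the endpoint columns and carry the remaining attributes over unchanged.
conn.execute("""
    CREATE TABLE purchases_reversed AS
    SELECT destid AS srcid, srcid AS destid, amount
    FROM purchases
""")

reversed_rows = conn.execute(
    "SELECT srcid, destid, amount FROM purchases_reversed ORDER BY srcid, destid"
).fetchall()
```

The edge from 1 to 2 becomes an edge from 2 to 1, and its `amount` value is unchanged.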

### group\_edges

Groups duplicate rows of the `srcid` and `destid` columns, producing one row for each unique pair of values in a new edges table. This function performs aggregations based on one or more SQL expressions.

**Syntax**

```python Python theme={null}
group_edges(
    connection,
    input_schema,
    input_edges_table,
    result_schema,
    result_edges_table,
    [ result_column_expressions [, ... ] ],
    [ result_edges_indexes [, ... ] ]
)
```

| **Argument**                | **Data Type**       | **Description**                                                                                                                                                                                                            |
| --------------------------- | ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `connection`                | pyocient.Connection | An active database connection using the `pyocient` module.                                                                                                                                                                 |
| `input_schema`              | str                 | A non-empty schema containing the input edges table.                                                                                                                                                                       |
| `input_edges_table`         | str                 | Input edges table (must have the `srcid` and `destid` columns).                                                                                                                                                            |
| `result_schema`             | str                 | A writable schema to create the grouped tables.                                                                                                                                                                            |
| `result_edges_table`        | str                 | Name of the grouped edges table. The name must not conflict with the names of input tables.                                                                                                                                |
| `result_column_expressions` | List\[str]          | One or more aggregate expressions for the `srcid` and `destid` columns. For details about SQL aggregations, see [Aggregate Functions](/aggregate-functions). <br /><br />Use the `AS alias_name` keyword for stable names. |
| `result_edges_indexes`      | List\[str]          | Optional. Columns to index in the result edges table (e.g., `srcid` and `destid` columns). Specify an empty list for none.                                                                                                 |

**Example**

Create a new edges table that aggregates, for each unique pair of `srcid` and `destid` values, the number of transactions (`txn_count`) and the total amount (`total_amount`). The example also generates indexes for the `srcid` and `destid` columns.

```python Python theme={null}
group_edges(
    connection,
    "sales", "purchases",
    "sales", "purchases_grouped",
    [
      "COUNT(*) AS txn_count",
      "SUM(amount) AS total_amount",
    ],
    ["srcid","destid"],
)
```
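
To show what the grouped table might contain, the following sketch reproduces the grouping with plain SQL in an in-memory SQLite database. It is an illustration rather than the library's implementation: it assumes that `group_edges` produces one row per distinct (`srcid`, `destid`) pair, and the `purchases` rows are invented for the example.

```python
import sqlite3

# Stand-in for the Ocient connection: an in-memory SQLite database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE purchases (srcid INTEGER, destid INTEGER, amount REAL)")
db.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, 2, 10.0), (1, 2, 5.0), (1, 3, 7.5), (2, 3, 1.0)],
)

# Assumed equivalent of the call above: parallel edges that share the same
# (srcid, destid) pair collapse into one row that carries the aggregates.
grouped = db.execute(
    """
    SELECT srcid, destid,
           COUNT(*) AS txn_count,
           SUM(amount) AS total_amount
    FROM purchases
    GROUP BY srcid, destid
    ORDER BY srcid, destid
    """
).fetchall()

for row in grouped:
    print(row)
```

The two parallel edges from vertex 1 to vertex 2 merge into a single row, which is why each aggregate needs an explicit alias to give the new column a predictable name.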

## Triplets

Produce triplet representations that are made of `a` (source vertex), `b` (edge), and `c` (destination vertex), either as a logical view or a materialized table for downstream queries.

### create\_triplets\_view

Creates a view that combines the edge table with the source and destination vertex attributes. This view is useful for analyzing relationships without having to repeatedly join tables. The view includes these columns:

* All original edge columns (including the `srcid` and `destid` columns).
* All source-vertex columns except `id`. Source-vertex column names have the `src_` prefix.
* All destination-vertex columns except `id`. Destination-vertex column names have the `dest_` prefix.

Use the [create\_triplets\_table](#create_triplets_table) function if you want to create a materialized table with indexes instead of a view.

**Syntax**

```python Python theme={null}
create_triplets_view(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    result_schema,
    result_triplets_view
)
```

| **Argument**           | **Data Type**       | **Description**                                                                                 |
| ---------------------- | ------------------- | ----------------------------------------------------------------------------------------------- |
| `connection`           | pyocient.Connection | An active database connection using the `pyocient` module.                                      |
| `input_schema`         | str                 | A non-empty schema containing the input vertices and edges tables.                              |
| `input_vertices_table` | str                 | Input vertices table (must have an `id` column).                                                |
| `input_edges_table`    | str                 | Input edges table (must have the `srcid` and `destid` columns).                                 |
| `result_schema`        | str                 | A schema to create the view.                                                                    |
| `result_triplets_view` | str                 | Name of the triplets view to create. The name must not conflict with the names of input tables. |

**Example**
Create a triplets view to inspect edges with joined source and destination vertex attributes.

```python Python theme={null}
create_triplets_view(
    connection,
    "sales", "customers", "purchases",
    "sales", "triplets_v",
)
```
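
As a rough illustration of the column layout listed above, the following sketch builds the same shape with plain SQL in an in-memory SQLite database. It is not the library's implementation: the `name` vertex attribute, the sample rows, and the choice of inner joins are assumptions made for the example.

```python
import sqlite3

# Stand-in for the Ocient connection: an in-memory SQLite database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.execute("CREATE TABLE purchases (srcid INTEGER, destid INTEGER, amount REAL)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bo")])
db.execute("INSERT INTO purchases VALUES (1, 2, 9.5)")

# Edge columns come first, followed by source-vertex columns with a src_
# prefix and destination-vertex columns with a dest_ prefix. The vertex id
# columns are left out because srcid and destid already carry those values.
cursor = db.execute(
    """
    SELECT e.srcid AS srcid, e.destid AS destid, e.amount AS amount,
           s.name AS src_name,
           d.name AS dest_name
    FROM purchases AS e
    JOIN customers AS s ON s.id = e.srcid
    JOIN customers AS d ON d.id = e.destid
    """
)
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(columns)
print(rows)
```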

### create\_triplets\_table

Creates a materialized table that combines the edge table with the source and destination vertex attributes. This table is useful for analyzing relationships without having to repeatedly join tables. The created table includes these columns:

* All original edge columns (including the `srcid` and `destid` columns).
* All source-vertex columns except `id`. Source-vertex column names have the `src_` prefix.
* All destination-vertex columns except `id`. Destination-vertex column names have the `dest_` prefix.

Use the [create\_triplets\_view](#create_triplets_view) function if you want to create a view instead of a new table.

**Syntax**

```python Python theme={null}
create_triplets_table(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    result_schema,
    result_triplets_table,
    [ result_triplets_indexes [, ... ] ]
)
```

| **Argument**              | **Data Type**       | **Description**                                                                        |
| ------------------------- | ------------------- | -------------------------------------------------------------------------------------- |
| `connection`              | pyocient.Connection | An active database connection using the `pyocient` module.                             |
| `input_schema`            | str                 | A non-empty schema containing the input tables.                                        |
| `input_vertices_table`    | str                 | Input vertices table (must have an `id` column).                                       |
| `input_edges_table`       | str                 | Input edges table (must have the `srcid` and `destid` columns).                        |
| `result_schema`           | str                 | A schema to create the triplets table.                                                 |
| `result_triplets_table`   | str                 | Name of the triplets table to create. The name must not conflict with any input names. |
| `result_triplets_indexes` | List\[str]          | Optional. Columns to index in the result table. Specify an empty list for none.        |

**Example**
Create a new table for triplets. Generate indexes for the `srcid` and `destid` columns.

```python Python theme={null}
create_triplets_table(
    connection,
    "sales", "customers", "purchases",
    "sales", "triplets_t",
    ["srcid","destid"],
)
```

## Degrees

Compute degree metrics for each vertex from the edges table. These functions produce small vertex tables suitable for joins and analytics.

### in\_degrees

Computes how many edges point to each vertex in an edge table by counting how many times each unique `destid` value appears. The result table has two columns: `id` (the destination vertex) and `in_degree` (the count).

**Syntax**

```python Python theme={null}
in_degrees(
    connection,
    input_schema,
    input_edges_table,
    result_schema,
    result_vertices_table,
    [ result_vertices_indexes [, ... ] ]
)
```

| **Argument**              | **Data Type**       | **Description**                                                                                       |
| ------------------------- | ------------------- | ----------------------------------------------------------------------------------------------------- |
| `connection`              | pyocient.Connection | An active database connection using the `pyocient` module.                                            |
| `input_schema`            | str                 | A non-empty schema containing the input table.                                                        |
| `input_edges_table`       | str                 | Input edges table (must have the `srcid` and `destid` columns).                                       |
| `result_schema`           | str                 | A schema to create the in-degree table.                                                               |
| `result_vertices_table`   | str                 | Name of the vertices table to create (columns include `id` and `in_degree`).                         |
| `result_vertices_indexes` | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none. |

**Example**
Compute in-degrees per vertex and generate an index on the `id` column.

```python Python theme={null}
in_degrees(
    connection,
    "sales", "purchases",
    "sales", "customers_in_degree",
    ["id"],
)
```
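
The counting rule can be sketched in plain Python. This is only an illustration with an invented edge list, not the way the library performs the computation.

```python
from collections import Counter

# Hypothetical edge list as (srcid, destid) pairs.
edges = [(1, 2), (1, 3), (2, 3), (4, 3)]

# in_degree: how often each vertex appears as a destination.
in_degree = Counter(dest for _, dest in edges)

print(sorted(in_degree.items()))
```

In this sketch, vertices 1 and 4 never appear as a destination, so they receive no entry at all rather than a zero count. Whether `in_degrees` treats such vertices the same way is not stated here, so check the result table before relying on that behavior.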

### out\_degrees

Computes how many edges originate from each vertex in an edge table by counting how many times each unique `srcid` value appears. The result table has two columns: `id` (the source vertex) and `out_degree` (the count).

**Syntax**

```python Python theme={null}
out_degrees(
    connection,
    input_schema,
    input_edges_table,
    result_schema,
    result_vertices_table,
    [ result_vertices_indexes [, ... ] ]
)
```

| **Argument**              | **Data Type**       | **Description**                                                                                       |
| ------------------------- | ------------------- | ----------------------------------------------------------------------------------------------------- |
| `connection`              | pyocient.Connection | An active database connection using the `pyocient` module.                                            |
| `input_schema`            | str                 | A non-empty schema containing the input table.                                                        |
| `input_edges_table`       | str                 | Input edges table (must have the `srcid` and `destid` columns).                                       |
| `result_schema`           | str                 | A schema to create the out-degree table.                                                              |
| `result_vertices_table`   | str                 | Name of the vertices table to create (columns include `id` and `out_degree`).                        |
| `result_vertices_indexes` | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none. |

**Example**
Compute the out-degrees count for each vertex and generate an index on the `id` column.

```python Python theme={null}
out_degrees(
    connection,
    "sales", "purchases",
    "sales", "customers_out_degree",
    ["id"],
)
```

### degrees

Computes the total degree (in-degree plus out-degree) for each vertex in an edge table by counting how many times each unique value appears in either the `srcid` or the `destid` column. The result table has two columns: `id` (the vertex) and `degree` (the count).

**Syntax**

```python Python theme={null}
degrees(
    connection,
    input_schema,
    input_edges_table,
    result_schema,
    result_vertices_table,
    [ result_vertices_indexes [, ... ] ]
)
```

| **Argument**              | **Data Type**       | **Description**                                                                                       |
| ------------------------- | ------------------- | ----------------------------------------------------------------------------------------------------- |
| `connection`              | pyocient.Connection | An active database connection using the `pyocient` module.                                            |
| `input_schema`            | str                 | A non-empty schema containing the input table.                                                        |
| `input_edges_table`       | str                 | Input edges table (must have the `srcid` and `destid` columns).                                       |
| `result_schema`           | str                 | A schema to create the degree table.                                                                  |
| `result_vertices_table`   | str                 | Name of the vertices table to create (columns include `id` and `degree`).                            |
| `result_vertices_indexes` | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none. |

**Example**
Compute total degrees for each vertex and generate an index on the `id` column.

```python Python theme={null}
degrees(
    connection,
    "sales", "purchases",
    "sales", "customers_degree",
    ["id"],
)
```
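
One way to picture the total-degree count is to stack both endpoint columns into a single list of identifiers and count each value. The sketch below does this with plain SQL in an in-memory SQLite database. The `UNION ALL` query and the sample edges are assumptions for illustration and might differ from the SQL that the library generates.

```python
import sqlite3

# Stand-in for the Ocient connection: an in-memory SQLite database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE purchases (srcid INTEGER, destid INTEGER)")
db.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 3), (4, 3)],
)

# Every edge contributes one row for its source and one for its destination,
# so counting rows per id yields in-degree plus out-degree.
degree_rows = db.execute(
    """
    SELECT id, COUNT(*) AS degree
    FROM (
        SELECT srcid AS id FROM purchases
        UNION ALL
        SELECT destid AS id FROM purchases
    ) AS endpoints
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(degree_rows)
```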

## Vertex Extraction and Joins

Build vertex sets from edges and combine vertex attributes across tables. These functions are useful for shaping vertex properties and consolidating features.

### from\_edges

Builds a vertices table from an edges table by extracting the unique source and destination identifiers.

This function can optionally compute additional columns using SQL expressions by referencing the unique identifier as `ids.id`.

The created table always contains the `id` column with one additional column per expression.

**Syntax**

```python Python theme={null}
from_edges(
    connection,
    input_schema,
    input_edges_table,
    result_schema,
    result_vertices_table,
    [ result_column_expressions [ , ... ] ],
    [ result_vertices_indexes [ , ... ] ]
)
```

| **Argument**                | **Data Type**       | **Description**                                                                                       |
| --------------------------- | ------------------- | ----------------------------------------------------------------------------------------------------- |
| `connection`                | pyocient.Connection | An active database connection using the `pyocient` module.                                            |
| `input_schema`              | str                 | A non-empty schema containing the input table.                                                        |
| `input_edges_table`         | str                 | Input edges table (must have the `srcid` and `destid` columns).                                       |
| `result_schema`             | str                 | A schema to create the result vertices table.                                                         |
| `result_vertices_table`     | str                 | Name of the result vertices table (always includes `id`).                                             |
| `result_column_expressions` | List\[str]          | Optional. Specify a list of SQL expressions that reference `ids.id` to add additional columns.        |
| `result_vertices_indexes`   | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none. |

**Example**
Create a vertices table from edge endpoints and add a `bucket` column that assigns each vertex to one of 10 buckets. Generate indexes for the `id` and `bucket` columns.

```python Python theme={null}
from_edges(
    connection,
    "sales", "purchases",
    "sales", "customers_from_edges",
    ["ids.id % 10 AS bucket"],
    ["id","bucket"],
)
```
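
The sketch below shows how an expression such as `ids.id % 10 AS bucket` could be evaluated against the set of unique endpoint identifiers. It uses plain SQL in an in-memory SQLite database. The subquery shape and the sample edges are assumptions for illustration, and only the `ids` alias comes from the function description.

```python
import sqlite3

# Stand-in for the Ocient connection: an in-memory SQLite database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE purchases (srcid INTEGER, destid INTEGER)")
db.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(101, 205), (101, 317), (205, 317)],
)

# UNION (without ALL) removes duplicates, leaving one row per unique vertex
# identifier. The extra expression then reads that identifier as ids.id.
vertex_rows = db.execute(
    """
    SELECT ids.id AS id, ids.id % 10 AS bucket
    FROM (
        SELECT srcid AS id FROM purchases
        UNION
        SELECT destid AS id FROM purchases
    ) AS ids
    ORDER BY ids.id
    """
).fetchall()

print(vertex_rows)
```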

### join\_vertices

Merges two vertices tables by retaining every row from a primary table (`input_vertices_table`) and selectively updating rows that also appear in the modification table (`modification_vertices_table`). The merged table includes all vertices from the primary table that do not appear in the modification table.

For vertices that appear in both tables, you must provide a list of SQL expressions (`result_attribute_expressions`), in column order, with one expression for every non-identifier column in the merged result table. These expressions can compute new values or simply alias a column if no changes are needed. Each expression can reference columns from the primary table (using the alias `a`) or from the modification table (using the alias `b`).

**Syntax**

```python Python theme={null}
join_vertices(
    connection,
    input_schema,
    input_vertices_table,
    modification_schema,
    modification_vertices_table,
    result_schema,
    result_vertices_table,
    result_attribute_expressions [ , ... ],
    [ result_vertices_indexes [ , ... ] ]
)
```

| **Argument**                   | **Data Type**       | **Description**                                                                                                                                                                                                                                                                                                                                                           |
| ------------------------------ | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `connection`                   | pyocient.Connection | An active database connection using the `pyocient` module.                                                                                                                                                                                                                                                                                                                |
| `input_schema`                 | str                 | A non-empty schema containing the primary input table.                                                                                                                                                                                                                                                                                                                    |
| `input_vertices_table`         | str                 | Input vertices table (with the alias `a`). This table must have an `id` column.                                                                                                                                                                                                                                                                                           |
| `modification_schema`          | str                 | A schema for the modification vertices table.                                                                                                                                                                                                                                                                                                                             |
| `modification_vertices_table`  | str                 | Modification vertices table (with the alias `b`). This table must have an `id` column.                                                                                                                                                                                                                                                                                    |
| `result_schema`                | str                 | A writable schema to create the result tables.                                                                                                                                                                                                                                                                                                                            |
| `result_vertices_table`        | str                 | Name of the vertices table to create.                                                                                                                                                                                                                                                                                                                                     |
| `result_attribute_expressions` | List\[str]          | A list of SQL expressions that define the non-identifier columns of the joined result.<br /><br />Each expression can reference the left (`input_vertices_table`) vertex as `a` and the right (`modification_vertices_table`) vertex as `b`.<br /><br />You must end every expression with an explicit alias using `AS alias_name` so the output column names are stable. |
| `result_vertices_indexes`      | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none.                                                                                                                                                                                                                                                                     |

**Example**
Merge vertex attributes and generate indexes for the `id` and `status` columns. This example includes two SQL expressions that update the `status` and `score` columns from the modification vertices table using the `COALESCE` SQL function.

```python Python theme={null}
join_vertices(
    connection,
    "sales", "customers",
    "sales", "customers_updates",
    "sales", "customers_merged",
    [
      "COALESCE(b.new_status, a.status) AS status",
      "COALESCE(b.score_delta + a.score, a.score) AS score",
    ],
    ["id","status"],
)
```
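
The merge rule can be modeled in plain Python with dictionaries keyed by vertex identifier. This is a conceptual sketch only: the helper names `join_vertices_sketch` and `merge` are hypothetical, the sample data is invented, and the sketch assumes that unmatched primary vertices keep their attributes unchanged and that vertices present only in the modification table are dropped.

```python
def join_vertices_sketch(primary, modifications, merge):
    """Return a new dict; the input dicts are left untouched."""
    merged = {}
    for vid, attrs in primary.items():
        if vid in modifications:
            # Vertex appears in both tables: apply the merge expressions.
            merged[vid] = merge(attrs, modifications[vid])
        else:
            # Vertex appears only in the primary table: copy it as-is.
            merged[vid] = dict(attrs)
    return merged


def merge(a, b):
    # Mirrors COALESCE(b.new_status, a.status) and
    # COALESCE(b.score_delta + a.score, a.score), assuming a["score"] is set.
    new_status = b.get("new_status")
    delta = b.get("score_delta")
    return {
        "status": new_status if new_status is not None else a["status"],
        "score": a["score"] + delta if delta is not None else a["score"],
    }


customers = {
    1: {"status": "active", "score": 10},
    2: {"status": "active", "score": 20},
}
updates = {
    2: {"new_status": "vip", "score_delta": 5},
    3: {"new_status": "new", "score_delta": 1},  # not in customers
}

merged = join_vertices_sketch(customers, updates, merge)
print(merged)
```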

### inner\_join\_vertices

Performs an inner join on two vertex tables using an equality comparison `a.id = b.id`. The result table automatically includes the `id` column from the first table.

You must provide a list of SQL expressions (`result_attribute_expressions`), in column order, with one expression for every non-identifier column in the result table. These expressions can compute new values or simply alias a column if no changes are needed. Each expression can reference columns from the left table (`input_vertices_table`) using the alias `a` or from the right table (`other_vertices_table`) using the alias `b`.

**Syntax**

```python Python theme={null}
inner_join_vertices(
    connection,
    input_schema,
    input_vertices_table,
    other_schema,
    other_vertices_table,
    result_schema,
    result_vertices_table,
    result_attribute_expressions [ , ... ],
    [ result_vertices_indexes [ , ... ] ]
)
```

| **Argument**                   | **Data Type**       | **Description**                                                                                                                                                                                                                                                                                                                                                           |
| ------------------------------ | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `connection`                   | pyocient.Connection | An active database connection using the `pyocient` module.                                                                                                                                                                                                                                                                                                                |
| `input_schema`                 | str                 | A schema for the left vertices table.                                                                                                                                                                                                                                                                                                                                     |
| `input_vertices_table`         | str                 | Input vertices table for the left side of the join (must have an `id` column).                                                                                                                                                                                                                                                                                            |
| `other_schema`                 | str                 | A schema for the right vertices table.                                                                                                                                                                                                                                                                                                                                    |
| `other_vertices_table`         | str                 | The other vertices table for the right side of the join (must have an `id` column).                                                                                                                                                                                                                                                                                       |
| `result_schema`                | str                 | A writable schema to create the result tables.                                                                                                                                                                                                                                                                                                                            |
| `result_vertices_table`        | str                 | Name of the vertices table to create.                                                                                                                                                                                                                                                                                                                                     |
| `result_attribute_expressions` | List\[str]          | A list of SQL expressions that define the non-identifier columns of the joined result.<br /><br />Each expression can reference the left (`input_vertices_table`) vertex as `a` and the right (`other_vertices_table`) vertex as `b`.<br /><br />You must end every expression with an explicit alias using `AS alias_name` to ensure the output column names are stable. |
| `result_vertices_indexes`      | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none.                                                                                                                                                                                                                                                                     |

**Example**
Create an inner join between two vertex tables and generate an index on the `id` column.

```python Python theme={null}
inner_join_vertices(
    connection,
    "sales", "customers",
    "sales", "profiles",
    "sales", "customers_joined",
    [
      "a.id AS id",
      "a.status AS status",
      "b.tier AS tier",
    ],
    ["id"],
)
```

### outer\_join\_vertices

Performs a left outer join between two vertices tables using an equality comparison `a.id = b.id`. The result table includes all rows from the left table. For left-table rows that have no match in the right table, any column read from the right table with the alias `b` is NULL. Expressions that read only columns from the table with the alias `a` are unaffected.

You must provide a list of SQL expressions (`result_attribute_expressions`), in column order, with one expression for every non-identifier column in the result table. These expressions can compute new values or simply alias a column if no changes are needed. Each expression can reference columns from the left table (`input_vertices_table`) using the alias `a` or from the right table (`other_vertices_table`) using the alias `b`.

**Syntax**

```python Python theme={null}
outer_join_vertices(
    connection,
    input_schema,
    input_vertices_table,
    other_schema,
    other_vertices_table,
    result_schema,
    result_vertices_table,
    result_attribute_expressions [ , ... ],
    [ result_vertices_indexes [ , ... ] ]
)
```

| **Argument**                   | **Data Type**       | **Description**                                                                                                                                                                                                                                                                                                                                                           |
| ------------------------------ | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `connection`                   | pyocient.Connection | An active database connection using the `pyocient` module.                                                                                                                                                                                                                                                                                                                |
| `input_schema`                 | str                 | A schema for the left vertices table.                                                                                                                                                                                                                                                                                                                                     |
| `input_vertices_table`         | str                 | Left vertices table (must have an `id` column).                                                                                                                                                                                                                                                                                                                           |
| `other_schema`                 | str                 | A schema for the right vertices table (with the alias `b`).                                                                                                                                                                                                                                                                                                               |
| `other_vertices_table`         | str                 | Right vertices table (must have an `id` column).                                                                                                                                                                                                                                                                                                                          |
| `result_schema`                | str                 | A writable schema to create the result tables.                                                                                                                                                                                                                                                                                                                            |
| `result_vertices_table`        | str                 | Name of the vertices table to create.                                                                                                                                                                                                                                                                                                                                     |
| `result_attribute_expressions` | List\[str]          | A list of SQL expressions that define the non-identifier columns of the joined result.<br /><br />Each expression can reference the left (`input_vertices_table`) vertex as `a` and the right (`other_vertices_table`) vertex as `b`.<br /><br />You must end every expression with an explicit alias using `AS alias_name` to ensure the output column names are stable. |
| `result_vertices_indexes`      | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none.                                                                                                                                                                                                                                                                     |

**Example**
Perform a left outer join on two vertices tables and generate an index on the `id` column.

```python Python theme={null}
outer_join_vertices(
    connection,
    "sales", "customers",
    "sales", "profiles",
    "sales", "customers_joined",
    [
      "a.id AS id",
      "a.status AS status",
      "b.tier AS tier",
    ],
    ["id"],
)
```
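
To compare the inner and left outer join behavior, the following sketch runs both joins with plain SQL in an in-memory SQLite database. The sample rows are invented, and the queries are a stand-in for the SQL that the library generates, which might differ.

```python
import sqlite3

# Stand-in for the Ocient connection: an in-memory SQLite database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, status TEXT)")
db.execute("CREATE TABLE profiles (id INTEGER, tier TEXT)")
db.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "active"), (2, "inactive")],
)
db.execute("INSERT INTO profiles VALUES (1, 'gold')")  # no profile for id 2

QUERY = """
    SELECT a.id AS id, a.status AS status, b.tier AS tier
    FROM customers AS a
    {join} profiles AS b ON a.id = b.id
    ORDER BY a.id
"""

# Inner join: only vertices present in both tables survive.
inner_rows = db.execute(QUERY.format(join="JOIN")).fetchall()

# Left outer join: every left-table vertex survives, and b.tier is NULL
# (None in Python) where the right table has no matching id.
outer_rows = db.execute(QUERY.format(join="LEFT JOIN")).fetchall()

print(inner_rows)
print(outer_rows)
```

If a NULL value is not acceptable in the result, wrapping the right-table column in `COALESCE` with a default value is one option.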

### collect\_neighbors

For each vertex in a table, this function collects information on neighbors (identifier and any attributes) as an array of tuples. For a specified direction (`IN`, `OUT`, or `BOTH`), the function aggregates tuples representing each neighboring vertex into an array.

The direction types are:

* `IN` — Neighbors with edges pointing to the vertex (edges where `destid = id`).
* `OUT` — Neighbors that the vertex points to (edges where `srcid = id`).
* `BOTH` — Union of `IN` and `OUT` with neighbors from incoming (`destid = id`) and outgoing (`srcid = id`) edges.

The result table has the columns `id` (the vertex identifier) and `neighbors` (an array of tuples representing each neighbor).

If an error occurs after table creation, the function drops the result table.

**Syntax**

```python Python theme={null}
collect_neighbors(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    result_schema,
    result_table,
    direction,
    [ result_indexes [ , ... ] ]
)
```

| **Argument**           | **Data Type**       | **Description**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| ---------------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `connection`           | pyocient.Connection | An active database connection using the `pyocient` module.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| `input_schema`         | str                 | A non-empty schema containing the input tables.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| `input_vertices_table` | str                 | Input vertices table (must have an `id` column).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| `input_edges_table`    | str                 | Input edges table (must have the `srcid` and `destid` columns).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| `result_schema`        | str                 | A writable schema to create the result tables.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| `result_table`         | str                 | Name of the neighbors collection table (with the `id` and `neighbors` columns).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| `direction`            | EdgeDirection       | Specifies the direction of traversal. Supported values are:<br />`IN`: Neighbors that have edges pointing to the vertex (edges where `destid = id`). For example, if an edge points from `5` to `10`, then for `id=10`, neighbor `5` is included.<br /><br />`OUT`: Neighbors that the vertex points to (edges where `srcid = id`). For example, if an edge points from `5` to `10`, then for `id=5`, neighbor `10` is included.<br /><br />`BOTH`: The union of `IN` and `OUT`. This traversal includes neighbors from edges pointing to `id` and edges from `id`. |
| `result_indexes`       | List\[str]          | Optional. Columns to index in the result table (e.g., `id`). Specify an empty list for none. |

**Example**
Collect incoming neighbors for each vertex and generate an index on the `id` column. Setting the `direction` argument to `IN` collects the neighbors whose edges point to each vertex.

```python Python theme={null}
collect_neighbors(
    connection,
    "sales", "customers", "purchases",
    "sales", "neighbors_in",
    EdgeDirection.IN,
    ["id"],
)
```
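
The direction semantics can be illustrated outside the database with a minimal in-memory sketch. The helper `collect_neighbors_sketch` is hypothetical and is not part of `ocient_graph`. It assumes edges are plain `(srcid, destid)` pairs, deduplicates neighbors, and keeps vertices without neighbors as empty lists; the library might handle these cases differently.

```python
from collections import defaultdict


def collect_neighbors_sketch(vertex_ids, edges, direction):
    """Return {id: sorted neighbor ids} for direction 'IN', 'OUT', or 'BOTH'.

    Hypothetical in-memory illustration; not the ocient_graph implementation.
    """
    if direction not in ("IN", "OUT", "BOTH"):
        raise ValueError(f"unsupported direction: {direction}")
    neighbors = defaultdict(set)
    for srcid, destid in edges:
        if direction in ("IN", "BOTH"):
            # The edge points to destid, so srcid is an incoming neighbor.
            neighbors[destid].add(srcid)
        if direction in ("OUT", "BOTH"):
            # The edge leaves srcid, so destid is an outgoing neighbor.
            neighbors[srcid].add(destid)
    return {v: sorted(neighbors[v]) for v in vertex_ids}


edges = [(5, 10), (10, 20), (20, 5)]
vertex_ids = [5, 10, 20, 30]
incoming = collect_neighbors_sketch(vertex_ids, edges, "IN")
outgoing = collect_neighbors_sketch(vertex_ids, edges, "OUT")
both = collect_neighbors_sketch(vertex_ids, edges, "BOTH")
```

In this sketch, the edge from `5` to `10` makes `5` an `IN` neighbor of `10` and `10` an `OUT` neighbor of `5`, which matches the examples in the `direction` argument description.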

### collect\_edges

For each vertex in a table, this function collects an array of adjacent edge rows based on the specified direction. The result table has two columns: `id` (the vertex identifier) and `edges` (an array of tuples, each tuple containing all columns from the edges table for a connected edge).

The `direction` types are:

* `IN` — Edges pointing to the vertex (edges where `destid = id`).
* `OUT` — Edges originating from the vertex (edges where `srcid = id`).
* `BOTH` — Union of `IN` and `OUT` that includes edges from incoming (`destid = id`) and outgoing (`srcid = id`) directions. This direction retains duplicates.

If an error occurs after table creation, the function drops the result table.

**Syntax**

```python Python theme={null}
collect_edges(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    result_schema,
    result_table,
    direction,
    [ result_indexes [ , ... ] ]
)
```

| **Argument**           | **Data Type**       | **Description**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| ---------------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `connection`           | pyocient.Connection | An active database connection using the `pyocient` module.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| `input_schema`         | str                 | A non-empty schema containing the input tables.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| `input_vertices_table` | str                 | Input vertices table (must have an `id` column).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| `input_edges_table`    | str                 | Input edges table (must have the `srcid` and `destid` columns).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| `result_schema`        | str                 | A writable schema to create the result tables.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| `result_table`         | str                 | Name of the edge collection table (with the `id` and `edges` columns).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| `direction`            | EdgeDirection       | Specifies the direction of traversal. Supported values are:<br />`IN`: Edges pointing to the vertex (edges where `destid = id`). For example, if an edge points from `5` to `10`, then for `id=10`, that edge is included.<br /><br />`OUT`: Edges originating from the vertex (edges where `srcid = id`). For example, if an edge points from `5` to `10`, then for `id=5`, that edge is included.<br /><br />`BOTH`: The union of `IN` and `OUT`. This traversal includes edges pointing to `id` and edges from `id`, and retains duplicates. |
| `result_indexes`       | List\[str]          | Optional. Columns to index. Specify an empty list for none.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |

**Example**
Collect outgoing edges per vertex. The example sets the `direction` to `OUT` to collect edges from `id`.

```python Python theme={null}
collect_edges(
    connection,
    "sales", "customers", "purchases",
    "sales", "outgoing_edges",
    EdgeDirection.OUT,
    ["id"],
)
```
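
As a rough illustration of how edge rows are grouped per vertex, the following in-memory sketch mirrors the three directions. The helper `collect_edges_sketch` and the `amount` column are hypothetical. The sketch also assumes that "retains duplicates" means an edge matching both conditions (such as a self-loop) appears twice under `BOTH`; the source does not confirm this detail.

```python
def collect_edges_sketch(vertex_ids, edges, direction):
    """Return {id: list of edge rows} for direction 'IN', 'OUT', or 'BOTH'.

    Each edge row is a tuple whose first two fields are (srcid, destid).
    Hypothetical in-memory illustration; not the ocient_graph implementation.
    """
    if direction not in ("IN", "OUT", "BOTH"):
        raise ValueError(f"unsupported direction: {direction}")
    result = {v: [] for v in vertex_ids}
    for edge in edges:
        srcid, destid = edge[0], edge[1]
        if direction in ("IN", "BOTH") and destid in result:
            result[destid].append(edge)  # incoming edge for destid
        if direction in ("OUT", "BOTH") and srcid in result:
            result[srcid].append(edge)   # outgoing edge for srcid
    return result


# Edge rows: (srcid, destid, amount). The last row is a self-loop.
edges = [(1, 2, 9.5), (2, 3, 4.0), (3, 3, 1.25)]
out_edges = collect_edges_sketch([1, 2, 3], edges, "OUT")
both_edges = collect_edges_sketch([1, 2, 3], edges, "BOTH")
```

Under the stated assumption, the self-loop on vertex `3` is listed twice in `both_edges[3]` because it satisfies both `destid = id` and `srcid = id`.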

## Algorithms

These high-level graph algorithms iterate over the graph structure to produce labels, components, or counts.

### label\_propagation

Executes the [Label Propagation Algorithm](https://en.wikipedia.org/wiki/Label_propagation_algorithm) (LPA) to assign community labels to vertices.

Each vertex starts with its own identifier as its label. In each iteration, up to the number set by the `max_iterations` argument, each vertex updates its label to the most frequent label among its neighbors. The algorithm breaks ties by choosing the smallest label. The algorithm uses temporary tables for intermediate results and drops these tables when the process completes or fails. Isolated vertices retain their initial label. The final table stores the `id` and `label` columns and can include indexes.

**Syntax**

```python Python theme={null}
label_propagation(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    result_schema,
    result_vertices_table,
    max_iterations,
    [ result_vertices_indexes [ , ... ] ]
)
```

| **Argument**              | **Data Type**       | **Description**                                                                                       |
| ------------------------- | ------------------- | ----------------------------------------------------------------------------------------------------- |
| `connection`              | pyocient.Connection | An active database connection using the `pyocient` module.                                            |
| `input_schema`            | str                 | A non-empty schema containing the input tables.                                                       |
| `input_vertices_table`    | str                 | Input vertices table (must have an `id` column).                                                      |
| `input_edges_table`       | str                 | Input edges table (must have the `srcid` and `destid` columns).                                       |
| `result_schema`           | str                 | A writable schema to create the result tables.                                                        |
| `result_vertices_table`   | str                 | Name of the vertices table to create (columns include `id` and `label`).                              |
| `max_iterations`          | int                 | Maximum number of iterations (must be `1` or greater).                                                |
| `result_vertices_indexes` | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none. |

**Example**
Run label propagation for 10 iterations and assign labels to vertices. Generate an index on the `id` column.

```python Python theme={null}
label_propagation(
    connection,
    "sales", "customers", "purchases",
    "sales", "lpa_labels",
    10,
    ["id"],
)
```
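
The update rule can be sketched in memory to show the tie-breaking behavior. The helper `label_propagation_sketch` is hypothetical, and it makes two assumptions that the source does not state: edges are treated as undirected when gathering neighbor labels, and all vertices update synchronously from the labels of the previous iteration.

```python
from collections import Counter, defaultdict


def label_propagation_sketch(vertex_ids, edges, max_iterations):
    """Return {id: label} after up to max_iterations synchronous updates.

    Hypothetical in-memory illustration; not the ocient_graph implementation.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be 1 or greater")
    # Assumption: neighbors are gathered in both edge directions.
    neighbors = defaultdict(list)
    for srcid, destid in edges:
        neighbors[srcid].append(destid)
        neighbors[destid].append(srcid)
    labels = {v: v for v in vertex_ids}  # each vertex starts with its own id
    for _ in range(max_iterations):
        new_labels = {}
        for v in vertex_ids:
            counts = Counter(labels[n] for n in neighbors[v])
            if not counts:
                new_labels[v] = labels[v]  # isolated vertex keeps its label
                continue
            best = max(counts.values())
            # Ties go to the smallest label.
            new_labels[v] = min(l for l, c in counts.items() if c == best)
        labels = new_labels
    return labels


labels = label_propagation_sketch([1, 2, 3, 4], [(1, 2), (2, 3), (1, 3)], 10)
```

In this small graph, the triangle of vertices `1`, `2`, and `3` settles on label `1`, and the isolated vertex `4` keeps its initial label.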

### connected\_components \[#connected\_components]

Identifies the connected components of an undirected graph. This algorithm configures a Pregel computation in which each vertex initially sets its component label equal to its own identifier `id`.

In each iteration, vertices send their component label to neighbors. Each vertex then updates its label to the minimum of its current component label and any received values. The process repeats until no more updates occur.

The result table maps each vertex `id` to its final `component` label.

**Syntax**

```python Python theme={null}
connected_components(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    result_schema,
    result_vertices_table,
    [ result_vertices_indexes [ , ... ] ]
)
```

| **Argument**              | **Data Type**       | **Description**                                                                                       |
| ------------------------- | ------------------- | ----------------------------------------------------------------------------------------------------- |
| `connection`              | pyocient.Connection | An active database connection using the `pyocient` module.                                            |
| `input_schema`            | str                 | A non-empty schema containing the input tables.                                                       |
| `input_vertices_table`    | str                 | Input vertices table (must have an `id` column).                                                      |
| `input_edges_table`       | str                 | Input edges table (must have the `srcid` and `destid` columns).                                       |
| `result_schema`           | str                 | A writable schema to create the result tables.                                                        |
| `result_vertices_table`   | str                 | Name of the vertices table to create.                                                                 |
| `result_vertices_indexes` | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none. |

**Example**
Compute connected components for a maximum of 20 iterations and generate an index on the `id` column.

```python Python theme={null}
connected_components(
    connection,
    "sales", "customers", "purchases",
    "sales", "components",
    20,
    ["id"],
)
```
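
The minimum-label propagation described for this function can be sketched in memory as follows. The helper `connected_components_sketch` is hypothetical and only mirrors the described message passing; it assumes that every edge references a vertex in `vertex_ids` and that labels travel in both edge directions because the graph is undirected.

```python
def connected_components_sketch(vertex_ids, edges):
    """Return {id: component}, where component is the smallest id in the component.

    Hypothetical in-memory illustration; not the ocient_graph implementation.
    """
    component = {v: v for v in vertex_ids}  # initial label is the vertex id
    changed = True
    while changed:
        changed = False
        # Gather the smallest label each vertex receives in this iteration.
        messages = {}
        for srcid, destid in edges:
            for receiver, sender in ((destid, srcid), (srcid, destid)):
                candidate = component[sender]
                if receiver not in messages or candidate < messages[receiver]:
                    messages[receiver] = candidate
        # Apply the updates only after all messages are collected.
        for v, received in messages.items():
            if received < component[v]:
                component[v] = received
                changed = True
    return component


components = connected_components_sketch(
    [1, 2, 3, 4, 5, 6],
    [(2, 1), (3, 2), (5, 4)],
)
```

The loop stops once an iteration produces no label change, which corresponds to the convergence condition described above.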

### strongly\_connected\_components

Computes strongly connected components (SCC) in a directed graph. This function runs a recursive algorithm that partitions vertices into subsets where every vertex is reachable from every other vertex in the same subset.

The algorithm selects a pivot (typically the minimum identifier `id`) and computes its predecessor set (vertices that can reach the pivot) and its descendant set (vertices reachable from the pivot). Then, the function identifies the SCC as the intersection of these two sets, removes that SCC from the graph, and recurses on the remainder until all vertices are assigned to an SCC.

The function creates temporary tables in the result schema to store intermediate results and drops these tables when the computation completes or fails. The final result table contains two columns: `id` (vertex identifier) and `component` (the minimum vertex identifier in its SCC subset).

**Syntax**

```python Python theme={null}
strongly_connected_components(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    result_schema,
    result_vertices_table,
    [ result_vertices_indexes [ , ... ] ]
)
```

| **Argument**              | **Data Type**       | **Description**                                                                                       |
| ------------------------- | ------------------- | ----------------------------------------------------------------------------------------------------- |
| `connection`              | pyocient.Connection | An active database connection using the `pyocient` module.                                            |
| `input_schema`            | str                 | A non-empty schema containing the input tables.                                                       |
| `input_vertices_table`    | str                 | Input vertices table (must have an `id` column).                                                      |
| `input_edges_table`       | str                 | Input edges table (must have the `srcid` and `destid` columns).                                       |
| `result_schema`           | str                 | A writable schema to create the result tables.                                                        |
| `result_vertices_table`   | str                 | Name of the vertices table to create (columns include `id` and `component`).                          |
| `result_vertices_indexes` | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none. |

**Example**
Compute the SCC and generate an index on the `id` column.

```python Python theme={null}
strongly_connected_components(
    connection,
    "sales", "customers", "purchases",
    "sales", "scc",
    ["id"],
)
```
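
The pivot-based partitioning can be sketched in memory as follows. The helpers `_reachable` and `strongly_connected_components_sketch` are hypothetical. For brevity, the sketch loops over the remaining vertices instead of recursing, and it always picks the minimum remaining `id` as the pivot, so the component label is the minimum `id` in each SCC.

```python
from collections import defaultdict


def _reachable(start, adjacency, allowed):
    """Return the vertices in `allowed` reachable from `start`, including `start`."""
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for nxt in adjacency[current]:
            if nxt in allowed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def strongly_connected_components_sketch(vertex_ids, edges):
    """Return {id: component}; hypothetical illustration, not the library code."""
    forward = defaultdict(list)   # srcid -> destination ids
    backward = defaultdict(list)  # destid -> source ids
    for srcid, destid in edges:
        forward[srcid].append(destid)
        backward[destid].append(srcid)
    remaining = set(vertex_ids)
    component = {}
    while remaining:
        pivot = min(remaining)
        descendants = _reachable(pivot, forward, remaining)
        predecessors = _reachable(pivot, backward, remaining)
        scc = descendants & predecessors  # mutually reachable with the pivot
        for v in scc:
            component[v] = pivot
        remaining -= scc  # remove the SCC and continue with the remainder
    return component


scc_labels = strongly_connected_components_sketch(
    [1, 2, 3, 4, 5],
    [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4)],
)
```

Here the cycle `1 → 2 → 3 → 1` forms one SCC labeled `1`, and the cycle `4 → 5 → 4` forms a second SCC labeled `4`. The edge from `3` to `4` does not merge them because no path leads back from `4` to `3`.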

### TriangleCount

TriangleCount identifies all 3-cycles (triangles) in the graph and counts how many distinct triangles each vertex participates in.

The algorithm first builds a canonical, undirected edge set by ensuring `srcid < destid` and removing duplicates to prevent double-counting. If your input edges are already canonicalized and deduplicated, use `TriangleCount.run_pre_canonicalized` to skip preprocessing for faster performance.

The function then counts triangles (`a`, `b`, `c`) where `a < b < c` by intersecting neighbor lists and aggregates per-vertex participation to produce a result table with the `id` and `triangle_count` columns.

`run` **syntax**

```python Python theme={null}
TriangleCount.run(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    result_schema,
    result_vertices_table,
    [ result_vertices_indexes [ , ... ] ]
)
```

`run_pre_canonicalized` **syntax**

```python Python theme={null}
TriangleCount.run_pre_canonicalized(
    connection,
    input_schema,
    input_vertices_table,
    canonical_edges_schema,
    canonical_edges_table,
    result_schema,
    result_vertices_table,
    [ result_vertices_indexes [ , ... ] ]
)
```

| **Argument**              | **Data Type**       | **Description**                                                                                                                                                                                                     |
| ------------------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `connection`              | pyocient.Connection | An active database connection using the `pyocient` module.                                                                                                                                                          |
| `input_schema`            | str                 | A non-empty schema containing the input tables.                                                                                                                                                                     |
| `input_vertices_table`    | str                 | Input vertices table (must have an `id` column).                                                                                                                                                                    |
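| `input_edges_table`       | str                 | Input edges table (must have the `srcid` and `destid` columns). Used by `run`, which canonicalizes and deduplicates the edges internally. |
| `canonical_edges_schema`  | str                 | For `run_pre_canonicalized` only. Schema containing the canonical edges table. |
| `canonical_edges_table`   | str                 | For `run_pre_canonicalized` only. Edges table (must have the `srcid` and `destid` columns) that the function assumes is already canonicalized (e.g., `srcid < destid`) and deduplicated. |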
| `result_schema`           | str                 | A writable schema to create the result tables.                                                                                                                                                                      |
| `result_vertices_table`   | str                 | Name of the vertices table to create (columns include `id` and `triangle_count`).                                                                                                                                   |
| `result_vertices_indexes` | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none.                                                                                                               |

**Examples**

**Count Triangles Using** `run`

Canonicalize the raw edges internally, count unique triangles, and write per-vertex triangle counts with an index on the `id` column.

```python Python theme={null}
TriangleCount.run(
    connection,
    "sales",
    "customers",
    "purchases",
    "sales",
    "triangle_counts",
    ["id"],
)
```

**Count Triangles Using** `run_pre_canonicalized`

Use a pre-canonicalized, deduplicated edge table to count triangles and write per-vertex triangle counts with an index on the `id` column.

```python Python theme={null}
TriangleCount.run_pre_canonicalized(
    connection,
    "sales",
    "customers",
    "sales",
    "purchases_canonical",
    "sales",
    "triangle_counts",
    ["id"],
)
```
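
The canonicalization and counting steps can be sketched in memory as follows. The helpers `canonicalize_edges` and `triangle_count_sketch` are hypothetical. The sketch assumes that self-loops are dropped during canonicalization (they cannot satisfy `srcid < destid`) and that every edge references a vertex in `vertex_ids`.

```python
from collections import defaultdict


def canonicalize_edges(edges):
    """Return the deduplicated undirected edge set with srcid < destid."""
    return {(min(s, d), max(s, d)) for s, d in edges if s != d}


def triangle_count_sketch(vertex_ids, canonical_edges):
    """Return {id: triangle_count}; hypothetical illustration, not the library code."""
    higher = defaultdict(set)  # for each vertex, its neighbors with a larger id
    for a, b in canonical_edges:
        higher[a].add(b)
    counts = {v: 0 for v in vertex_ids}
    for a, b in canonical_edges:
        # A vertex c closes the triangle a < b < c when both a and b link to c.
        for c in higher[a] & higher[b]:
            for v in (a, b, c):
                counts[v] += 1
    return counts


raw_edges = [(1, 2), (2, 1), (2, 3), (3, 1), (3, 4), (4, 1), (5, 5)]
canonical = canonicalize_edges(raw_edges)
triangle_counts = triangle_count_sketch([1, 2, 3, 4, 5], canonical)
```

Because each triangle is found only from its lowest edge `(a, b)`, the sketch counts it once. The duplicate pair `(1, 2)` and `(2, 1)` collapses into one canonical edge, so it does not inflate the counts.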

### pregel

Provides a generic vertex-centered iteration framework for custom graph algorithms, similar to the [Pregel model](https://research.google/pubs/pregel-a-system-for-large-scale-graph-processing/).

Each iteration updates vertex states by sending messages along edges and then aggregating these messages to compute new states. The algorithm continues iterating until it reaches convergence (no state changes or no messages produced) or a specified iteration cap.

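You define the computation with SQL expressions that initialize the vertex state, generate messages along edges, aggregate the messages for each vertex, and compute the next state.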
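
As a rough mental model of the iteration, the following in-memory sketch replaces the SQL expressions with Python callables. The function `pregel_sketch` and the callable signatures are hypothetical and are not part of `ocient_graph`. The sketch assumes that every vertex state is recomputed in each iteration and that a vertex without messages receives `None`.

```python
def pregel_sketch(vertex_ids, edges, initializer, send_to_source, send_to_dest,
                  aggregate, updater, max_iterations):
    """Run a simplified Pregel loop; hypothetical illustration, not the library code."""
    if max_iterations == 0 or max_iterations < -1:
        raise ValueError("max_iterations must be -1 or a value of 1 or greater")
    state = {v: initializer(v) for v in vertex_ids}
    iteration = 0
    while max_iterations == -1 or iteration < max_iterations:
        iteration += 1
        inbox = {}
        for edge in edges:
            srcid, destid = edge[0], edge[1]
            if send_to_source is not None:
                msg = send_to_source(state[srcid], edge, state[destid])
                inbox.setdefault(srcid, []).append(msg)
            if send_to_dest is not None:
                msg = send_to_dest(state[srcid], edge, state[destid])
                inbox.setdefault(destid, []).append(msg)
        if not inbox:
            break  # converged: no messages produced
        new_state = {
            v: updater(state[v], aggregate(inbox[v]) if v in inbox else None)
            for v in vertex_ids
        }
        if new_state == state:
            break  # converged: no state changes
        state = new_state
    return state


# Minimum-label propagation, similar in spirit to connected_components.
components = pregel_sketch(
    [1, 2, 3, 4, 5],
    [(1, 2), (2, 3), (4, 5)],
    initializer=lambda vertex_id: vertex_id,
    send_to_source=lambda src_state, edge, dest_state: dest_state,
    send_to_dest=lambda src_state, edge, dest_state: src_state,
    aggregate=min,
    updater=lambda current, message: current if message is None else min(current, message),
    max_iterations=-1,
)
```

Passing `None` for either send callable skips messages in that direction, which corresponds to the optional `send_to_source_expr` and `send_to_dest_expr` arguments.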

**Syntax**

```python Python theme={null}
pregel(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    result_schema,
    result_vertices_table,
    initializer_expr,
    send_to_source_expr,
    send_to_dest_expr,
    aggregate_expr,
    updater_expr,
    max_iterations,
    [ result_vertices_indexes [ , ... ] ]
)
```

| **Argument**              | **Data Type**       | **Description**                                                                                                                                                                                                                                                                                                                                      |
| ------------------------- | ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `connection`              | pyocient.Connection | An active database connection using the `pyocient` module.                                                                                                                                                                                                                                                                                           |
| `input_schema`            | str                 | A non-empty schema containing the input tables.                                                                                                                                                                                                                                                                                                      |
| `input_vertices_table`    | str                 | Input vertices table (must have an `id` column).                                                                                                                                                                                                                                                                                                     |
| `input_edges_table`       | str                 | Input edges table (must have the `srcid` and `destid` columns).                                                                                                                                                                                                                                                                                      |
| `result_schema`           | str                 | A writable schema to create the result tables.                                                                                                                                                                                                                                                                                                       |
| `result_vertices_table`   | str                 | Name of the vertices table to create (columns include `id` and `result`).                                                                                                                                                                                                                                                                            |
| `initializer_expr`        | str                 | A SQL expression to compute the initial state for each vertex (e.g., `CASE WHEN type = 'seed' THEN 1.0 ELSE 0.0 END`).                                                                                                                                                                                                                               |
| `send_to_source_expr`     | str                 | Optional. Defines the message sent to the source vertex of an edge, referencing the current vertex states `a.state` (source) and `c.state` (destination), and any edge attributes as `b.<edge_column>`.<br />Example: `CASE WHEN a.country = c.country THEN 1 ELSE 0 END` |
| `send_to_dest_expr`       | str                 | Optional. Defines the message sent to the destination vertex of an edge, referencing the current vertex states `a.state` (source) and `c.state` (destination), and any edge attributes as `b.<edge_column>`. <br />Example: `CASE WHEN a.country = c.country THEN 1 ELSE 0 END`                                                                      |
| `aggregate_expr`          | str                 | A SQL aggregation to combine messages per vertex (e.g., `SUM(msg)`).<br />For a list of supported aggregations, see [Aggregate Functions](/aggregate-functions).                                                                                                                                                                                     |
| `updater_expr`            | str                 | A SQL expression to compute the next state from the current state and aggregated messages. <br /><br />For example, this expression updates the state to the minimum of the current state and the aggregated message (similar to the [connected\_components](#connected_components) function): `LEAST(a.state, COALESCE(m.aggregated_message, a.state)) AS state` |
| `max_iterations`          | int                 | Maximum number of iterations (must be either `-1` or a value of `1` or greater). <br /><br />If this value is `-1`, the Pregel algorithm has no limit, and it runs until convergence. <br /><br />A value of `0` causes the algorithm to return an error.                                                                                            |
| `result_vertices_indexes` | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none.                                                                                                                                                                                                                                                |

**Example**
Run a simple Pregel computation that sends the `amount` value of each edge to the source vertex of that edge and sums the received messages into the vertex state for at most 10 iterations. The call also generates an index on the `id` column.

```python Python theme={null}
pregel(
    connection,
    "sales", "customers", "purchases",
    "sales", "pregel_result",
    "0 AS state",
    "b.amount",
    None,
    "SUM(msg) AS aggregated_message",
    "state + COALESCE(aggregated_message, 0) AS state",
    10,
    ["id"],
)
```
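The roles of the expression arguments can be illustrated outside the database. The following is a minimal in-memory sketch, not the library implementation: `pregel_sketch` and its callback parameters are hypothetical names that stand in for `initializer_expr`, `send_to_source_expr`, `send_to_dest_expr`, `aggregate_expr`, and `updater_expr`. The sketch assumes that a vertex with no messages receives `None` as its aggregated message and that convergence means no vertex state changed in an iteration.

```python Python theme={null}
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex, dict]  # (srcid, destid, edge attributes)
SendFn = Callable[[float, dict, float], float]  # (source state, edge, dest state)


def pregel_sketch(
    vertices: Iterable[Vertex],
    edges: List[Edge],
    initializer: Callable[[Vertex], float],
    send_to_source: Optional[SendFn],
    send_to_dest: Optional[SendFn],
    aggregate: Callable[[List[float]], float],
    updater: Callable[[float, Optional[float]], float],
    max_iterations: int,
) -> Dict[Vertex, float]:
    """Hypothetical in-memory analogue of a Pregel loop (illustration only)."""
    if max_iterations == 0 or max_iterations < -1:
        raise ValueError("max_iterations must be -1 or at least 1")

    # Initial state for each vertex (role of initializer_expr).
    state = {v: initializer(v) for v in vertices}

    iteration = 0
    while max_iterations == -1 or iteration < max_iterations:
        # Collect messages along every edge (roles of the send_to_* expressions).
        inbox = defaultdict(list)
        for src, dest, attrs in edges:
            if send_to_source is not None:
                inbox[src].append(send_to_source(state[src], attrs, state[dest]))
            if send_to_dest is not None:
                inbox[dest].append(send_to_dest(state[src], attrs, state[dest]))

        # Combine messages per vertex (aggregate_expr), then compute the next
        # state (updater_expr). Assumption: no messages means None.
        new_state = {}
        for v, current in state.items():
            msgs = inbox.get(v)
            new_state[v] = updater(current, aggregate(msgs) if msgs else None)

        iteration += 1
        if new_state == state:  # assumption: converged when nothing changed
            break
        state = new_state

    return state


# Propagate the smallest vertex identifier in both directions along each edge.
labels = pregel_sketch(
    vertices=[1, 2, 3, 4, 5],
    edges=[(1, 2, {}), (2, 3, {}), (4, 5, {})],
    initializer=lambda v: float(v),
    send_to_source=lambda a, b, c: c,
    send_to_dest=lambda a, b, c: a,
    aggregate=min,
    updater=lambda s, m: s if m is None else min(s, m),
    max_iterations=-1,
)
```

In this usage, vertices `1`, `2`, and `3` settle on the same label, while vertices `4` and `5` settle on another, which mirrors the minimum-label update shown for `updater_expr`.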

## Paths & Ranking

These functions include the shortest-path and PageRank algorithms.

### shortest\_paths

Computes the shortest distance from every vertex to each vertex in a set of landmark vertices using an iterative relaxation algorithm. The algorithm resembles [Bellman–Ford](https://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm) but handles multiple destinations simultaneously.

Each landmark starts at distance `0` and all others at positive infinity. On each iteration, the algorithm examines every edge and checks whether traveling through the connected neighbor would yield a shorter route to a landmark. If a shorter route exists, the algorithm updates the distance of the source vertex. The process stops when no distances improve or the algorithm reaches the maximum number of iterations.

After the process finishes, the algorithm writes a result table with the `srcid`, `destid`, and `distance` columns.
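The relaxation described above can be sketched in plain Python. This is a minimal in-memory illustration under stated assumptions, not the SQL that the library runs: `shortest_paths_sketch` is a hypothetical name, each iteration reads distances from a snapshot of the previous iteration, and unreachable vertices keep an infinite distance.

```python Python theme={null}
import math
from typing import Dict, Hashable, Iterable, List, Tuple

Vertex = Hashable


def shortest_paths_sketch(
    vertices: Iterable[Vertex],
    edges: List[Tuple[Vertex, Vertex, float]],  # (srcid, destid, weight)
    landmarks: Iterable[Vertex],
    max_iterations: int,
) -> Dict[Tuple[Vertex, Vertex], float]:
    """Return {(vertex, landmark): distance} by repeated edge relaxation."""
    vertices = list(vertices)
    landmarks = list(landmarks)

    # Each landmark starts at distance 0 to itself; everything else at infinity.
    dist = {
        (v, lm): 0.0 if v == lm else math.inf
        for v in vertices
        for lm in landmarks
    }

    for _ in range(max_iterations):
        previous = dict(dist)  # distances from the prior iteration
        for src, dest, weight in edges:
            for lm in landmarks:
                # Would traveling through the neighbor `dest` shorten src -> lm?
                candidate = weight + previous[(dest, lm)]
                if candidate < dist[(src, lm)]:
                    dist[(src, lm)] = candidate
        if dist == previous:  # stop early once no distance improves
            break

    return dist


distances = shortest_paths_sketch(
    vertices=["a", "b", "c", "d"],
    edges=[("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0)],
    landmarks=["c"],
    max_iterations=10,
)
```

In this usage, the two-hop route from `a` through `b` replaces the direct edge from `a` to `c` because its total weight is lower, and the isolated vertex `d` never receives a finite distance.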

**Syntax**

```python Python theme={null}
shortest_paths(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    result_schema,
    result_table,
    landmarks[ , ... ],
    edge_weight_column,
    max_iterations,
    [ result_indexes [ , ... ] ]
)
```

| **Argument**           | **Data Type**       | **Description**                                                                                                                                     |
| ---------------------- | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `connection`           | pyocient.Connection | An active database connection using the `pyocient` module.                                                                                          |
| `input_schema`         | str                 | A non-empty schema containing the input tables.                                                                                                     |
| `input_vertices_table` | str                 | Input vertices table (must have an `id` column).                                                                                                    |
| `input_edges_table`    | str                 | Input edges table (must have the `srcid` and `destid` columns).                                                                                     |
| `result_schema`        | str                 | A writable schema to create the result table.                                                                                                       |
| `result_table`         | str                 | Name of the distances result table. This table has the `srcid`, `destid`, and `distance` columns.                                                   |
| `landmarks`            | List\[int]          | One or more landmark vertex identifiers. This list must be non-empty and contain no NULLs.                                                          |
| `edge_weight_column`   | Optional\[str]      | Optional. The name of an edge weight column. <br />If you specify `None`, each edge has a weight of `1.0`.                                          |
| `max_iterations`       | int                 | Maximum number of relaxation iterations. This value must be `1` or greater.                                                                         |
| `result_indexes`       | List\[str]          | Optional. The list of columns to index in the result table (`srcid`, `destid`, and `distance` columns). <br /><br />Specify an empty list for none. |

**Example**
Compute distances to the landmark vertices `1` and `42` and generate indexes on the `srcid` and `destid` columns.

```python Python theme={null}
shortest_paths(
    connection,
    "sales", "customers", "purchases",
    "sales", "distances",
    [1, 42],
    None,
    10,
    ["srcid", "destid"],
)
```

### static\_page\_rank

Computes [PageRank](https://en.wikipedia.org/wiki/PageRank) scores for each vertex over a fixed number of iterations.

The algorithm follows the standard PageRank formula with a reset probability (`reset_prob`) and uses common table expressions to calculate contributions from incoming edges and redistribute ranks from dangling nodes.

The algorithm supports two variants:

* Standard PageRank: All vertices start with rank `1.0/N`, where `N` is the number of vertices. Specify this variant by setting `personalization_src_id` to `None`.
* Personalized PageRank: The specified vertex starts with a rank of `1.0`, while others start with a rank of `0.0`. Specify this variant by setting `personalization_src_id` to a vertex identifier.

After running PageRank for a fixed number of iterations, the function writes a result vertices table containing all original vertex columns with a new PageRank scoring column.
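The following in-memory sketch illustrates the fixed-iteration update for both variants. It is an illustration under stated assumptions, not the formula the library executes: `page_rank_sketch` is a hypothetical name, the sketch follows the `reset_prob` argument description in treating a higher value as more weight on following links, and it redistributes rank from dangling vertices according to the same base distribution used for the initial ranks.

```python Python theme={null}
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Vertex = Hashable


def page_rank_sketch(
    vertices: Iterable[Vertex],
    edges: List[Tuple[Vertex, Vertex]],  # (srcid, destid)
    num_iterations: int,
    reset_prob: float,
    personalization_src_id: Optional[Vertex] = None,
) -> Dict[Vertex, float]:
    """Hypothetical fixed-iteration PageRank (illustration only)."""
    vertices = list(vertices)
    n = len(vertices)

    # Base distribution: uniform for standard PageRank, concentrated on the
    # chosen vertex for personalized PageRank. It is also the initial rank.
    if personalization_src_id is None:
        base = {v: 1.0 / n for v in vertices}
    else:
        base = {v: 1.0 if v == personalization_src_id else 0.0 for v in vertices}

    out_degree = {v: 0 for v in vertices}
    for src, _ in edges:
        out_degree[src] += 1

    rank = dict(base)
    for _ in range(num_iterations):
        # Contributions that arrive over incoming edges.
        incoming = {v: 0.0 for v in vertices}
        for src, dest in edges:
            incoming[dest] += rank[src] / out_degree[src]

        # Rank held by dangling vertices (no outgoing edges).
        # Assumption: it is redistributed according to the base distribution.
        dangling = sum(rank[v] for v in vertices if out_degree[v] == 0)

        # Assumption: reset_prob weights the link-following term, as in the
        # argument description of this page.
        rank = {
            v: (1.0 - reset_prob) * base[v]
            + reset_prob * (incoming[v] + dangling * base[v])
            for v in vertices
        }
    return rank


graph_vertices = ["a", "b", "c", "d"]
graph_edges = [("a", "b"), ("b", "a"), ("c", "a")]  # "d" is dangling
standard = page_rank_sketch(graph_vertices, graph_edges, 10, 0.85)
personalized = page_rank_sketch(graph_vertices, graph_edges, 10, 0.85, "a")
```

Under these assumptions the ranks sum to `1.0` after every iteration, and in the personalized run the vertices that the chosen vertex cannot reach (`c` and `d`) keep a rank of `0.0`.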

**Syntax**

```python Python theme={null}
static_page_rank(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    result_schema,
    result_vertices_table,
    num_iterations,
    reset_prob,
    [ result_vertices_indexes [ , ... ] ],
    personalization_src_id,
)
```

| **Argument**              | **Data Type**       | **Description**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| ------------------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `connection`              | pyocient.Connection | An active database connection using the `pyocient` module.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| `input_schema`            | str                 | A non-empty schema containing the input tables.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| `input_vertices_table`    | str                 | Input vertices table (must have an `id` column).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| `input_edges_table`       | str                 | Input edges table (must have the `srcid` and `destid` columns).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| `result_schema`           | str                 | A writable schema to create the result tables.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| `result_vertices_table`   | str                 | Name of the vertices table to create (columns include all vertex columns and a `pagerank` column).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| `num_iterations`          | int                 | Number of iterations to run. This value must be 1 or greater.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| `reset_prob`              | float               | A value between `0.0` and `1.0` that controls the damping factor for how the PageRank random surfer moves across vertices. <br /><br />A high value (e.g., `0.85`) puts more emphasis on link structure, encouraging the algorithm to move from each vertex to a neighbor. With a high value, PageRank scores tend to concentrate around well-linked regions.<br /><br />A lower value (e.g., `0.50`) allows the algorithm to ignore edges more often and instead jump to a vertex chosen from a base distribution. For standard PageRank, this base is uniform over all vertices. For personalized PageRank, the base is biased toward the specified vertex. This behavior creates more uniform scoring with less sensitivity to link topology. |
| `result_vertices_indexes` | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| `personalization_src_id`  | Optional\[int]      | Determines whether PageRank uses the standard or the personalized variant.  <br /><br />For personalized mode, specify a vertex identifier. PageRank scoring starts with this vertex set at `1.0`, while all other vertices start at `0.0`. <br /><br />For standard mode, specify `None`.  All vertices start with rank `1.0/N`, where `N` is the number of vertices.                                                                                                                                                                                                                                                                                                                                                          |

**Example**
Run fixed-iteration PageRank for 10 iterations and generate an index on the `id` column. This example uses a high `reset_prob` value of `0.85` so that the ranking concentrates on highly linked regions.

```python Python theme={null}
static_page_rank(
    connection,
    "sales", "customers", "purchases",
    "sales", "pagerank_static",
    10,
    0.85,
    ["id"],
    None,
)
```

### dynamic\_page\_rank

Computes PageRank scores until convergence based on a specified threshold value (`tolerance`). Unlike the [static\_page\_rank](#static_page_rank) function, this algorithm runs iterations until the sum of absolute differences between ranks in successive iterations is less than or equal to the `tolerance` value. The algorithm handles personalization similarly to `static_page_rank`. At each iteration, the function uses the PageRank formula, collects rank values, and redistributes them.

The algorithm supports two variants:

* Standard PageRank: All vertices start with rank `1.0/N`, where `N` is the number of vertices. Specify this variant by setting `personalization_src_id` to `None`.
* Personalized PageRank: The specified vertex starts with a rank of `1.0`, while others start with a rank of `0.0`. Specify this variant by setting `personalization_src_id` to a vertex identifier.

After the rank changes fall within the `tolerance` threshold, the function writes a vertices table containing all the original vertex columns plus a new `pagerank` column.
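The stopping rule can be sketched as a loop that compares ranks between successive iterations. This is a minimal illustration, not the library implementation: `iterate_until_converged`, `step`, and `max_steps` are hypothetical names, the safety cap exists only in this sketch, and the check uses the sum of absolute differences described at the start of this section. The toy `step` function models two vertices that link to each other, with the personalization vertex `a`.

```python Python theme={null}
from typing import Callable, Dict, Hashable, Tuple

Ranks = Dict[Hashable, float]


def iterate_until_converged(
    step: Callable[[Ranks], Ranks],
    initial: Ranks,
    tolerance: float,
    max_steps: int = 1000,  # safety cap added for this sketch only
) -> Tuple[Ranks, int]:
    """Apply `step` until the total rank change is at most `tolerance`."""
    ranks = dict(initial)
    for iteration in range(1, max_steps + 1):
        updated = step(ranks)
        # Total absolute change across all vertices between two iterations.
        delta = sum(abs(updated[v] - ranks[v]) for v in ranks)
        ranks = updated
        if delta <= tolerance:
            return ranks, iteration
    return ranks, max_steps


def step(ranks: Ranks) -> Ranks:
    # Toy update for the edges a -> b and b -> a, personalized on "a".
    # Assumption: 0.85 weights the link-following term.
    return {
        "a": 0.15 + 0.85 * ranks["b"],
        "b": 0.85 * ranks["a"],
    }


start = {"a": 1.0, "b": 0.0}
loose_ranks, loose_steps = iterate_until_converged(step, start, 1.0e-3)
tight_ranks, tight_steps = iterate_until_converged(step, start, 1.0e-6)
```

The tighter tolerance takes more iterations than the looser one and lands closer to the fixed point of the update, which is the trade-off described for the `tolerance` argument.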

**Syntax**

```python Python theme={null}
dynamic_page_rank(
    connection,
    input_schema,
    input_vertices_table,
    input_edges_table,
    result_schema,
    result_vertices_table,
    tolerance,
    reset_prob,
    [ result_vertices_indexes [ , ... ] ],
    personalization_src_id,
)
```

| **Argument**              | **Data Type**       | **Description**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| ------------------------- | ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `connection`              | pyocient.Connection | An active database connection using the `pyocient` module.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| `input_schema`            | str                 | A non-empty schema containing the input tables.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| `input_vertices_table`    | str                 | Input vertices table (must have an `id` column).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| `input_edges_table`       | str                 | Input edges table (must have the `srcid` and `destid` columns).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| `result_schema`           | str                 | A writable schema to create the result tables.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| `result_vertices_table`   | str                 | Name of the vertices table to create (columns include all vertex columns and `pagerank`).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| `tolerance`               | float               | The convergence threshold that stops iterations when the rank changes become negligible. <br /><br />After each iteration, the algorithm measures the maximum change in the rank of any vertex. If that change is below the specified tolerance, the algorithm considers the computation converged and stops early. <br /><br />A high tolerance value (e.g., `0.001`) is good for quick exploratory runs that generate an approximate ranking. <br /><br />A low tolerance value (e.g., `0.000001`) generates a more precise ranking at a higher compute cost.                                                                                                                                                                      |
| `reset_prob`              | float               | A value between `0.0` and `1.0` that controls the damping factor for how the PageRank random surfer moves across vertices. <br /><br />A high value (e.g., `0.85`) puts more emphasis on link structure, encouraging the algorithm to move from each vertex to a neighbor. With a high value, PageRank scores tend to concentrate around well-linked regions.<br /><br />A lower value (e.g., `0.50`) allows the algorithm to ignore edges more often and instead jump to a vertex chosen from a base distribution. For standard PageRank, this base is uniform over all vertices. For personalized PageRank, the base is biased toward the specified vertex. This behavior creates more uniform scoring with less sensitivity to link topology. |
| `result_vertices_indexes` | List\[str]          | Optional. Columns to index in the result vertices table (e.g., `id`). Specify an empty list for none.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| `personalization_src_id`  | Optional\[int]      | Determines whether PageRank uses the standard or the personalized variant.  <br /><br />For personalized mode, specify a vertex identifier. PageRank scoring starts with this vertex set at `1.0`, while all other vertices start at `0.0`. <br /><br />For standard mode, specify `None`. All vertices start with rank `1.0/N`, where `N` is the number of vertices.                                                                                                                                                                                                                                                                                                                                                                    |

**Example**
Run dynamic PageRank to convergence and generate an index on the `id` column. This example uses a low `tolerance` value of `1.0e-6`, which generates high-precision rankings but requires more computing resources.

```python Python theme={null}
dynamic_page_rank(
    connection,
    "sales", "customers", "purchases",
    "sales", "pagerank_dynamic",
    1.0e-6,
    0.85,
    ["id"],
    None,
)
```
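After either PageRank function finishes, the scores are available in the result vertices table. The following helper is a hedged sketch for reading the highest-ranked vertices: `top_ranked` is a hypothetical name, and the sketch assumes a DB-API 2.0 cursor (for example, one obtained from `connection.cursor()` on a `pyocient` connection) and a result table that contains the `id` and `pagerank` columns.

```python Python theme={null}
import re
from typing import Any, List, Tuple

# Accept only plain identifiers because the names are placed in the SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def top_ranked(
    cursor: Any, schema: str, table: str, limit: int = 10
) -> List[Tuple]:
    """Return (id, pagerank) rows with the highest scores (hypothetical helper)."""
    for name in (schema, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    cursor.execute(
        f"SELECT id, pagerank FROM {schema}.{table} "
        f"ORDER BY pagerank DESC LIMIT {int(limit)}"
    )
    return cursor.fetchall()
```

For the example above, a call such as `top_ranked(connection.cursor(), "sales", "pagerank_dynamic", 5)` would return up to five rows ordered by descending score.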

## Bibliography

“Pregel: A System for Large-Scale Graph Processing.” Accessed November 18, 2025. [https://research.google/pubs/pregel-a-system-for-large-scale-graph-processing/](https://research.google/pubs/pregel-a-system-for-large-scale-graph-processing/).

## Related Links

[OCGraph Java Library](/ocgraph-java-library)

[Ocient Python Module: pyocient](/ocient-python-module-pyocient)
