The ocient_graph module brings a graph programming model (similar to GraphX) to the System directly through the pyocient driver. The module treats a graph as two relational tables, one for vertices (nodes) and one for edges (directed links), and provides a composable API for graph transformations, neighborhood analytics, and iterative algorithms (e.g., Pregel, PageRank).
The API validates inputs, avoids destructive changes by materializing results into new tables, supports optional indexing for performance, and follows Ocient SQL conventions. The package installs separately from pyocient and exposes a Python-native interface that mirrors the OCGraph Java library. For details, see OCGraph Java Library.
Installation
Use pyocient for connectivity and ocient_graph for graph APIs. The graph library is a separate package that depends on pyocient. For a tutorial about installing and using pyocient, see Ocient Python Module: pyocient.
Install and Import
Install the ocient_graph module.
Shell
Python
Data Model Requirements
Database tables that use the OCGraph Python library must adhere to this structure. In addition to the listed requirements, tables can include other columns.
| Table | Description | Requirements |
|---|---|---|
| Vertices table | A table with one row per vertex (node). This table typically represents the anchor for graph algorithms and transforms. Many methods join edges to vertices by the id column. | The table must contain the id BIGINT NOT NULL column definition as the unique vertex identifier. |
| Edges table | A table with one row per directed edge (relationship). Each row is a directed edge from a source vertex to a destination vertex. | The table must contain these column definitions: srcid BIGINT NOT NULL, destid BIGINT NOT NULL |
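The required structure can be illustrated with a small local sketch. It uses SQLite from the Python standard library purely as a stand-in so that it runs anywhere; it is not Ocient and does not call the ocient_graph API. The table names people and purchases, the extra columns, and the has_columns helper are hypothetical.

```python
import sqlite3

# SQLite is only a local stand-in for illustration; it is not Ocient.
conn = sqlite3.connect(":memory:")

# Vertices table: the required id column plus optional extra columns.
conn.execute("CREATE TABLE people (id BIGINT NOT NULL, name TEXT, status TEXT)")
# Edges table: the required srcid and destid columns plus optional extras.
conn.execute(
    "CREATE TABLE purchases (srcid BIGINT NOT NULL, destid BIGINT NOT NULL, amount REAL)"
)

def has_columns(conn, table, required):
    """Return True if every required column name exists in the table."""
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return set(required) <= existing

print(has_columns(conn, "people", ["id"]))
print(has_columns(conn, "purchases", ["srcid", "destid"]))
```

A check like this only confirms column names; the NOT NULL and type requirements still come from the table definitions.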
Subgraph and Filtering
Use a subgraph or various filters to restrict a graph to relevant vertices and edges. These functions create filtered copies or masked intersections, preserving schema and optional indexes for performance.
subgraph
Creates filtered vertex and edge tables using vertex and triplet predicates, retaining only edges with endpoints that remain after vertex filtering. The function creates the requested indexes and performs best-effort cleanup in the event of failure.
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input tables. |
input_vertices_table | str | Input vertices table (must have an id column). |
input_edges_table | str | Input edges table (must have srcid and destid columns). |
result_schema | str | A writable schema to create the result tables. |
result_vertices_table | str | Name of the filtered vertices table to create. |
result_edges_table | str | Name of the filtered edges table to create. |
vertex_filter | str | SQL predicate to filter the vertices (without the WHERE keyword). Example: status = 'ACTIVE' AND score > 0 |
edge_filter | str | SQL predicate that the system evaluates in a triplet context using the aliases a (source vertex), b (edge), and c (destination vertex). This predicate does not require a WHERE keyword. Example: b.amount > 50 AND a.region = c.region |
result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
result_edges_indexes | List[str] | Optional. Columns to index in the result edges table (e.g., srcid and destid columns). Specify an empty list for none. |
Python
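The two-step filtering described for subgraph can be sketched locally. This is an illustration of the semantics under stated assumptions, not the library's implementation: SQLite stands in for the database, and the tables v, e, v_out, and e_out and their data are hypothetical. The predicates follow the documented form (no WHERE keyword, aliases a, b, and c in the triplet context).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # local stand-in, not Ocient
conn.executescript("""
CREATE TABLE v (id BIGINT NOT NULL, status TEXT, region TEXT);
CREATE TABLE e (srcid BIGINT NOT NULL, destid BIGINT NOT NULL, amount REAL);
INSERT INTO v VALUES (1,'ACTIVE','US'), (2,'ACTIVE','US'),
                     (3,'INACTIVE','US'), (4,'ACTIVE','EU');
INSERT INTO e VALUES (1,2,80), (1,2,10), (2,3,90), (1,4,70);
""")

vertex_filter = "status = 'ACTIVE'"
edge_filter = "b.amount > 50 AND a.region = c.region"

# Step 1: keep only vertices that satisfy the vertex predicate.
conn.execute(f"CREATE TABLE v_out AS SELECT * FROM v WHERE {vertex_filter}")

# Step 2: keep edges whose endpoints both survive step 1 and whose triplet
# (a = source vertex, b = edge, c = destination vertex) satisfies the predicate.
conn.execute(
    "CREATE TABLE e_out AS SELECT b.* FROM e b "
    "JOIN v_out a ON a.id = b.srcid "
    "JOIN v_out c ON c.id = b.destid "
    f"WHERE {edge_filter}"
)

kept_vertices = [row[0] for row in conn.execute("SELECT id FROM v_out ORDER BY id")]
kept_edges = conn.execute("SELECT srcid, destid, amount FROM e_out").fetchall()
print(kept_vertices, kept_edges)
```

Only the edge from 1 to 2 with amount 80 remains: the edge to vertex 3 loses an endpoint in step 1, and the edge to vertex 4 fails the region comparison.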
filter_vertices
Creates a filtered subgraph by selecting vertices that match a predicate while retaining only edges with endpoints that are in the filtered vertex set.
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input tables. |
input_vertices_table | str | Input vertices table (must have an id column). |
input_edges_table | str | Input edges table (must have srcid and destid columns). |
result_schema | str | A writable schema to create the result tables. |
result_vertices_table | str | Name of the result vertices table. The name must not conflict with the names of input tables. |
result_edges_table | str | Name of the edges table to create. The name must not conflict with the names of input tables. |
vertex_filter | str | SQL predicate to filter the vertices (without the WHERE keyword). Example: status = 'ACTIVE' AND score > 0 |
result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
result_edges_indexes | List[str] | Optional. Columns to index in the result edges table (e.g., srcid and destid columns). Specify an empty list for none. |
Python
filter_edges
Creates a filtered edges table by selecting edges that match a predicate.
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input edges table. |
input_edges_table | str | Input edges table (must have srcid and destid columns). |
result_schema | str | A writable schema to create the result tables. |
result_edges_table | str | Result edges table name. The name must not conflict with the names of input tables. |
edge_filter | str | SQL predicate on edges (without the WHERE keyword). Example: weight > 0.5 AND type = 'ACTIVE' |
result_edges_indexes | List[str] | Optional. Columns to index in the result edges table (e.g., srcid and destid columns). Specify an empty list for none. |
This example filters the purchases edge set by keeping only edges that meet a business rule (weight > 0.5 and ACTIVE). Then, the filter_edges function indexes the result on the srcid and destid columns for faster lookups.
Python
mask
Creates a masked subgraph by intersecting two graphs. Vertices intersect if the vertex identifier is present in both graphs. Edges intersect when the srcid and destid values are present in both graphs.
The function builds the masked subgraph from the intersecting rows, copying them (including any attributes) from the graph defined by the input_vertices_table and input_edges_table arguments.
You can optionally create indexes on the result subgraph tables.
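The intersection rule can be illustrated with plain Python data structures. This is a sketch of one reading of the description (an edge intersects when the same srcid and destid pair appears in both edge sets); the mask_graph helper and the sample rows are hypothetical and do not call the library.

```python
def mask_graph(vertices, edges, other_vertices, other_edges):
    """Keep rows of the primary graph that also exist in the other graph.

    Rows are dicts; attributes are always copied from the primary graph.
    """
    other_ids = {v["id"] for v in other_vertices}
    other_pairs = {(e["srcid"], e["destid"]) for e in other_edges}
    masked_vertices = [v for v in vertices if v["id"] in other_ids]
    masked_edges = [e for e in edges if (e["srcid"], e["destid"]) in other_pairs]
    return masked_vertices, masked_edges

vertices = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}, {"id": 3, "name": "cy"}]
edges = [{"srcid": 1, "destid": 2, "amount": 5}, {"srcid": 2, "destid": 3, "amount": 9}]
other_vertices = [{"id": 1}, {"id": 2}]
other_edges = [{"srcid": 1, "destid": 2}]

masked_vertices, masked_edges = mask_graph(vertices, edges, other_vertices, other_edges)
print(masked_vertices)
print(masked_edges)
```

The second graph acts only as a mask: the name and amount attributes in the result come from the primary tables.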
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input tables. |
input_vertices_table | str | The primary vertices table to intersect (must have an id column). The masked subgraph created by this function copies rows that intersect from this table. |
input_edges_table | str | The primary edges table to intersect (must have srcid and destid columns). The masked subgraph created by this function copies rows that intersect from this table. |
other_schema | str | Schema containing the second graph. |
other_vertices_table | str | The second vertices table to intersect. |
other_edges_table | str | The second edges table to intersect. |
result_schema | str | A writable schema to create the result tables. |
result_vertices_table | str | Name of the vertices table to create. |
result_edges_table | str | Name of the edges table to create. |
result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
result_edges_indexes | List[str] | Optional. Columns to index in the result edges table (e.g., srcid and destid columns). Specify an empty list for none. |
Python
Transformations
Construct new vertex or edge tables by computing derived columns, reversing direction, or aggregating duplicates. These functions do not change the original inputs. Instead, the functions materialize new results.
map_vertices
Creates a new vertices table with the identifier id and computed columns. Use the result_column_expressions argument to calculate additional columns. This function can also add indexes before inserting data.
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input vertices table. |
input_vertices_table | str | Input vertices table (must have an id column). |
result_schema | str | A schema to create the result vertices table. |
result_vertices_table | str | Name of the result vertices table. The name must not conflict with the names of input tables. |
result_column_expressions | List[str] | One or more SQL expressions defining result columns beyond id. Use the AS alias_name keyword for stable names. |
result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
This example creates a vertices table with the computed columns name_upper and is_vip, and generates indexes for the id and name_upper columns.
Python
map_edges
Creates a new edges table with srcid, destid, and any additional computed columns. Expressions should refer to input edge columns by their original names, and each computed expression should include an AS alias keyword.
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input edges table. |
input_edges_table | str | Input edges table (must have srcid and destid columns). |
result_schema | str | A schema to create the result edges table. |
result_edges_table | str | Name of the result edges table. The name must not conflict with the names of input tables. |
result_column_expressions | List[str] | Optional. SQL expressions for additional edge columns (besides the srcid and destid columns). Use the AS alias_name keyword for stable names. |
result_edges_indexes | List[str] | Optional. Columns to index in the result edges table (e.g., srcid and destid columns). Specify an empty list for none. |
This example creates an edges table with the computed columns discounted_amount and big_txn, and generates indexes for the srcid and destid columns.
Python
map_triplets
Creates a new edges table with computed columns that reference a (source vertex), b (edge), and c (destination vertex). The output automatically includes b.srcid and b.destid columns.
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input tables. |
input_vertices_table | str | Input vertices table (must have an id column). Used as the a and c vertices. |
input_edges_table | str | Input edges table (must have the srcid and destid columns). This function uses this argument as the b edge. |
result_schema | str | A writable schema to create the result edges table. |
result_edges_table | str | Name of the result edges table. The name must not conflict with the names of input tables. |
result_column_expressions | List[str] | Optional. SQL expressions that can reference a (source vertex), b (edge), and c (destination vertex). Use the AS alias_name keyword for stable names. |
result_edges_indexes | List[str] | Optional. Columns to index in the result edges table (e.g., srcid and destid columns). Specify an empty list for none. |
This example creates an edges table with the computed amount and same_country columns, and generates indexes for the srcid and destid columns.
Python
reverse_edges
Creates a new edges table with the srcid and destid columns reversed, preserving other columns. Use this function to traverse a graph in the opposite direction.
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input edges table. |
input_edges_table | str | Input edges table (must have the srcid and destid columns). |
result_schema | str | A writable schema to create the reversed edges table. |
result_edges_table | str | Name of the reversed edges table to create. The name must not conflict with the names of input tables. |
result_edges_indexes | List[str] | Optional. Columns to index in the result edges table (e.g., srcid and destid columns). Specify an empty list for none. |
This example creates a new edges table with reversed srcid and destid columns. The example also creates indexes for these columns.
Python
group_edges
Groups rows that share the same srcid and destid values, producing one row for each unique pair of values in a new edges table. This function performs aggregations based on one or more SQL expressions.
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input edges table. |
input_edges_table | str | Input edges table (must have the srcid and destid columns). |
result_schema | str | A writable schema to create the grouped tables. |
result_edges_table | str | Name of the grouped edges table. The name must not conflict with the names of input tables. |
result_column_expressions | List[str] | One or more aggregate expressions to compute for each unique pair of srcid and destid values. For details about SQL aggregations, see Aggregate Functions. Use the AS alias_name keyword for stable names. |
result_edges_indexes | List[str] | Optional. Columns to index in the result edges table (e.g., srcid and destid columns). Specify an empty list for none. |
This example groups duplicate edges and computes two aggregates: a count named txn_count and a sum named total_amount. The example also generates indexes for the srcid and destid columns.
Python
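The grouping behavior can be sketched locally with an ordinary GROUP BY. This is an illustration only, assuming SQLite as a stand-in database and a hypothetical edges table e with an amount column; the expression strings show the AS alias form described in the arguments table, not the library's internal query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # local stand-in, not Ocient
conn.executescript("""
CREATE TABLE e (srcid BIGINT NOT NULL, destid BIGINT NOT NULL, amount REAL);
INSERT INTO e VALUES (1,2,10), (1,2,15), (2,3,7);
""")

# Each aggregate carries an explicit alias so the output column names are stable.
result_column_expressions = ["COUNT(*) AS txn_count", "SUM(amount) AS total_amount"]

sql = (
    "SELECT srcid, destid, " + ", ".join(result_column_expressions)
    + " FROM e GROUP BY srcid, destid ORDER BY srcid, destid"
)
grouped = conn.execute(sql).fetchall()
print(grouped)
```

The two edges from 1 to 2 collapse into a single row with a count of 2 and a total of 25.0.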
Triplets
Produce triplet representations that are made of a (source vertex), b (edge), and c (destination vertex), either as a logical view or a materialized table for downstream queries.
create_triplets_view
Creates a view that combines the edge table with the source and destination vertex attributes. This view is useful for analyzing relationships without having to repeatedly join tables. The view includes these columns:
- All original edge columns (including the srcid and destid columns).
- All source-vertex columns except id. Source-vertex column names have the src_ prefix.
- All destination-vertex columns except id. Destination-vertex column names have the dest_ prefix.
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input vertices and edges tables. |
input_vertices_table | str | Input vertices table (must have an id column). |
input_edges_table | str | Input edges table (must have the srcid and destid columns). |
result_schema | str | A schema to create the view. |
result_triplets_view | str | Name of the triplets view to create. The name must not conflict with the names of input tables. |
Python
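The column naming rules for triplets can be illustrated with a local view. This sketch assumes SQLite as a stand-in, hypothetical tables v and e, a hypothetical triplet_select helper, and inner joins on both endpoints (the join type used by the library is not stated here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # local stand-in, not Ocient
conn.executescript("""
CREATE TABLE v (id BIGINT NOT NULL, name TEXT, country TEXT);
CREATE TABLE e (srcid BIGINT NOT NULL, destid BIGINT NOT NULL, amount REAL);
INSERT INTO v VALUES (1,'ann','US'), (2,'bob','CA');
INSERT INTO e VALUES (1,2,42.0);
""")

def triplet_select(conn, vertices, edges):
    """Build a SELECT that prefixes vertex columns with src_ and dest_."""
    vertex_cols = [
        row[1] for row in conn.execute(f"PRAGMA table_info({vertices})")
        if row[1] != "id"  # id is already covered by srcid and destid
    ]
    parts = ["b.*"]
    parts += [f"a.{col} AS src_{col}" for col in vertex_cols]
    parts += [f"c.{col} AS dest_{col}" for col in vertex_cols]
    return (
        f"SELECT {', '.join(parts)} FROM {edges} b "
        f"JOIN {vertices} a ON a.id = b.srcid "
        f"JOIN {vertices} c ON c.id = b.destid"
    )

conn.execute("CREATE VIEW triplets AS " + triplet_select(conn, "v", "e"))
cursor = conn.execute("SELECT * FROM triplets")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```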
create_triplets_table
Creates a materialized table that combines the edge table with the source and destination vertex attributes. This table is useful for analyzing relationships without having to repeatedly join tables. The created table includes these columns:
- All original edge columns (including the srcid and destid columns).
- All source-vertex columns except id. Source-vertex column names have the src_ prefix.
- All destination-vertex columns except id. Destination-vertex column names have the dest_ prefix.
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input tables. |
input_vertices_table | str | Input vertices table (must have an id column). |
input_edges_table | str | Input edges table (must have the srcid and destid columns). |
result_schema | str | A schema to create the triplets table. |
result_triplets_table | str | Name of the triplets table to create. The name must not conflict with any input names. |
result_triplets_indexes | List[str] | Optional. Columns to index in the result table. Specify an empty list for none. |
This example creates a triplets table and generates indexes for the src_id and dest_id columns.
Python
Degrees
Compute degree metrics for each vertex from the edges table. These functions produce small vertex tables suitable for joins and analytics.
in_degrees
Computes how many edges point to each vertex in an edge table by counting how many times each unique destid value appears. The result table has two columns: id (the destination vertex) and in_degree (the count).
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input table. |
input_edges_table | str | Input edges table (must have the srcid and destid columns). |
result_schema | str | A schema to create the in-degree table. |
result_vertices_table | str | Name of the vertices table to create (columns include id and in_degree). |
result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
This example computes in-degrees and generates an index for the id column.
Python
out_degrees
Computes how many edges originate from each vertex in an edge table by counting how many times each unique srcid value appears. The result table has two columns: id (the source vertex) and out_degree (the count).
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input table. |
input_edges_table | str | Input edges table (must have the srcid and destid columns). |
result_schema | str | A schema to create the out-degree table. |
result_vertices_table | str | Name of the vertices table to create (columns include id and out_degree). |
result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
This example computes out-degrees and generates an index for the id column.
Python
degrees
Computes the total degrees (in-degrees and out-degrees) for each vertex in an edge table by counting how many times each unique srcid and destid value appears. The result table has two columns: id (the destination or source vertex) and degree (the count).
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input table. |
input_edges_table | str | Input edges table (must have the srcid and destid columns). |
result_schema | str | A schema to create the degree table. |
result_vertices_table | str | Name of the vertices table to create (columns include id and degree). |
result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
This example computes total degrees and generates an index for the id column.
Python
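The counting rule behind the three degree functions can be sketched in a few lines of plain Python. The edge list is hypothetical, and this is an in-memory illustration of the counts rather than the library's implementation.

```python
from collections import Counter

# Hypothetical (srcid, destid) pairs.
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]

# in_degree: how many times each id appears as destid.
in_degree = Counter(dest for _, dest in edges)
# out_degree: how many times each id appears as srcid.
out_degree = Counter(src for src, _ in edges)
# degree: appearances as either srcid or destid.
degree = in_degree + out_degree

print(dict(in_degree))
print(dict(out_degree))
print(dict(degree))
```

Because the counts are derived from the edges alone, a vertex that never appears in the counted column has no entry in the corresponding result.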
Vertex Extraction and Joins
Build vertex sets from edges and combine vertex attributes across tables. These functions are useful for shaping vertex properties and consolidating features.
from_edges
Builds a vertices table from an edges table by extracting the unique source and destination identifiers. This function can optionally compute additional columns using SQL expressions by referencing the unique identifier as ids.id.
The created table always contains the id column with one additional column per expression.
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input table. |
input_edges_table | str | Input edges table (must have the srcid and destid columns). |
result_schema | str | A schema to create the result vertices table. |
result_vertices_table | str | Name of the result vertices table (always includes id). |
result_column_expressions | List[str] | Optional. Specify a list of SQL expressions that reference ids.id to add additional columns. |
result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
This example builds a vertices table with a bucket column that assigns each vertex to one of 10 buckets. The example also generates an index for the id and bucket columns.
Python
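The extraction of unique identifiers and the use of the ids.id reference can be sketched locally. This assumes SQLite as a stand-in and a hypothetical edges table e; the expression ids.id % 10 AS bucket is an illustrative guess at how a vertex might be assigned to one of 10 buckets, not the expression from the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # local stand-in, not Ocient
conn.executescript("""
CREATE TABLE e (srcid BIGINT NOT NULL, destid BIGINT NOT NULL);
INSERT INTO e VALUES (10,11), (11,25), (10,25);
""")

# Hypothetical expression: references the unique identifier as ids.id.
result_column_expressions = ["ids.id % 10 AS bucket"]

select_list = ", ".join(["ids.id"] + result_column_expressions)
sql = (
    f"SELECT {select_list} FROM "
    # UNION removes duplicates, so each identifier appears exactly once.
    "(SELECT srcid AS id FROM e UNION SELECT destid AS id FROM e) ids "
    "ORDER BY ids.id"
)
vertices = conn.execute(sql).fetchall()
print(vertices)
```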
join_vertices
Merges two vertices tables by retaining every row from a primary table (input_vertices_table) and selectively updating rows that also appear in the modification table (modification_vertices_table). The merged table includes all vertices from the primary table that do not appear in the modification table.
For vertices that appear in both tables, you must provide a list of expressions (result_attribute_expressions), one for every non-identifier column in the merged result table, in the same column order. These SQL expressions can apply computations to columns or simply alias columns that need no changes. Each expression can reference columns from the primary table (using the alias a) or from the modification table (using the alias b).
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the primary input table. |
input_vertices_table | str | Input vertices table (with the alias a). This table must have an id column. |
modification_schema | str | A schema for the modification vertices table. |
modification_vertices_table | str | Modification vertices table (with the alias b). This table must have an id column. |
result_schema | str | A writable schema to create the result tables. |
result_vertices_table | str | Name of the vertices table to create. |
result_attribute_expressions | List[str] | A list of SQL expressions that define the non-identifier columns of the joined result. Each expression can reference the left (input_vertices_table) vertex as a and the right (modification_vertices_table) vertex as b. You must end every expression with an explicit alias using AS alias_name so the output column names are stable. |
result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
This example merges two vertices tables and generates indexes for the id and status columns. The example includes two SQL expressions that update the status and score columns based on the modification vertices table using the COALESCE SQL function.
Python
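The merge behavior with COALESCE expressions can be sketched locally. This is one way to obtain the described result (every primary row kept, matching rows updated), assuming SQLite as a stand-in and hypothetical tables base and updates; it is not necessarily the query the library generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # local stand-in, not Ocient
conn.executescript("""
CREATE TABLE base (id BIGINT NOT NULL, status TEXT, score REAL);
CREATE TABLE updates (id BIGINT NOT NULL, status TEXT, score REAL);
INSERT INTO base VALUES (1,'NEW',1.0), (2,'NEW',2.0), (3,'NEW',3.0);
INSERT INTO updates VALUES (2,'ACTIVE',NULL), (3,NULL,9.5);
""")

# a = primary table, b = modification table; every expression ends with an alias.
result_attribute_expressions = [
    "COALESCE(b.status, a.status) AS status",
    "COALESCE(b.score, a.score) AS score",
]

sql = (
    "SELECT a.id, " + ", ".join(result_attribute_expressions)
    + " FROM base a LEFT JOIN updates b ON a.id = b.id ORDER BY a.id"
)
merged = conn.execute(sql).fetchall()
print(merged)
```

Vertex 1 has no modification row and passes through unchanged, while vertices 2 and 3 take only the non-NULL values from the modification table.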
inner_join_vertices
Performs an inner join on two vertex tables using an equality comparison a.id = b.id. The result table automatically includes the id column from the first table.
You must provide a list of SQL expressions (result_attribute_expressions), one for every non-identifier column in the result table, in the same column order. These SQL expressions can apply computations to columns or simply alias columns that need no changes. Each expression can reference columns from the left table using the alias a or from the right table using the alias b.
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A schema for the left vertices table. |
input_vertices_table | str | Input vertices table for the left side of the join (must have an id column). |
other_schema | str | A schema for the right vertices table. |
other_vertices_table | str | The other vertices table for the right side of the join (must have an id column). |
result_schema | str | A writable schema to create the result tables. |
result_vertices_table | str | Name of the vertices table to create. |
result_attribute_expressions | List[str] | A list of SQL expressions that define the non-identifier columns of the joined result. Each expression can reference the left (input_vertices_table) vertex as a and the right (other_vertices_table) vertex as b. You must end every expression with an explicit alias using AS alias_name to ensure the output column names are stable. |
result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
This example performs an inner join of two vertices tables and generates an index for the id column.
Python
outer_join_vertices
Performs a left outer join between two vertices tables using an equality comparison a.id = b.id. The result table includes all rows from the left table. For left-table rows that have no match in the right table, columns read from the right table (alias b) are NULL, so expressions that depend only on those columns evaluate to NULL, while expressions that read only the left table (alias a) are unaffected.
You must provide a list of SQL expressions (result_attribute_expressions), one for every non-identifier column in the result table, in the same column order. These SQL expressions can apply computations to columns or simply alias columns that need no changes. Each expression can reference columns from the left table using the alias a or from the right table using the alias b.
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A schema for the left vertices table. |
input_vertices_table | str | Left vertices table (must have an id column). |
other_schema | str | A schema for the right vertices table (with the alias b). |
other_vertices_table | str | Right vertices table (must have an id column). |
result_schema | str | A writable schema to create the result tables. |
result_vertices_table | str | Name of the vertices table to create. |
result_attribute_expressions | List[str] | A list of SQL expressions that define the non-identifier columns of the joined result. Each expression can reference the left (input_vertices_table) vertex as a and the right (other_vertices_table) vertex as b. You must end every expression with an explicit alias using AS alias_name to ensure the output column names are stable. |
result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
This example performs a left outer join of two vertices tables and generates an index for the id column.
Python
collect_neighbors
For each vertex in a table, this function collects information on neighbors (identifier and any attributes) as an array of tuples. For a specified direction (IN, OUT, or BOTH), the function aggregates tuples representing each neighboring vertex into an array.
The direction types are:
- IN: Neighbors with edges pointing to the vertex (edges where destid = id).
- OUT: Neighbors that the vertex points to (edges where srcid = id).
- BOTH: Union of IN and OUT with neighbors from incoming (destid = id) and outgoing (srcid = id) edges.
The result table has two columns: id (the vertex identifier) and neighbors (an array of tuples representing each neighbor).
If an error occurs after table creation, the function drops the result table.
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input tables. |
input_vertices_table | str | Input vertices table (must have an id column). |
input_edges_table | str | Input edges table (must have the srcid and destid columns). |
result_schema | str | A writable schema to create the result tables. |
result_table | str | Name of the neighbors collection table (with the id and neighbors columns). |
direction | EdgeDirection | Specifies the direction of traversal. Supported values are: IN (neighbors that have edges pointing to the vertex, that is, edges where destid = id; for example, if an edge points from 5 to 10, then for id=10, neighbor 5 is included), OUT (neighbors that the vertex points to, that is, edges where srcid = id; for example, if an edge points from 5 to 10, then for id=5, neighbor 10 is included), and BOTH (the union of IN and OUT, which includes neighbors from edges pointing to id and edges from id). |
result_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
This example collects the neighbors of each vertex and generates an index for the id column. Setting the direction argument to IN collects neighbors pointing to id.
Python
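The three direction values can be illustrated with a small in-memory sketch. It is a simplification under stated assumptions: neighbors are represented by identifier only (no attribute tuples), duplicates are removed, vertices without neighbors are omitted, and direction is a plain string rather than the EdgeDirection type. The neighbors_by_direction helper is hypothetical.

```python
from collections import defaultdict

def neighbors_by_direction(vertex_ids, edges, direction):
    """Return {id: sorted neighbor ids} for direction 'IN', 'OUT', or 'BOTH'."""
    if direction not in ("IN", "OUT", "BOTH"):
        raise ValueError(f"unsupported direction: {direction}")
    neighbors = defaultdict(set)
    for src, dest in edges:
        if direction in ("IN", "BOTH"):
            neighbors[dest].add(src)   # edge where destid = id
        if direction in ("OUT", "BOTH"):
            neighbors[src].add(dest)   # edge where srcid = id
    return {v: sorted(neighbors[v]) for v in vertex_ids if v in neighbors}

edges = [(5, 10), (10, 7)]  # hypothetical (srcid, destid) pairs
ids = [5, 7, 10]

print(neighbors_by_direction(ids, edges, "IN"))
print(neighbors_by_direction(ids, edges, "OUT"))
print(neighbors_by_direction(ids, edges, "BOTH"))
```

With the edge from 5 to 10, vertex 10 lists 5 as an IN neighbor and vertex 5 lists 10 as an OUT neighbor, matching the example in the arguments table.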
collect_edges
For each vertex in a table, this function collects an array of adjacent edge rows based on the specified direction. The result table has two columns: id (the vertex identifier) and edges (an array of tuples, each tuple containing all columns from the edges table for a connected edge).
The direction types are:
- IN: Edges pointing to the vertex (edges where destid = id).
- OUT: Edges originating from the vertex (edges where srcid = id).
- BOTH: Union of IN and OUT that includes edges from incoming (destid = id) and outgoing (srcid = id) directions. This direction retains duplicates.
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input tables. |
input_vertices_table | str | Input vertices table (must have an id column). |
input_edges_table | str | Input edges table (must have the srcid and destid columns). |
result_schema | str | A writable schema to create the result tables. |
result_table | str | Name of the edge collection table (with the id and edges columns). |
direction | EdgeDirection | Specifies the direction of traversal. Supported values are: IN (edges pointing to the vertex, that is, edges where destid = id; for example, an edge that points from 5 to 10 is collected for id=10), OUT (edges originating from the vertex, that is, edges where srcid = id; for example, an edge that points from 5 to 10 is collected for id=5), and BOTH (the union of IN and OUT, which includes edges pointing to id and edges from id). |
result_indexes | List[str] | Optional. Columns to index. Specify an empty list for none. |
This example sets direction to OUT to collect edges from id.
Python
Algorithms
High-level graph algorithms that iterate over the graph structure to produce labels, components, or counts.
label_propagation
Executes the Label Propagation Algorithm (LPA) to assign community labels to vertices. Each vertex starts with its own identifier as its label. For up to the number of iterations set by the max_iterations argument, each vertex updates its label to the most frequent label among its neighbors, breaking ties by choosing the smallest label. Isolated vertices retain their initial label. The algorithm uses temporary tables for intermediate results and drops these tables when the process completes or fails. The final table stores the id and label columns and can include indexes.
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
connection | pyocient.Connection | An active database connection using the pyocient module. |
input_schema | str | A non-empty schema containing the input tables. |
input_vertices_table | str | Input vertices table (must have an id column). |
input_edges_table | str | Input edges table (must have the srcid and destid columns). |
result_schema | str | A writable schema to create the result tables. |
result_vertices_table | str | Name of the vertices table to create (columns include id and label). |
max_iterations | int | Maximum number of iterations (must be 1 or greater). |
result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
This example runs label propagation and generates an index for the id column.
Python
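The update rule can be sketched in memory. This is an illustration under two assumptions that the description does not settle: neighbors are taken in both edge directions, and all vertices update synchronously from the labels of the previous iteration. The label_propagation_sketch helper and sample graph are hypothetical, and no temporary tables are involved.

```python
from collections import Counter, defaultdict

def label_propagation_sketch(vertex_ids, edges, max_iterations):
    if max_iterations < 1:
        raise ValueError("max_iterations must be 1 or greater")
    # Assumption: an edge makes its endpoints neighbors in both directions.
    neighbors = defaultdict(list)
    for src, dest in edges:
        neighbors[src].append(dest)
        neighbors[dest].append(src)

    labels = {v: v for v in vertex_ids}  # each vertex starts with its own id
    for _ in range(max_iterations):
        new_labels = {}
        for v in vertex_ids:
            counts = Counter(labels[n] for n in neighbors[v] if n in labels)
            if not counts:
                new_labels[v] = labels[v]  # isolated vertex keeps its label
                continue
            best = max(counts.values())
            # Ties go to the smallest label.
            new_labels[v] = min(lbl for lbl, c in counts.items() if c == best)
        labels = new_labels  # synchronous update (assumption)
    return labels

edges = [(1, 2), (2, 3), (1, 3)]  # a triangle; vertex 4 is isolated
print(label_propagation_sketch([1, 2, 3, 4], edges, 2))
```

After two iterations the triangle converges to label 1, and the isolated vertex keeps label 4.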
connected_components
Identifies the connected components of an undirected graph. This algorithm configures a Pregel computation in which each vertex initially sets its component label equal to its own identifier id.
In each iteration, vertices send their component label to neighbors. Each vertex updates based on the aggregated minimum value of its current component label and any received values. The process repeats until no more updates occur.
The result table maps each vertex id to its final component label.
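The minimum-label propagation described above can be sketched in plain Python. This is a hedged, in-memory illustration rather than the library's Pregel configuration; the function name and the list-based inputs are assumptions made for the sketch.

```python
def connected_components(vertices, edges):
    """Propagate the minimum vertex id across undirected edges until stable."""
    component = {v: v for v in vertices}  # each vertex starts as its own component
    changed = True
    while changed:
        changed = False
        # Collect the minimum label each vertex receives this round.
        # Edges are treated as undirected, so labels flow both ways.
        incoming = {}
        for src, dst in edges:
            for sender, receiver in ((src, dst), (dst, src)):
                msg = component[sender]
                if receiver not in incoming or msg < incoming[receiver]:
                    incoming[receiver] = msg
        # Apply updates only after all messages of the round are gathered.
        for v, msg in incoming.items():
            if msg < component[v]:
                component[v] = msg
                changed = True
    return component
```

For example, `connected_components([1, 2, 3, 4, 5], [(2, 1), (3, 2), (5, 4)])` returns `{1: 1, 2: 1, 3: 1, 4: 4, 5: 4}`.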
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
| connection | pyocient.Connection | An active database connection using the pyocient module. |
| input_schema | str | A non-empty schema containing the input tables. |
| input_vertices_table | str | Input vertices table (must have an id column). |
| input_edges_table | str | Input edges table (must have the srcid and destid columns). |
| result_schema | str | A writable schema to create the result tables. |
| result_vertices_table | str | Name of the vertices table to create. |
| result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
id column.
Python
strongly_connected_components
Computes strongly connected components (SCC) in a directed graph. This function runs a recursive partitioning algorithm that divides vertices into subsets where every vertex is reachable from every other vertex in the same subset. The algorithm selects a pivot (typically the vertex with the minimum identifier id), then computes its predecessor set (vertices that can reach the pivot) and its descendant set (vertices reachable from the pivot). Then, the function identifies the SCC as the intersection of these two sets, removes that SCC from the graph, and recurses on the remainder until all vertices have been assigned to an SCC.
The function creates temporary tables in the result schema to store intermediate results. This function drops these tables when the computation completes or fails. The final result table contains two columns: id (vertex identifier) and component (the minimum vertex identifier in its SCC subset).
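The pivot-based partitioning described above can be sketched in plain Python. This is an in-memory illustration under stated assumptions, not the library's implementation: the function names and list-based inputs are hypothetical, and the recursion on the remainder is written as a loop.

```python
def reachable(start, adjacency, allowed):
    """Return the vertices in `allowed` reachable from `start` via `adjacency`."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for n in adjacency.get(v, ()):
            if n in allowed and n not in seen:
                seen.add(n)
                stack.append(n)
    return seen


def strongly_connected_components(vertices, edges):
    """Map each vertex id to the minimum id in its strongly connected component."""
    forward, backward = {}, {}
    for src, dst in edges:
        forward.setdefault(src, set()).add(dst)
        backward.setdefault(dst, set()).add(src)

    remaining = set(vertices)
    component = {}
    while remaining:
        pivot = min(remaining)                               # minimum remaining id
        descendants = reachable(pivot, forward, remaining)   # reachable from pivot
        predecessors = reachable(pivot, backward, remaining) # can reach pivot
        scc = descendants & predecessors
        for v in scc:
            component[v] = pivot
        remaining -= scc                                     # continue on the rest
    return component
```

Because the pivot is always the smallest remaining identifier and belongs to its own SCC, the pivot is also the minimum identifier of that SCC, which matches the component value described for the result table.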
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
| connection | pyocient.Connection | An active database connection using the pyocient module. |
| input_schema | str | A non-empty schema containing the input tables. |
| input_vertices_table | str | Input vertices table (must have an id column). |
| input_edges_table | str | Input edges table (must have the srcid and destid columns). |
| result_schema | str | A writable schema to create the result tables. |
| result_vertices_table | str | Name of the vertices table to create (columns include id and component). |
| result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
id column.
Python
TriangleCount
TriangleCount identifies all 3-cycles (triangles) in the graph and counts how many distinct triangles each vertex participates in. The algorithm first builds a canonical, undirected edge set by ensuring srcid < destid and removing duplicates to prevent double-counting. If your input edges are already canonicalized and deduplicated, use TriangleCount.run_pre_canonicalized to skip preprocessing for faster performance.
The function then counts triangles (a, b, c) where a < b < c by intersecting neighbor lists and aggregates per-vertex participation to produce a result table with the id and triangle_count columns.
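The two phases described above, canonicalization followed by neighbor-list intersection, can be sketched in plain Python. This is an in-memory illustration only; the function names are hypothetical, and dropping self-loops during canonicalization is an assumption of the sketch.

```python
def canonicalize(edges):
    """Return a deduplicated edge set with the smaller id first.

    Assumption: self-loops are dropped because they cannot satisfy src < dst.
    """
    return {(min(s, d), max(s, d)) for s, d in edges if s != d}


def triangle_count(vertices, canonical_edges):
    """Count, per vertex, the triangles (a, b, c) with a < b < c it belongs to."""
    # For every vertex, keep only the neighbors with a larger id.
    higher = {v: set() for v in vertices}
    for a, b in canonical_edges:
        higher[a].add(b)

    counts = {v: 0 for v in vertices}
    for a, b in canonical_edges:
        # Any c above both a and b that neighbors both closes a triangle.
        for c in higher[a] & higher[b]:
            counts[a] += 1
            counts[b] += 1
            counts[c] += 1
    return counts
```

Calling `triangle_count` directly on edges that already satisfy the ordering corresponds to the pre-canonicalized path, while `triangle_count(vertices, canonicalize(edges))` corresponds to the full run.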
run syntax
Python
run_pre_canonicalized syntax
Python
| Argument | Data Type | Description |
|---|---|---|
| connection | pyocient.Connection | An active database connection using the pyocient module. |
| input_schema | str | A non-empty schema containing the input tables. |
| input_vertices_table | str | Input vertices table (must have an id column). |
| input_edges_table | str | Input edges table (must have the srcid and destid columns). For run_pre_canonicalized execution, the function assumes this table is already canonicalized (e.g., srcid < destid) and deduplicated. |
| result_schema | str | A writable schema to create the result tables. |
| result_vertices_table | str | Name of the vertices table to create (columns include id and triangle_count). |
| result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
run
Canonicalize the raw edges internally, count unique triangles, and write per-vertex triangle counts with an index on the id column.
Python
run_pre_canonicalized
Use a pre-canonicalized, deduplicated edge table to count triangles and write per-vertex triangle counts with an index on the id column.
Python
pregel
Provides a generic vertex-centered iteration framework for custom graph algorithms, similar to the Pregel model. Each iteration updates vertex states by sending messages along edges and then aggregating these messages to compute new states. The algorithm continues iterating until it reaches convergence (no state changes or no messages produced) or a specified iteration cap. You define each step of the algorithm with the SQL expressions that you pass as arguments.
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
| connection | pyocient.Connection | An active database connection using the pyocient module. |
| input_schema | str | A non-empty schema containing the input tables. |
| input_vertices_table | str | Input vertices table (must have an id column). |
| input_edges_table | str | Input edges table (must have the srcid and destid columns). |
| result_schema | str | A writable schema to create the result tables. |
| result_vertices_table | str | Name of the vertices table to create (columns include id and result). |
| initializer_expr | str | A SQL expression to compute the initial state for each vertex (e.g., CASE WHEN type = 'seed' THEN 1.0 ELSE 0.0 END). |
| send_to_source_expr | str | Optional. Defines the message sent to the source vertex of an edge, referencing the current vertex states a.state (source) and c.state (destination), and any edge attributes as b.<edge_column>. Example: CASE WHEN a.country = c.country THEN 1 ELSE 0 END |
| send_to_dest_expr | str | Optional. Defines the message sent to the destination vertex of an edge, referencing the current vertex states a.state (source) and c.state (destination), and any edge attributes as b.<edge_column>. Example: CASE WHEN a.country = c.country THEN 1 ELSE 0 END |
| aggregate_expr | str | A SQL aggregation to combine messages per vertex (e.g., SUM(msg)). For a list of supported aggregations, see Aggregate Functions. |
| updater_expr | str | A SQL expression to compute the next state from the current state and aggregated messages. For example, this code updates the state to the minimum of the current state and the aggregated message (similar to the connected_components function): LEAST(a.state, COALESCE(m.aggregated_message, a.state)) AS state |
| max_iterations | int | Maximum number of iterations (must be either -1 or a value of 1 or greater). If this value is -1, the Pregel algorithm has no limit, and it runs until convergence. A value of 0 causes the algorithm to return an error. |
| result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
id column.
Python
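The iteration loop that pregel describes (send, aggregate, update, repeat until convergence or the iteration cap) can be sketched in plain Python. This is an in-memory illustration only: it replaces the SQL expressions with Python callables, and the function name, its signature, and the convention that a `None` message means "no message" are assumptions of the sketch rather than the library's API.

```python
def pregel(vertices, edges, initializer, send_to_dest, send_to_source,
           aggregate, updater, max_iterations=-1):
    """Run a vertex-centered iteration on an in-memory graph (illustrative only)."""
    if max_iterations == 0 or max_iterations < -1:
        raise ValueError("max_iterations must be -1 or a value of 1 or greater")

    state = {v: initializer(v) for v in vertices}
    iteration = 0
    while max_iterations == -1 or iteration < max_iterations:
        iteration += 1
        # Send phase: every edge may message its destination and its source.
        inbox = {}
        for src, dst in edges:
            for target, send in ((dst, send_to_dest), (src, send_to_source)):
                if send is None:
                    continue
                message = send(state[src], state[dst])
                if message is not None:
                    inbox.setdefault(target, []).append(message)
        if not inbox:
            break  # converged: no messages produced
        # Aggregate and update phase, based on the states before this round.
        new_state = dict(state)
        for v, messages in inbox.items():
            new_state[v] = updater(state[v], aggregate(messages))
        if new_state == state:
            break  # converged: no state changes
        state = new_state
    return state


# Usage: connected components as minimum-label propagation.
labels = pregel(
    vertices=[1, 2, 3, 4],
    edges=[(2, 1), (3, 2)],
    initializer=lambda v: v,
    send_to_dest=lambda src_state, dst_state: src_state,
    send_to_source=lambda src_state, dst_state: dst_state,
    aggregate=min,
    updater=min,
)
print(labels)  # {1: 1, 2: 1, 3: 1, 4: 4}
```

The usage mirrors the LEAST-based updater example in the argument table: each vertex keeps the minimum of its current state and the aggregated messages.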
Paths & Ranking
These functions include the shortest-path and PageRank algorithms.
shortest_paths
Computes the shortest distance from every vertex to each vertex in a set of landmark vertices using an iterative relaxation algorithm. The algorithm resembles Bellman–Ford but handles multiple destinations simultaneously. Each landmark starts at distance 0 and all other vertices start at positive infinity. On each iteration, the algorithm examines every edge and checks whether traveling through the connected neighbor would yield a shorter route to a landmark. If a shorter route exists, the algorithm updates the distance of the source vertex. The process stops when no distances improve or the algorithm reaches the maximum number of iterations.
After the process finishes, the algorithm writes a result table with the srcid, destid, and distance columns.
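The relaxation loop described above can be sketched in plain Python. This is an in-memory illustration, not the library's implementation: the function name, the optional `weights` dictionary keyed by edge, and the choice to omit unreachable pairs from the returned mapping are assumptions of the sketch.

```python
import math


def shortest_paths(vertices, edges, landmarks, max_iterations, weights=None):
    """Return {(srcid, landmark): distance} using synchronous edge relaxation."""
    if not landmarks:
        raise ValueError("landmarks must be non-empty")
    if max_iterations < 1:
        raise ValueError("max_iterations must be 1 or greater")

    # Landmarks start at 0 from themselves; everything else at infinity.
    dist = {(v, lm): (0.0 if v == lm else math.inf)
            for v in vertices for lm in landmarks}

    for _ in range(max_iterations):
        updates = {}
        for src, dst in edges:
            w = 1.0 if weights is None else weights[(src, dst)]
            for lm in landmarks:
                # Would going through dst give src a shorter route to lm?
                candidate = dist[(dst, lm)] + w
                if candidate < updates.get((src, lm), dist[(src, lm)]):
                    updates[(src, lm)] = candidate
        if not updates:
            break  # no distance improved in this round
        dist.update(updates)

    # Sketch-level choice: report only pairs with a finite distance.
    return {pair: d for pair, d in dist.items() if d < math.inf}
```

For example, `shortest_paths([1, 2, 3, 4], [(1, 2), (2, 3), (4, 1)], [3], 10)` returns distances of 0.0, 1.0, 2.0, and 3.0 from vertices 3, 2, 1, and 4 to landmark 3. With `max_iterations=1`, only vertices 3 and 2 have a finite distance, because each round extends known routes by a single edge.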
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
| connection | pyocient.Connection | An active database connection using the pyocient module. |
| input_schema | str | A non-empty schema containing the input tables. |
| input_vertices_table | str | Input vertices table (must have an id column). |
| input_edges_table | str | Input edges table (must have the srcid and destid columns). |
| result_schema | str | A writable schema to create the result table. |
| result_table | str | Name of the distances result table. This table has the srcid, destid, and distance columns. |
| landmarks | List[int] | One or more landmark vertex identifiers. This list must be non-empty and contain no NULLs. |
| edge_weight_column | Optional[str] | Optional. The name of an edge weight column. If you do not specify this argument, each edge uses the default weight of 1.0. |
| max_iterations | int | Maximum number of relaxation iterations. This value must be 1 or greater. |
| result_indexes | List[str] | Optional. The list of columns to index in the result table (srcid, destid, and distance columns). Specify an empty list for none. |
src and dest columns.
Python
static_page_rank
Computes PageRank scores for each vertex over a fixed number of iterations. The algorithm follows the standard PageRank formula with a reset probability (reset_prob) and uses common table expressions to calculate contributions from incoming edges and redistribute ranks from dangling nodes.
The algorithm supports two variants:
- Standard PageRank: All vertices start with rank 1.0/N, where N is the number of vertices. Specify this variant if personalization_src_id is None.
- Personalized PageRank: The specified vertex starts with a rank of 1.0, while others start with a rank of 0.0. Specify this variant if personalization_src_id is a vertex identifier.
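A fixed-iteration PageRank with both variants can be sketched in plain Python. This is an in-memory illustration under explicit assumptions, not the library's SQL implementation. The parameter `follow_prob` is a hypothetical name for the probability that the random surfer follows an edge; how it maps to the library's reset_prob argument is an assumption. Redistributing the rank of dangling vertices according to the base distribution is also a choice made for the sketch.

```python
def static_page_rank(vertices, edges, num_iterations, follow_prob=0.85,
                     personalization_src_id=None):
    """Run a fixed number of PageRank iterations on an in-memory graph."""
    if num_iterations < 1:
        raise ValueError("num_iterations must be 1 or greater")

    out_neighbors = {v: [] for v in vertices}
    for src, dst in edges:
        out_neighbors[src].append(dst)

    n = len(vertices)
    if personalization_src_id is None:
        # Standard variant: uniform base distribution.
        base = {v: 1.0 / n for v in vertices}
    else:
        # Personalized variant: all base weight on one vertex.
        base = {v: (1.0 if v == personalization_src_id else 0.0) for v in vertices}

    rank = dict(base)
    for _ in range(num_iterations):
        # Rank held by vertices without outgoing edges (dangling vertices).
        dangling = sum(rank[v] for v in vertices if not out_neighbors[v])
        incoming = {v: 0.0 for v in vertices}
        for v in vertices:
            for target in out_neighbors[v]:
                incoming[target] += rank[v] / len(out_neighbors[v])
        rank = {
            v: (1.0 - follow_prob) * base[v]
               + follow_prob * (incoming[v] + dangling * base[v])
            for v in vertices
        }
    return rank
```

With this formulation the ranks always sum to 1.0, because the dangling rank is fed back through the base distribution instead of being lost.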
Python
| Argument | Data Type | Description |
|---|---|---|
| connection | pyocient.Connection | An active database connection using the pyocient module. |
| input_schema | str | A non-empty schema containing the input tables. |
| input_vertices_table | str | Input vertices table (must have an id column). |
| input_edges_table | str | Input edges table (must have the srcid and destid columns). |
| result_schema | str | A writable schema to create the result tables. |
| result_vertices_table | str | Name of the vertices table to create (columns include all vertex columns and a pagerank column). |
| num_iterations | int | Number of iterations to run. This value must be 1 or greater. |
| reset_prob | float | A value between 0.0 and 1.0 that controls the damping factor for how the PageRank random surfer moves across vertices. A high value (e.g., 0.85) puts more emphasis on link structure, encouraging the algorithm to move from each vertex to a neighbor. With a high value, PageRank scores tend to concentrate around well-linked regions. A lower value (e.g., 0.50) allows the algorithm to ignore edges and instead jump to a vertex chosen from a base distribution. For standard PageRank, this base is uniform over all vertices. For personalized PageRank, the base is biased toward the specified vertex. This behavior creates more uniform scoring with less sensitivity to link topology. |
| result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
| personalization_src_id | Optional[int] | Determines whether PageRank uses the standard or the personalized variant. For personalized mode, specify a vertex identifier. PageRank scoring starts with this vertex set at 1.0, while all other vertices start at 0.0. For standard mode, specify None. All vertices start with rank 1.0/N, where N is the number of vertices. |
id column. This example uses a high reset probability reset_prob of 0.85 to ensure the ranking concentrates on highly linked regions.
Python
dynamic_page_rank
Computes PageRank scores until convergence based on a specified threshold value (tolerance). Unlike the static_page_rank function, this algorithm runs iterations until the sum of absolute differences between ranks in successive iterations is less than or equal to the tolerance value. The algorithm handles personalization similarly to static_page_rank. At each iteration, the function uses the PageRank formula, collects rank values, and redistributes them.
The algorithm supports two variants:
- Standard PageRank: All vertices start with rank 1.0/N, where N is the number of vertices. Specify this variant if personalization_src_id is None.
- Personalized PageRank: The specified vertex starts with a rank of 1.0, while others start with a rank of 0.0. Specify this variant if personalization_src_id is a vertex identifier.
After the rank changes fall within the tolerance threshold, the function writes a vertices table containing all the original vertex columns with a new PageRank scoring column.
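The convergence-driven loop can be sketched in plain Python. This is an in-memory illustration only, with several labeled assumptions: `follow_prob` is a hypothetical name for the probability of following an edge (its mapping to reset_prob is not confirmed here), the convergence measure is the sum of absolute rank differences between successive iterations, and `max_steps` is a safety cap added for the sketch that is not one of the documented arguments.

```python
def page_rank_step(rank, base, out_neighbors, follow_prob):
    """Compute one PageRank update from the current ranks."""
    dangling = sum(r for v, r in rank.items() if not out_neighbors[v])
    incoming = dict.fromkeys(rank, 0.0)
    for v, targets in out_neighbors.items():
        for target in targets:
            incoming[target] += rank[v] / len(targets)
    return {
        v: (1.0 - follow_prob) * base[v]
           + follow_prob * (incoming[v] + dangling * base[v])
        for v in rank
    }


def dynamic_page_rank(vertices, edges, tolerance, follow_prob=0.85,
                      personalization_src_id=None, max_steps=10_000):
    """Iterate until the summed absolute rank change is <= tolerance.

    Returns the ranks and the number of iterations performed.
    """
    out_neighbors = {v: [] for v in vertices}
    for src, dst in edges:
        out_neighbors[src].append(dst)

    if personalization_src_id is None:
        base = {v: 1.0 / len(vertices) for v in vertices}
    else:
        base = {v: (1.0 if v == personalization_src_id else 0.0) for v in vertices}

    rank = dict(base)
    for step in range(1, max_steps + 1):
        new_rank = page_rank_step(rank, base, out_neighbors, follow_prob)
        delta = sum(abs(new_rank[v] - rank[v]) for v in rank)
        rank = new_rank
        if delta <= tolerance:
            return rank, step
    return rank, max_steps  # hypothetical safety cap reached
```

A looser tolerance stops earlier with an approximate ranking, while a tighter tolerance never needs fewer iterations on the same graph, which is the trade-off between speed and precision that the tolerance argument controls.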
Syntax
Python
| Argument | Data Type | Description |
|---|---|---|
| connection | pyocient.Connection | An active database connection using the pyocient module. |
| input_schema | str | A non-empty schema containing the input tables. |
| input_vertices_table | str | Input vertices table (must have an id column). |
| input_edges_table | str | Input edges table (must have the srcid and destid columns). |
| result_schema | str | A writable schema to create the result tables. |
| result_vertices_table | str | Name of the vertices table to create (columns include all vertex columns and pagerank). |
| tolerance | float | The convergence threshold that stops iterations when the rank changes become negligible. After each iteration, the algorithm measures the maximum change in any rank of a vertex. If that change is below the specified tolerance, the algorithm considers the computation converged and stops early. A high tolerance value (e.g., 0.001) is good for quick exploratory runs that generate an approximate ranking. A low tolerance value (e.g., 0.000001) generates a precise ranking, but requires a higher compute cost. |
| reset_prob | float | A value between 0.0 and 1.0 that controls the damping factor for how the PageRank random surfer moves across vertices. A high value (e.g., 0.85) puts more emphasis on link structure, encouraging the algorithm to move from each vertex to a neighbor. With a high value, PageRank scores tend to concentrate around well-linked regions. A lower value (e.g., 0.50) allows the algorithm to ignore edges and instead jump to a vertex chosen from a base distribution. For standard PageRank, this base is uniform over all vertices. For personalized PageRank, the base is biased toward the specified vertex. This behavior creates more uniform scoring with less sensitivity to link topology. |
| result_vertices_indexes | List[str] | Optional. Columns to index in the result vertices table (e.g., id). Specify an empty list for none. |
| personalization_src_id | Optional[int] | Determines whether PageRank uses the standard or the personalized variant. For personalized mode, specify a vertex identifier. PageRank scoring starts with this vertex set at 1.0, while all other vertices start at 0.0. For standard mode, specify None. All vertices start with rank 1.0/N, where N is the number of vertices. |
id column. This example uses a low tolerance value of 1.0e-6, which generates high-precision rankings but requires more computing resources.
Python

