OCGraph Python Library

The Ocient graph module brings a programming model similar to Spark GraphX to the Ocient System directly from Python using the pyocient driver. The module treats a graph as two relational tables: one for vertices (nodes) and one for edges (directed links). The module provides a composable API for graph transformations, neighborhood analytics, and iterative algorithms (e.g., Pregel, PageRank). The API validates inputs, avoids destructive changes by materializing results into new tables, supports optional indexing for performance, and follows Ocient SQL conventions.

The package installs separately from pyocient and exposes a Python-native interface that mirrors the Java library. For details, see docid\ hoxig9l4m5f2kisrve3ge.

Installation

Use pyocient for connectivity and ocient_graph for graph APIs. The graph library is a separate package that depends on pyocient. For a tutorial about installing and using pyocient, see docid\ jb9ujd1myi pollu9azdu.

Install and Import

Install the Ocient graph module.

    pip install ocient_graph

Import the module.

    from pyocient import connect
    from ocient_graph import (
        subgraph,
        collect_neighbors,
        EdgeDirection,
    )

Data Model Requirements

Database tables that use the OCGraph Python library must adhere to this structure. In addition to the listed requirements, tables can include other columns.

Vertices table: A table with one row per vertex (node). This table typically represents the anchor for graph algorithms and transforms. Many methods join edges to vertices by the id column. Requirement: the table must contain the id BIGINT NOT NULL column definition as the unique vertex identifier.

Edges table: A table with one row per directed edge (relationship). Each row is a directed edge from a source vertex to a destination vertex. Requirement: the table must contain these column definitions: srcid BIGINT NOT NULL, destid BIGINT NOT NULL.

Subgraph and Filtering

Use a subgraph or various filters to restrict a graph to relevant vertices and
edges. These functions create filtered copies or masked intersections, preserving schema and optional indexes for performance.

subgraph

Creates filtered vertex and edge tables using vertex and triplet predicates, retaining only edges with endpoints that remain after vertex filtering. The function creates the requested indexes and performs best-effort cleanup in the event of failure.

Syntax

    subgraph(
        connection,
        input_schema,
        input_vertices_table,
        input_edges_table,
        result_schema,
        result_vertices_table,
        result_edges_table,
        vertex_filter,
        edge_filter,
        [ result_vertices_indexes [, ...] ],
        [ result_edges_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A non-empty schema containing the input tables.
input_vertices_table (str): Input vertices table (must have an id column).
input_edges_table (str): Input edges table (must have srcid and destid columns).
result_schema (str): A writable schema in which to create the result tables.
result_vertices_table (str): Name of the filtered vertices table to create.
result_edges_table (str): Name of the filtered edges table to create.
vertex_filter (str): SQL predicate to filter the vertices (without the WHERE keyword). Example: status = 'active' and score > 0
edge_filter (str): SQL predicate that the system evaluates in a triplet context using the aliases a (source vertex), b (edge), and c (destination vertex). This predicate does not require a WHERE keyword. Example: b.amount > 50 and a.region = c.region
result_vertices_indexes (list[str]): Optional columns to index in the result vertices table (e.g., id). Specify an empty list for none.
result_edges_indexes (list[str]): Optional columns to index in the result edges table (e.g., the srcid and destid columns). Specify an empty list for none.

Example

Create an active customer subgraph that includes only purchases exceeding $50 where the source and destination share a region.

    subgraph(
        connection,
        "sales",
        "customers",
        "purchases",
        "sales",
        "customers_active",
        "purchases_active",
        "status = 'active' and score > 0",
        "b.amount > 50 and a.region = c.region",
        ["id", "region"],
        ["srcid", "destid"],
    )

filter_vertices

Creates a filtered subgraph by selecting vertices that match a predicate while retaining only edges with endpoints that are in the filtered vertex set.

Syntax

    filter_vertices(
        connection,
        input_schema,
        input_vertices_table,
        input_edges_table,
        result_schema,
        result_vertices_table,
        result_edges_table,
        vertex_filter,
        [ result_vertices_indexes [, ...] ],
        [ result_edges_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A non-empty schema containing the input tables.
input_vertices_table (str): Input vertices table (must have an id column).
input_edges_table (str): Input edges table (must have srcid and destid columns).
result_schema (str): A writable schema in which to create the result tables.
result_vertices_table (str): Name of the result vertices table. The name must not conflict with the names of input tables.
result_edges_table (str): Name of the edges table to create. The name must not conflict with the names of input tables.
vertex_filter (str): SQL predicate to filter the vertices (without the WHERE keyword). Example: status = 'active' and score > 0
result_vertices_indexes (list[str]): Optional columns to index in the result vertices table (e.g., id). Specify an empty list for none.
result_edges_indexes (list[str]): Optional columns to index in the result edges table (e.g., the srcid and destid columns). Specify an empty list for none.

Example

Filter US customers and retain edges with endpoints that remain in the filtered vertex set.

    filter_vertices(
        connection,
        "sales",
        "customers",
        "purchases",
        "sales",
        "customers_us",
        "purchases_us",
        "country = 'US'",
        ["id"],
        ["srcid", "destid"],
    )

filter_edges

Creates a filtered edges table by selecting edges that match a predicate.

Syntax

    filter_edges(
        connection,
        input_schema,
        input_edges_table,
        result_schema,
        result_edges_table,
        edge_filter,
        [ result_edges_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A non-empty schema containing the input edges table.
input_edges_table (str): Input edges table (must have srcid and destid columns).
result_schema (str): A writable schema in which to create the result tables.
result_edges_table (str): Result edges table name. The name must not conflict with the names of input tables.
edge_filter (str): SQL predicate on edges (without the WHERE keyword). Example: weight > 0.5 and type = 'active'
result_edges_indexes (list[str]): Optional columns to index in the result edges table (e.g., the srcid and destid columns). Specify an empty list for none.

Example

Create a filtered edges table from an existing purchases edge set by keeping only edges that meet a business rule (weight > 0.5 and type = 'active'). The filter_edges function then indexes the result on the srcid and destid columns for faster lookups.

    filter_edges(
        connection,
        "sales",
        "purchases",
        "sales",
        "purchases_filtered",
        "weight > 0.5 and type = 'active'",
        ["srcid", "destid"],
    )

mask

Creates a masked subgraph by intersecting two graphs. Vertices intersect if the vertex identifier is present in both graphs. Edges intersect when the srcid and destid values are present in both graphs. The function builds the masked subgraph from the intersecting rows, copying them, including any attributes, from the graph defined by the input_vertices_table and input_edges_table arguments. You can optionally create indexes on the result subgraph tables.

Syntax

    mask(
        connection,
        input_schema,
        input_vertices_table,
        input_edges_table,
        other_schema,
        other_vertices_table,
        other_edges_table,
        result_schema,
        result_vertices_table,
        result_edges_table,
        [ result_vertices_indexes [, ...] ],
        [ result_edges_indexes [, ...] ]
    )

Arguments (name, data type, description)
connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A non-empty schema containing the input tables.
input_vertices_table (str): The primary vertices table to intersect (must have an id column). The masked subgraph created by this function copies rows that intersect from this table.
input_edges_table (str): The primary edges table to intersect (must have srcid and destid columns). The masked subgraph created by this function copies rows that intersect from this table.
other_schema (str): Schema containing the second graph.
other_vertices_table (str): The second vertices table to intersect.
other_edges_table (str): The second edges table to intersect.
result_schema (str): A writable schema in which to create the result tables.
result_vertices_table (str): Name of the vertices table to create.
result_edges_table (str): Name of the edges table to create.
result_vertices_indexes (list[str]): Optional columns to index in the result vertices table (e.g., id). Specify an empty list for none.
result_edges_indexes (list[str]): Optional columns to index in the result edges table (e.g., the srcid and destid columns). Specify an empty list for none.

Example

Create a masked subgraph by intersecting two graphs. The example copies vertices and edges that are present in both graphs, along with the remaining endpoints.

    mask(
        connection,
        "sales",
        "customers",
        "purchases",
        "ref",
        "customers_ref",
        "purchases_ref",
        "sales",
        "customers_masked",
        "purchases_masked",
        ["id"],
        ["srcid", "destid"],
    )

Transformations

Construct new vertex or edge tables by computing derived columns, reversing direction, or aggregating duplicates. These functions do not change the original inputs. Instead, the functions materialize new results.

map_vertices

Creates a new vertices table with the identifier id and computed columns. Use the result_column_expressions argument to calculate additional columns. This function can also add indexes before inserting data.

Syntax

    map_vertices(
        connection,
        input_schema,
        input_vertices_table,
        result_schema,
        result_vertices_table,
        result_column_expressions,
        [ result_vertices_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A non-empty schema containing the input vertices table.
input_vertices_table (str): Input vertices table (must have an id column).
result_schema (str): A schema in which to create the result vertices table.
result_vertices_table (str): Name of the result vertices table. The name must not conflict with the names of input tables.
result_column_expressions (list[str]): One or more SQL expressions defining result columns beyond id. Use the AS alias_name keyword for stable names.
result_vertices_indexes (list[str]): Optional columns to index in the result vertices table (e.g., id). Specify an empty list for none.

Example

Create a new vertices table with two new columns, name_upper and is_vip, and generate indexes for the id and name_upper columns.

    map_vertices(
        connection,
        "sales",
        "customers",
        "sales",
        "customers_enriched",
        [
            "upper(name) as name_upper",
            "case when score > 1000 then true else false end as is_vip",
        ],
        ["id", "name_upper"],
    )

map_edges

Creates a new edges table with srcid, destid, and any additional computed columns. Expressions should refer to input edge columns by their original names, and each computed expression should include an AS alias keyword.

Syntax

    map_edges(
        connection,
        input_schema,
        input_edges_table,
        result_schema,
        result_edges_table,
        [ result_column_expressions [, ...] ],
        [ result_edges_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A non-empty schema containing the input edges table.
input_edges_table (str): Input edges table (must have srcid and destid columns).
result_schema (str): A schema in which to create the result edges table.
result_edges_table (str): Name of the result edges table. The name must not conflict with the names of input
tables.
result_column_expressions (list[str]): Optional SQL expressions for additional edge columns (besides the srcid and destid columns). Use the AS alias_name keyword for stable names.
result_edges_indexes (list[str]): Optional columns to index in the result edges table (e.g., the srcid and destid columns). Specify an empty list for none.

Example

Create a new edges table with two columns, discounted_amount and big_txn, and generate indexes for the srcid and destid columns.

    map_edges(
        connection,
        "sales",
        "purchases",
        "sales",
        "purchases_enriched",
        [
            "amount * 0.9 as discounted_amount",
            "case when amount > 100 then 1 else 0 end as big_txn",
        ],
        ["srcid", "destid"],
    )

map_triplets

Creates a new edges table with computed columns that reference a (source vertex), b (edge), and c (destination vertex). The output automatically includes the b.srcid and b.destid columns.

Syntax

    map_triplets(
        connection,
        input_schema,
        input_vertices_table,
        input_edges_table,
        result_schema,
        result_edges_table,
        [ result_column_expressions [, ...] ],
        [ result_edges_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A non-empty schema containing the input tables.
input_vertices_table (str): Input vertices table (must have an id column). The function uses this table for the a and c vertices.
input_edges_table (str): Input edges table (must have the srcid and destid columns). The function uses this table for the b edge.
result_schema (str): A writable schema in which to create the result edges table.
result_edges_table (str): Name of the result edges table. The name must not conflict with the names of input tables.
result_column_expressions (list[str]): Optional SQL expressions that can reference a (source vertex), b (edge), and c (destination vertex). Use the AS alias_name keyword for stable names.
result_edges_indexes (list[str]): Optional columns to index in the result edges table (e.g., the srcid and destid columns). Specify an empty list for none.

Example

Create a new
triplet table from the vertices and edges tables with the amount and same_country columns, and generate indexes for the srcid and destid columns.

    map_triplets(
        connection,
        "sales",
        "customers",
        "purchases",
        "sales",
        "purchases_triplets",
        [
            "b.amount as amount",
            "case when a.country = c.country then 1 else 0 end as same_country",
        ],
        ["srcid", "destid"],
    )

reverse_edges

Creates a new edges table with the srcid and destid columns reversed, preserving other columns. Use this function to traverse a graph in the opposite direction.

Syntax

    reverse_edges(
        connection,
        input_schema,
        input_edges_table,
        result_schema,
        result_edges_table,
        [ result_edges_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A non-empty schema containing the input edges table.
input_edges_table (str): Input edges table (must have the srcid and destid columns).
result_schema (str): A writable schema in which to create the reversed edges table.
result_edges_table (str): Name of the reversed edges table to create. The name must not conflict with the names of input tables.
result_edges_indexes (list[str]): Optional columns to index in the result edges table (e.g., the srcid and destid columns). Specify an empty list for none.

Example

Transform edge direction by reversing the srcid and destid columns. The example also creates indexes for these columns.

    reverse_edges(
        connection,
        "sales",
        "purchases",
        "sales",
        "purchases_reversed",
        ["srcid", "destid"],
    )

group_edges

Groups rows that share the same srcid and destid values, producing one row for each unique pair of values in a new edges table. This function performs aggregations based on one or more SQL expressions.

Syntax

    group_edges(
        connection,
        input_schema,
        input_edges_table,
        result_schema,
        result_edges_table,
        [ result_column_expressions [, ...] ],
        [ result_edges_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient
module.
input_schema (str): A non-empty schema containing the input edges table.
input_edges_table (str): Input edges table (must have the srcid and destid columns).
result_schema (str): A writable schema in which to create the grouped table.
result_edges_table (str): Name of the grouped edges table. The name must not conflict with the names of input tables.
result_column_expressions (list[str]): One or more aggregate expressions to compute for each srcid and destid pair. For details about SQL aggregations, see docid\ roka1ck6hndmod1smej1s. Use the AS alias_name keyword for stable names.
result_edges_indexes (list[str]): Optional columns to index in the result edges table (e.g., the srcid and destid columns). Specify an empty list for none.

Example

Create a new edges table that includes SQL aggregations for the transaction count (txn_count) and the total amount (total_amount). The example also generates indexes for the srcid and destid columns.

    group_edges(
        connection,
        "sales",
        "purchases",
        "sales",
        "purchases_grouped",
        [
            "count(*) as txn_count",
            "sum(amount) as total_amount",
        ],
        ["srcid", "destid"],
    )

Triplets

Produce triplet representations made of a (source vertex), b (edge), and c (destination vertex), either as a logical view or as a materialized table for downstream queries.

create_triplets_view

Creates a view that combines the edge table with the source and destination vertex attributes. This view is useful for analyzing relationships without having to repeatedly join tables. The view includes these columns:

All original edge columns (including the srcid and destid columns).
All source vertex columns except id. Source vertex column names have the src prefix.
All destination vertex columns except id. Destination vertex column names have the dest prefix.

Use the create_triplets_table function instead if you want a materialized table with indexes.

Syntax

    create_triplets_view(
        connection,
        input_schema,
        input_vertices_table,
        input_edges_table,
        result_schema,
        result_triplets_view
    )
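The column layout described for the triplets view can be pictured as a join of each edge to its source and destination vertex. The following is a minimal sketch in plain Python that only builds the text of such a SELECT statement. The helper name triplets_select, the exact prefix spelling (src_ and dest_), the inner joins, and the sample column names are all assumptions made for illustration; this is not the library's implementation.

```python
def triplets_select(schema, vertices, edges, vertex_columns, edge_columns):
    """Build an illustrative SELECT that joins edges to both endpoint vertices.

    Hypothetical helper: the src_/dest_ prefixes and the inner joins are
    assumptions for this sketch, not documented library behavior.
    """
    # Keep every edge column, including srcid and destid, under its own name.
    select_list = [f"e.{col}" for col in edge_columns]
    # Source vertex attributes: every column except id, with a src_ prefix.
    select_list += [f"s.{col} AS src_{col}" for col in vertex_columns if col != "id"]
    # Destination vertex attributes: every column except id, with a dest_ prefix.
    select_list += [f"d.{col} AS dest_{col}" for col in vertex_columns if col != "id"]
    return (
        f"SELECT {', '.join(select_list)} "
        f"FROM {schema}.{edges} e "
        f"JOIN {schema}.{vertices} s ON e.srcid = s.id "
        f"JOIN {schema}.{vertices} d ON e.destid = d.id"
    )


# Sample (made-up) column names for the customers and purchases tables.
sql = triplets_select(
    "sales",
    "customers",
    "purchases",
    ["id", "name", "region"],
    ["srcid", "destid", "amount"],
)
print(sql)
```

The vertex identifier is left out of the prefixed columns because the edge columns srcid and destid already carry it, which matches the "all vertex columns except id" rule in the description.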
Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A non-empty schema containing the input vertices and edges tables.
input_vertices_table (str): Input vertices table (must have an id column).
input_edges_table (str): Input edges table (must have the srcid and destid columns).
result_schema (str): A schema in which to create the view.
result_triplets_view (str): Name of the triplets view to create. The name must not conflict with the names of input tables.

Example

Create a triplets view to inspect edges with joined source and destination vertex attributes.

    create_triplets_view(
        connection,
        "sales",
        "customers",
        "purchases",
        "sales",
        "triplets_v",
    )

create_triplets_table

Creates a materialized table that combines the edge table with the source and destination vertex attributes. This table is useful for analyzing relationships without having to repeatedly join tables. The created table includes these columns:

All original edge columns (including the srcid and destid columns).
All source vertex columns except id. Source vertex column names have the src prefix.
All destination vertex columns except id. Destination vertex column names have the dest prefix.

Use the create_triplets_view function if you want to create a view instead of a new table.

Syntax

    create_triplets_table(
        connection,
        input_schema,
        input_vertices_table,
        input_edges_table,
        result_schema,
        result_triplets_table,
        [ result_triplets_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A non-empty schema containing the input tables.
input_vertices_table (str): Input vertices table (must have an id column).
input_edges_table (str): Input edges table (must have the srcid and destid columns).
result_schema (str): A schema in which to create the triplets table.
result_triplets_table (str): Name of the triplets table to create. The name must not conflict with any input names.
result_triplets_indexes (list[str]): Optional columns to index in the result table. Specify an empty list for none.

Example

Create a new table for triplets. Generate indexes for the srcid and destid columns.

    create_triplets_table(
        connection,
        "sales",
        "customers",
        "purchases",
        "sales",
        "triplets_t",
        ["srcid", "destid"],
    )

Degrees

Compute degree metrics for each vertex from the edges table. These functions produce small vertex tables suitable for joins and analytics.

in_degrees

Computes how many edges point to each vertex in an edge table by counting how many times each unique destid value appears. The result table has two columns: id (the destination vertex) and in_degree (the count).

Syntax

    in_degrees(
        connection,
        input_schema,
        input_edges_table,
        result_schema,
        result_vertices_table,
        [ result_vertices_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A non-empty schema containing the input table.
input_edges_table (str): Input edges table (must have the srcid and destid columns).
result_schema (str): A schema in which to create the in-degree table.
result_vertices_table (str): Name of the vertices table to create (columns include id and in_degree).
result_vertices_indexes (list[str]): Optional columns to index in the result vertices table (e.g., id). Specify an empty list for none.

Example

Compute in-degrees per vertex and generate an index on the id column.

    in_degrees(
        connection,
        "sales",
        "purchases",
        "sales",
        "customers_in_degree",
        ["id"],
    )

out_degrees

Computes how many edges originate from each vertex in an edge table by counting how many times each unique srcid value appears. The result table has two columns: id (the source vertex) and out_degree (the count).

Syntax

    out_degrees(
        connection,
        input_schema,
        input_edges_table,
        result_schema,
        result_vertices_table,
        [ result_vertices_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the
pyocient module.
input_schema (str): A non-empty schema containing the input table.
input_edges_table (str): Input edges table (must have the srcid and destid columns).
result_schema (str): A schema in which to create the out-degree table.
result_vertices_table (str): Name of the vertices table to create (columns include id and out_degree).
result_vertices_indexes (list[str]): Optional columns to index in the result vertices table (e.g., id). Specify an empty list for none.

Example

Compute the out-degree count for each vertex and generate an index on the id column.

    out_degrees(
        connection,
        "sales",
        "purchases",
        "sales",
        "customers_out_degree",
        ["id"],
    )

degrees

Computes the total degree (in-degree plus out-degree) for each vertex in an edge table by counting how many times each unique identifier appears as a srcid or destid value. The result table has two columns: id (the vertex) and degree (the count).

Syntax

    degrees(
        connection,
        input_schema,
        input_edges_table,
        result_schema,
        result_vertices_table,
        [ result_vertices_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A non-empty schema containing the input table.
input_edges_table (str): Input edges table (must have the srcid and destid columns).
result_schema (str): A schema in which to create the degree table.
result_vertices_table (str): Name of the vertices table to create (columns include id and degree).
result_vertices_indexes (list[str]): Optional columns to index in the result vertices table (e.g., id). Specify an empty list for none.

Example

Compute total degrees for each vertex and generate an index on the id column.

    degrees(
        connection,
        "sales",
        "purchases",
        "sales",
        "customers_degree",
        ["id"],
    )

Vertex Extraction and Joins

Build vertex sets from edges and combine vertex attributes across tables. These functions are useful for shaping vertex properties and consolidating features.

from_edges

Builds a vertices table from an edges table by
extracting the unique source and destination identifiers. This function can optionally compute additional columns using SQL expressions that reference the unique identifier as ids.id. The created table always contains the id column, plus one additional column per expression.

Syntax

    from_edges(
        connection,
        input_schema,
        input_edges_table,
        result_schema,
        result_vertices_table,
        [ result_column_expressions [, ...] ],
        [ result_vertices_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A non-empty schema containing the input table.
input_edges_table (str): Input edges table (must have the srcid and destid columns).
result_schema (str): A schema in which to create the result vertices table.
result_vertices_table (str): Name of the result vertices table (always includes id).
result_column_expressions (list[str]): Optional. A list of SQL expressions that reference ids.id to add additional columns.
result_vertices_indexes (list[str]): Optional columns to index in the result vertices table (e.g., id). Specify an empty list for none.

Example

Create a vertices table from edge endpoints and add a bucket column that assigns each vertex to one of 10 buckets. Generate an index for the id and bucket columns.

    from_edges(
        connection,
        "sales",
        "purchases",
        "sales",
        "customers_from_edges",
        ["ids.id % 10 as bucket"],
        ["id", "bucket"],
    )

join_vertices

Merges two vertices tables by retaining every row from a primary table (input_vertices_table) and selectively updating rows that also appear in the modification table (modification_vertices_table). The merged table includes all vertices from the primary table that do not appear in the modification table. For vertices that appear in both tables, the call must include a list of expressions (result_attribute_expressions), in the same column order, for every non-identifier column in the merged result table. These SQL expressions can add computations to columns, or
simply add aliases if no changes are needed. Each expression can reference columns from the primary table (using the alias a) or from the modification table (using the alias b).

Syntax

    join_vertices(
        connection,
        input_schema,
        input_vertices_table,
        modification_schema,
        modification_vertices_table,
        result_schema,
        result_vertices_table,
        result_attribute_expressions [, ...],
        [ result_vertices_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A non-empty schema containing the primary input table.
input_vertices_table (str): Input vertices table (with the alias a). This table must have an id column.
modification_schema (str): A schema for the modification vertices table.
modification_vertices_table (str): Modification vertices table (with the alias b). This table must have an id column.
result_schema (str): A writable schema in which to create the result tables.
result_vertices_table (str): Name of the vertices table to create.
result_attribute_expressions (list[str]): A list of SQL expressions that define the non-identifier columns of the joined result. Each expression can reference the left (input_vertices_table) vertex as a and the right (modification_vertices_table) vertex as b. You must end every expression with an explicit alias using AS alias_name so the output column names are stable.
result_vertices_indexes (list[str]): Optional columns to index in the result vertices table (e.g., id). Specify an empty list for none.

Example

Merge vertex attributes and generate indexes for the id and status columns. This example includes two SQL expressions that use the COALESCE function to update the status and score columns based on the modification vertices table.

    join_vertices(
        connection,
        "sales",
        "customers",
        "sales",
        "customers_updates",
        "sales",
        "customers_merged",
        [
            "coalesce(b.new_status, a.status) as status",
            "coalesce(b.score_delta + a.score, a.score) as score",
        ],
        ["id", "status"],
    )

inner_join_vertices

Performs an inner join on two vertex tables using the equality comparison a.id = b.id. The result table automatically includes the id column from the first table. The call must include a list of SQL expressions (result_attribute_expressions), in the same column order, for every non-identifier column in the result table. These SQL expressions can add computations to columns, or simply add aliases if no changes are needed. Each expression can reference columns from the left table (input_vertices_table) using the alias a or from the right table (other_vertices_table) using the alias b.

Syntax

    inner_join_vertices(
        connection,
        input_schema,
        input_vertices_table,
        other_schema,
        other_vertices_table,
        result_schema,
        result_vertices_table,
        result_attribute_expressions [, ...],
        [ result_vertices_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A schema for the left vertices table.
input_vertices_table (str): Input vertices table for the left side of the join (must have an id column).
other_schema (str): A schema for the right vertices table.
other_vertices_table (str): The other vertices table for the right side of the join (must have an id column).
result_schema (str): A writable schema in which to create the result tables.
result_vertices_table (str): Name of the vertices table to create.
result_attribute_expressions (list[str]): A list of SQL expressions that define the non-identifier columns of the joined result. Each expression can reference the left (input_vertices_table) vertex as a and the right (other_vertices_table) vertex as b. You must end every expression with an explicit alias using AS alias_name to ensure the output column names are stable.
result_vertices_indexes (list[str]): Optional columns to index in the result vertices table (e.g., id). Specify an empty list for none.

Example

Create an inner join between two vertex tables and generate an index on the id column.

    inner_join_vertices(
        connection,
        "sales",
        "customers",
        "sales",
        "profiles",
        "sales",
        "customers_joined",
        [
            "a.id as id",
            "a.status as status",
            "b.tier as tier",
        ],
        ["id"],
    )

outer_join_vertices

Performs a left outer join between two vertices tables using the equality comparison a.id = b.id. The result table includes all rows from the left table. For left table rows that have no match in the right table, any expression that reads columns from the right table (alias b) evaluates to NULL, while expressions that read only the left table (alias a) are evaluated as usual. The call must include a list of SQL expressions (result_attribute_expressions), in the same column order, for every non-identifier column in the result table. These SQL expressions can add computations to columns, or simply add aliases if no changes are needed. Each expression can reference columns from the left table (input_vertices_table) using the alias a or from the right table (other_vertices_table) using the alias b.

Syntax

    outer_join_vertices(
        connection,
        input_schema,
        input_vertices_table,
        other_schema,
        other_vertices_table,
        result_schema,
        result_vertices_table,
        result_attribute_expressions [, ...],
        [ result_vertices_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A schema for the left vertices table.
input_vertices_table (str): Left vertices table (must have an id column).
other_schema (str): A schema for the right vertices table (with the alias b).
other_vertices_table (str): Right vertices table (must have an id column).
result_schema (str): A writable schema in which to create the result tables.
result_vertices_table (str): Name of the vertices table to create.
result_attribute_expressions (list[str]): A list of SQL expressions that define the non-identifier columns of the joined result. Each expression can reference the left (input_vertices_table) vertex as a and the right (other_vertices_table) vertex as b. You must end every expression with an explicit alias using AS alias_name to ensure the output column names are stable.
result_vertices_indexes (list[str]): Optional columns to index in the result vertices table (e.g., id). Specify an empty list for none.

Example

Perform a left outer join on two vertices tables and generate an index on the id column.

    outer_join_vertices(
        connection,
        "sales",
        "customers",
        "sales",
        "profiles",
        "sales",
        "customers_joined",
        [
            "a.id as id",
            "a.status as status",
            "b.tier as tier",
        ],
        ["id"],
    )

collect_neighbors

For each vertex in a table, this function collects information about its neighbors (identifier and any attributes) as an array of tuples. For a specified direction (IN, OUT, or BOTH), the function aggregates the tuples representing each neighboring vertex into an array. The direction types are:

IN: neighbors with edges pointing to the vertex (edges where destid = id).
OUT: neighbors that the vertex points to (edges where srcid = id).
BOTH: union of IN and OUT, with neighbors from incoming (destid = id) and outgoing (srcid = id) edges.

The result table has the columns id (the vertex identifier) and neighbors (an array of tuples representing each neighbor). If an error occurs after table creation, the function drops the result table.

Syntax

    collect_neighbors(
        connection,
        input_schema,
        input_vertices_table,
        input_edges_table,
        result_schema,
        result_table,
        direction,
        [ result_indexes [, ...] ]
    )

Arguments (name, data type, description)

connection (pyocient.Connection): An active database connection using the pyocient module.
input_schema (str): A non-empty schema containing the input tables.
input_vertices_table (str): Input vertices table (must have an id column).
input_edges_table (str): Input edges table (must have the srcid and destid columns).
result_schema (str): A writable schema in which to create the result tables.
result_table (str): Name of the neighbors collection table (with the id and neighbors columns).
direction (EdgeDirection): Specifies the direction of traversal. Supported values are:
  IN: neighbors that have edges pointing to the vertex
(edges where destid = id ) for example, if an edge 5 points to 10 , then for id=10 , neighbor 5 is included out — neighbors that the vertex points to (edges where srcid = id ) for example, if an edge 5 points to 10 , then for id=5 , neighbor 10 is included both — the union of in and out this traversal includes neighbors from edges pointing to id and edges from id result indexes list\[str] optional columns to index in the result table (e g , id ) specify an empty list for none example collect incoming neighbors for each vertex and generate an index on the id column setting the direction argument to in collects the neighbors that point to each vertex collect neighbors( connection, "sales", "customers", "purchases", "sales", "neighbors in", edgedirection in, \["id"], ) collect edges for each vertex in a table, this function collects an array of adjacent edge rows based on the specified direction the result table has two columns id (the vertex identifier) and edges (an array of tuples, each tuple containing all columns from the edges table for a connected edge) the direction types are in — edges pointing to the vertex (edges where destid = id ) out — edges originating from the vertex (edges where srcid = id ) both — union of in and out that includes edges from incoming ( destid = id ) and outgoing ( srcid = id ) directions this direction retains duplicates if an error occurs after table creation, the function drops the result table syntax collect edges( connection, input schema, input vertices table, input edges table, result schema, result table, direction, \[ result indexes \[ , ] ] ) argument data type description connection pyocient connection an active database connection using the pyocient module input schema str a non empty schema containing the input tables input vertices table str input vertices table (must have an id column) input edges table str input edges table (must have the srcid and destid columns) result schema str a writable schema to create the result tables 
result table str name of the edge collection table (with the id and edges columns) direction edgedirection specifies the direction of traversal supported values are in — edges pointing to the vertex (edges where destid = id ) for example, if an edge 5 points to 10 , then for id=10 , this edge is included out — edges originating from the vertex (edges where srcid = id ) for example, if an edge 5 points to 10 , then for id=5 , this edge is included both — the union of in and out this traversal includes edges pointing to id and edges from id result indexes list\[str] optional columns to index in the result table specify an empty list for none example collect outgoing edges per vertex the example sets the direction to out to collect edges from id collect edges( connection, "sales", "customers", "purchases", "sales", "outgoing edges", edgedirection out, \["id"], ) algorithms high level graph algorithms that iterate over the graph structure to produce labels, components, or counts label propagation executes the https //en wikipedia org/wiki/label propagation algorithm (lpa) to assign community labels to vertices each vertex starts with its own identifier as its label for the number of iterations set by the max iterations argument, each vertex updates its label to the most frequent label among its neighbors the algorithm breaks ties by choosing the smallest label the algorithm uses temporary tables for intermediate results and drops these tables when the process completes or if it fails isolated vertices retain their initial label the final table stores id and label columns and can include indexes syntax label propagation( connection, input schema, input vertices table, input edges table, result schema, result vertices table, max iterations, \[ result vertices indexes \[ , ] ] ) argument data type description connection pyocient connection an active database connection using the pyocient module input schema str a non empty schema containing 
the input tables input vertices table str input vertices table (must have an id column) input edges table str input edges table (must have the srcid and destid columns) result schema str a writable schema to create the result tables result vertices table str name of the vertices table to create (columns include id and label ) max iterations int maximum number of iterations (must be 1 or greater) result vertices indexes list\[str] optional columns to index in the result vertices table (e g , id ) specify an empty list for none example run label propagation for 10 iterations to assign labels to vertices, and generate an index on the id column label propagation( connection, "sales", "customers", "purchases", "sales", "lpa labels", 10, \["id"], ) connected components identifies the connected components of an undirected graph this algorithm configures a pregel computation in which each vertex initially sets its component label equal to its own identifier id in each iteration, vertices send their component label to neighbors each vertex then updates its component label to the minimum of its current label and the labels it received the process repeats until no more updates occur or it reaches the maximum number of iterations ( max iterations ) the result table maps each vertex id to its final component label syntax connected components( connection, input schema, input vertices table, input edges table, result schema, result vertices table, max iterations, \[ result vertices indexes \[ , ] ] ) argument data type description connection pyocient connection an active database connection using the pyocient module input schema str a non empty schema containing the input tables input vertices table str input vertices table (must have an id column) input edges table str input edges table (must have the srcid and destid columns) result schema str a writable schema to create the result tables result vertices table str name of the vertices table to create max 
iterations int maximum number of iterations (must be 1 or greater) result vertices indexes list\[str] optional columns to index in the result vertices table (e g , id ) specify an empty list for none example compute connected components for a maximum of 20 iterations and generate an index on the id column connected components( connection, "sales", "customers", "purchases", "sales", "components", 20, \["id"], ) strongly connected components computes strongly connected components (scc) in a directed graph this function partitions vertices into subsets where every vertex is reachable from every other vertex in the same subset the algorithm uses recursive partitioning it selects a pivot (typically the minimum identifier id ), computes its predecessor set (vertices that can reach the pivot) and its descendant set (vertices reachable from the pivot), identifies the scc as the intersection of these two sets, removes that scc from the graph, and recurses on the remainder until every vertex is assigned to an scc the function creates temporary tables in the result schema to store intermediate results this function drops these tables when the computation completes or fails the final result table contains two columns id (vertex identifier) and component (the minimum vertex identifier in its scc subset) syntax strongly connected components( connection, input schema, input vertices table, input edges table, result schema, result vertices table, \[ result vertices indexes \[ , ] ] ) argument data type description connection pyocient connection an active database connection using the pyocient module input schema str a non empty schema containing the input tables input vertices table str input vertices table (must have an id column) input edges table str input edges table (must have the srcid and destid columns) result schema str a writable 
schema to create the result tables result vertices table str name of the vertices table to create (columns include id and component ) result vertices indexes list\[str] optional columns to index in the result vertices table (e g , id ) specify an empty list for none example compute the scc and generate an index on the id column strongly connected components( connection, "sales", "customers", "purchases", "sales", "scc", \["id"], ) trianglecount trianglecount identifies all 3 cycles (triangles) in the graph and counts how many distinct triangles each vertex participates in the algorithm first builds a canonical, undirected edge set by ensuring srcid < destid and removing duplicates to prevent double counting if your input edges are already canonicalized and deduplicated, use trianglecount run pre canonicalized to skip preprocessing for faster performance the function then counts triangles ( a , b , c ) where a < b < c by intersecting neighbor lists and aggregates per vertex participation to produce a result table with the id and triangle count columns run syntax trianglecount run( connection, input schema, input vertices table, input edges table, result schema, result vertices table, \[ result vertices indexes \[ , ] ] ) run pre canonicalized syntax trianglecount run pre canonicalized( connection, input schema, input vertices table, canonical edges schema, canonical edges table, result schema, result vertices table, \[ result vertices indexes \[ , ] ] ) argument data type description connection pyocient connection an active database connection using the pyocient module input schema str a non empty schema containing the input tables input vertices table str input vertices table (must have an id column) input edges table str input edges table (must have the srcid and destid columns) for run pre canonicalized execution, the function assumes this table is already canonicalized (e g , srcid < destid ) and deduplicated result schema str a writable schema to create the result tables 
result vertices table str name of the vertices table to create (columns include id and triangle count ) result vertices indexes list\[str] optional columns to index in the result vertices table (e g , id ) specify an empty list for none examples count triangles using run canonicalize the raw edges internally, count unique triangles, and write per vertex triangle counts with an index on the id column trianglecount run( connection, "sales", "customers", "purchases", "sales", "triangle counts", \["id"], ) count triangles using run pre canonicalized use a pre canonicalized, deduplicated edge table to count triangles and write per vertex triangle counts with an index on the id column trianglecount run pre canonicalized( connection, "sales", "customers", "sales", "purchases canonical", "sales", "triangle counts", \["id"], ) pregel provides a generic vertex-centered iteration framework for custom graph algorithms, similar to the pregel system described at https //research google/pubs/pregel a system for large scale graph processing/ each iteration updates vertex states by sending messages along edges and then aggregating these messages to compute new states the algorithm continues iterating until it reaches convergence (no state changes or no messages produced) or a specified iteration cap you control each phase (initialization, messaging, aggregation, and update) with the sql expressions that you pass as arguments syntax pregel( connection, input schema, input vertices table, input edges table, result schema, result vertices table, initializer expr, send to source expr, send to dest expr, aggregate expr, updater expr, max iterations, \[ result vertices indexes \[ , ] ] ) argument data type description connection pyocient connection an active database connection using the pyocient module input schema str a non empty schema containing the input tables input vertices table str input vertices table (must have an id column) input edges table str input edges table (must have the srcid and destid columns) result schema str a writable schema to create the result tables result 
vertices table str name of the vertices table to create (columns include id and result ) initializer expr str a sql expression to compute the initial state for each vertex (e g , case when type = 'seed' then 1 0 else 0 0 end ) send to source expr str optional defines the message sent to the source vertex of an edge, referencing the current vertex states a state (source) and c state (destination), and any edge attributes as b \<edge column> example case when a country = c country then 1 else 0 end send to dest expr str optional defines the message sent to the destination vertex of an edge, referencing the current vertex states a state (source) and c state (destination), and any edge attributes as b \<edge column> example case when a country = c country then 1 else 0 end aggregate expr str a sql aggregation to combine messages per vertex (e g , sum(msg) ) for a list of supported aggregations, see docid\ roka1ck6hndmod1smej1s updater expr str a sql expression to compute the next state from the current state and aggregated messages for example, this code updates the state to the minimum of the current state and the aggregated message (similar to the docid\ hnemo4f1y3sslallr03vp function) least(a state, coalesce(m aggregated message, a state)) as state max iterations int maximum number of iterations (must be 1 or greater) result vertices indexes list\[str] optional columns to index in the result vertices table (e g , id ) specify an empty list for none example run a simple pregel computation for 10 iterations at most that sends each edge amount to the source vertex of the edge ( send to source expr ) and adds the summed amounts to the vertex state, and generate an index on the id column pregel( connection, "sales", "customers", "purchases", "sales", "pregel result", "0 as state", "b amount", none, "sum(msg) as aggregated message", "state + coalesce(aggregated message, 0) as state", 10, \["id"], ) paths & ranking these functions include the shortest path and pagerank algorithms shortest paths computes the shortest distance from every vertex to each vertex in a set of landmark 
vertices using an iterative relaxation algorithm the algorithm resembles the bellman ford algorithm ( https //en wikipedia org/wiki/bellman%e2%80%93ford algorithm ) but simultaneously handles multiple destinations each landmark starts at distance 0 from itself and every other vertex starts at positive infinity on each iteration, the algorithm examines every edge and checks whether traveling through the connected neighbor would yield a shorter route to a landmark if a shorter route exists, the algorithm updates the distance of the source vertex the process stops when no distances improve or the algorithm reaches the maximum number of iterations after the process finishes, the algorithm writes a result table with the srcid , destid , and distance columns syntax shortest paths( connection, input schema, input vertices table, input edges table, result schema, result table, landmarks\[ , ], edge weight column, max iterations, \[ result indexes \[ , ] ] ) argument data type description connection pyocient connection an active database connection using the pyocient module input schema str a non empty schema containing the input tables input vertices table str input vertices table (must have an id column) input edges table str input edges table (must have the srcid and destid columns) result schema str a writable schema to create the result table result table str name of the distances result table this table has the srcid , destid , and distance columns landmarks list\[int] one or more landmark vertex identifiers this list must be non empty and contain no nulls edge weight column optional\[str] optional the name of an edge weight column if you specify none , each edge uses the default weight of 1 0 max iterations int maximum number of relaxation iterations this value must be 1 or greater result indexes list\[str] optional the list of columns to index in the result table ( srcid , destid , and distance columns) specify an empty list for none example compute distances from every vertex to the landmarks 1 and 42 using the default edge weight for 10 iterations at most, and generate indexes on the srcid and destid 
columns shortest paths( connection, "sales", "customers", "purchases", "sales", "distances", \[1, 42], none, 10, \["srcid","destid"], ) static page rank computes pagerank ( https //en wikipedia org/wiki/pagerank ) scores for each vertex over a fixed number of iterations the algorithm follows the standard pagerank formula with a reset probability ( reset prob ) and uses common table expressions to calculate contributions from incoming edges and redistribute ranks from dangling nodes the algorithm supports two variants standard pagerank — all vertices start with rank 1 0/n , where n is the number of vertices specify this variant if personalization src id is none personalized pagerank — the specified vertex starts with a rank of 1 0 , while others start with a rank of 0 0 specify this variant if personalization src id is a vertex identifier after running pagerank for a fixed number of iterations, the function writes a result vertices table containing all original vertex columns with a new pagerank scoring column syntax static page rank( connection, input schema, input vertices table, input edges table, result schema, result vertices table, num iterations, reset prob, \[ result vertices indexes \[ , ] ], personalization src id, ) argument data type description connection pyocient connection an active database connection using the pyocient module input schema str a non empty schema containing the input tables input vertices table str input vertices table (must have an id column) input edges table str input edges table (must have the srcid and destid columns) result schema str a writable schema to create the result tables result vertices table str name of the vertices table to create (columns include all vertex columns and a pagerank column) num iterations int number of iterations to run this value must be 1 or greater reset prob float a value between 0 0 and 1 0 that controls the damping factor for how the pagerank random surfer moves across vertices a high value (e g , 0 85 ) puts more 
emphasis on link structure, encouraging the algorithm to move from each vertex to a neighbor the high value means pagerank scores tend to concentrate around well linked regions a lower value (e g , 0 50 ) allows the algorithm to ignore edges and instead jump to a vertex chosen from a base distribution for standard pagerank, this base is uniform over all vertices for personalized pagerank, the base is biased toward the specified vertex this behavior creates more uniform scoring with less sensitivity to link topology result vertices indexes list\[str] optional columns to index in the result vertices table (e g , id ) specify an empty list for none personalization src id optional\[int] determines whether pagerank uses the standard or the personalized variant for personalized mode, specify a vertex identifier pagerank scoring starts with this vertex set at 1 0 , while all other vertices start at 0 0 for standard mode, specify none all vertices start with rank 1 0/n , where n is the number of vertices example run standard pagerank for 10 iterations and generate an index on the id column this example uses a high reset prob value of 0 85 so that the ranking concentrates on highly linked regions static page rank( connection, "sales", "customers", "purchases", "sales", "pagerank static", 10, 0 85, \["id"], none, ) dynamic page rank computes pagerank scores until convergence based on a specified threshold value ( tolerance ) unlike the docid\ hnemo4f1y3sslallr03vp function, this algorithm runs iterations until the sum of absolute differences between ranks in successive iterations is less than or equal to the tolerance value the algorithm handles personalization similarly to static page rank at each iteration, the function uses the pagerank formula, collects rank values, and redistributes them the algorithm supports two variants standard pagerank — all vertices start with rank 1 0/n , where n is the number of vertices specify this variant if personalization src id is none 
personalized pagerank — the specified vertex starts with a rank of 1 0 , while others start with a rank of 0 0 specify this variant if personalization src id is a vertex identifier after running pagerank until the rank changes fall within the tolerance threshold, the function writes a vertices table containing all the original vertex columns with a new pagerank scoring column syntax dynamic page rank( connection, input schema, input vertices table, input edges table, result schema, result vertices table, tolerance, reset prob, \[ result vertices indexes \[ , ] ], personalization src id, ) argument data type description connection pyocient connection an active database connection using the pyocient module input schema str a non empty schema containing the input tables input vertices table str input vertices table (must have an id column) input edges table str input edges table (must have the srcid and destid columns) result schema str a writable schema to create the result tables result vertices table str name of the vertices table to create (columns include all vertex columns and pagerank ) tolerance float the convergence threshold that stops iterations when the rank changes become negligible after each iteration, the algorithm measures how much the vertex ranks changed from the previous iteration if that change is within the specified tolerance, the algorithm considers the computation converged and stops a high tolerance value (e g , 0 001 ) is good for quick exploratory runs that generate an approximate ranking a low tolerance value (e g , 0 000001 ) generates a precise ranking, but at a higher compute cost reset prob float a value between 0 0 and 1 0 that controls the damping factor for how the pagerank random surfer moves across vertices a high value (e g , 0 85 ) puts more emphasis on link structure, encouraging the algorithm to move from each vertex to a neighbor the high value means pagerank scores tend to concentrate around well linked regions a lower value (e g , 0 50 ) 
allows the algorithm to ignore edges and instead jump to a vertex chosen from a base distribution for standard pagerank, this base is uniform over all vertices for personalized pagerank, the base is biased toward the specified vertex this behavior creates more uniform scoring with less sensitivity to link topology result vertices indexes list\[str] optional columns to index in the result vertices table (e g , id ) specify an empty list for none personalization src id optional\[int] determines whether pagerank uses the standard or the personalized variant for personalized mode, specify a vertex identifier pagerank scoring starts with this vertex set at 1 0 , while all other vertices start at 0 0 for standard mode, specify none all vertices start with rank 1 0/n , where n is the number of vertices example run dynamic pagerank to convergence and generate an index on the id column this example uses a low tolerance value of 1 0e-6 , which generates high precision rankings but requires more computing resources dynamic page rank( connection, "sales", "customers", "purchases", "sales", "pagerank dynamic", 1 0e-6, 0 85, \["id"], none, ) bibliography “pregel a system for large scale graph processing ” accessed november 18, 2025 https //research google/pubs/pregel a system for large scale graph processing/ related links docid\ hoxig9l4m5f2kisrve3ge docid\ jb9ujd1myi pollu9azdu
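illustrative sketch (not part of the library) the connected components section describes a minimum label exchange each vertex starts with its own id as its component label, sends the label to its neighbors, and keeps the smallest label it has seen until nothing changes or the iteration cap is reached the following in memory sketch shows that loop in plain python, under some assumptions the function name connected_components_sketch and its arguments are hypothetical, edges are treated as undirected, and every edge endpoint is assumed to appear in the vertex list it does not call the ocient graph module or the database and says nothing about how the library implements the computation

```python
def connected_components_sketch(vertex_ids, edges, max_iterations):
    """Minimum-label exchange over an undirected view of the edges.

    Hypothetical helper for illustration only; it is not an ocient graph API.
    Assumes every srcid and destid in `edges` also appears in `vertex_ids`.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be 1 or greater")

    # Each vertex starts with its own identifier as its component label.
    labels = {v: v for v in vertex_ids}

    for _ in range(max_iterations):
        # Aggregate step: keep the smallest label received by each vertex.
        received = {}
        for srcid, destid in edges:
            # Undirected view: each edge carries labels in both directions.
            for sender, receiver in ((srcid, destid), (destid, srcid)):
                message = labels[sender]
                if receiver not in received or message < received[receiver]:
                    received[receiver] = message

        # Update step: a vertex adopts a received label only if it is smaller.
        changed = False
        for vertex, message in received.items():
            if message < labels[vertex]:
                labels[vertex] = message
                changed = True

        # Converged: no vertex changed its label in this iteration.
        if not changed:
            break

    return labels


vertices = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (4, 5)]
components = connected_components_sketch(vertices, edges, max_iterations=20)
print(components)
```

in this sketch, vertices 1 , 2 , and 3 end with label 1 and vertices 4 and 5 end with label 4 a max iterations value that is smaller than the longest chain in a component can stop the loop before the labels settle, which is why the cap in the connected components example should be chosen with the graph diameter in mind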