OCGraph Java Library

The OCGraph Java class in the Ocient system implements a set of graph operations (similar to Spark GraphX) directly on tables using JDBC SQL. The overall structure of a graph comprises at least two database tables: one for vertices (nodes) and one for edges (directed links). The OCGraph library runs graph algorithms and transformations by executing SQL over these tables.

Getting Started

To use the OCGraph library of methods, you must have access to an installed Ocient system and the JDBC driver. For details, see the JDBC driver documentation. Use JDBC for connectivity and the OCGraph class for graph APIs.

The Java library depends on a valid `java.sql.Statement` object to connect to the Ocient system. The class builds and executes SQL using that statement and does not manage connections itself.

Import the OCGraph Class

Specify the `com.ocient.jdbc.graph.OCGraph` class in the import declaration.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import com.ocient.jdbc.graph.OCGraph;

Data Model Requirements

Database tables that use the OCGraph library must adhere to a GraphX table structure. In addition to the listed requirements, tables can include other columns.

| Table | Description | Requirements |
| --- | --- | --- |
| Vertices table | A table with one row per vertex (node). This table typically represents the anchor for graph algorithms and transforms. Many methods join edges to vertices by the `id` column. | The table must contain the `id BIGINT NOT NULL` column definition as the unique vertex identifier. |
| Edges table | A table with one row per directed edge (relationship). Each row is a directed edge from a source vertex to a destination vertex. | The table must contain these column definitions: `srcid BIGINT NOT NULL`, `destid BIGINT NOT NULL`. |

Subgraph and Filtering

Use a subgraph or various filters to restrict a graph to relevant vertices and edges. These methods create filtered copies or masked intersections, preserving schema and optional indexes for performance.

subgraph

Creates
filtered vertex and edge tables using vertex and triplet predicates, retaining only edges with endpoints that remain after vertex filtering. The method creates the requested indexes and performs best-effort cleanup in the event of failure.

Syntax

    subgraph(
        inputSchema, inputVerticesTable, inputEdgesTable,
        resultSchema, resultVerticesTable, resultEdgesTable,
        vertexFilter, edgeFilter,
        [ resultVerticesIndexes [, ...] ],
        [ resultEdgesIndexes [, ...] ],
        stmt )

| Argument | Data Type | Description |
| --- | --- | --- |
| `inputSchema` | `String` | A non-empty schema containing the input tables. |
| `inputVerticesTable` | `String` | Input vertices table (must have an `id` column). |
| `inputEdgesTable` | `String` | Input edges table (must have `srcid` and `destid` columns). |
| `resultSchema` | `String` | A writable schema to create the result tables. |
| `resultVerticesTable` | `String` | Name of the filtered vertices table to create. |
| `resultEdgesTable` | `String` | Name of the filtered edges table to create. |
| `vertexFilter` | `String` | SQL predicate to filter the vertices (without the `WHERE` keyword). Example: `status = 'active' AND score > 0` |
| `edgeFilter` | `String` | SQL predicate that the system evaluates in a triplet context using the aliases `a` (source vertex), `b` (edge), and `c` (destination vertex). Example: `b.weight > 0.5 AND a.country = c.country` |
| `resultVerticesIndexes` | `ArrayList` | Optional columns to index in the result vertices table (e.g., `id`). Specify an empty list for none. |
| `resultEdgesIndexes` | `ArrayList` | Optional columns to index in the result edges table (e.g., the `srcid` and `destid` columns). Specify an empty list for none. |
| `stmt` | `java.sql.Statement` | JDBC statement for executing SQL. |

Example

Create an active-customer subgraph that includes only purchases exceeding $50 where the source and destination share a region.

    OCGraph.subgraph(
        "sales", "customers", "purchases",
        "sales", "customers_active", "purchases_active",
        "status = 'active' AND score > 0",
        "b.amount > 50 AND a.region = c.region",
        new ArrayList<>(List.of("id", "region")),
        new ArrayList<>(List.of("srcid", "destid")),
        stmt );

filterVertices

Creates a
filtered subgraph by selecting vertices that match a predicate while retaining only edges with endpoints that are in the filtered vertex set.

Syntax

    filterVertices(
        inputSchema, inputVerticesTable, inputEdgesTable,
        resultSchema, resultVerticesTable, resultEdgesTable,
        vertexFilter,
        [ resultVerticesIndexes [, ...] ],
        [ resultEdgesIndexes [, ...] ],
        stmt )

| Argument | Data Type | Description |
| --- | --- | --- |
| `inputSchema` | `String` | A non-empty schema containing the input tables. |
| `inputVerticesTable` | `String` | Input vertices table (must have an `id` column). |
| `inputEdgesTable` | `String` | Input edges table (must have `srcid` and `destid` columns). |
| `resultSchema` | `String` | A writable schema to create the result tables. |
| `resultVerticesTable` | `String` | Name of the result vertices table. The name must not conflict with the names of input tables. |
| `resultEdgesTable` | `String` | Name of the edges table to create. |
| `vertexFilter` | `String` | SQL predicate for vertices (without the `WHERE` keyword). Example: `status = 'active'` |
| `resultVerticesIndexes` | `ArrayList` | Optional columns to index in the result vertices table (e.g., `id`). Specify an empty list for none. |
| `resultEdgesIndexes` | `ArrayList` | Optional columns to index in the result edges table (e.g., the `srcid` and `destid` columns). Specify an empty list for none. |
| `stmt` | `java.sql.Statement` | JDBC statement for executing SQL. You must have a valid statement and a database connection. |

Example

Filter US customers and retain edges with endpoints that remain in the filtered vertex set.

    OCGraph.filterVertices(
        "sales", "customers", "purchases",
        "sales", "customers_us", "purchases_us",
        "country = 'US'",
        new ArrayList<>(List.of("id")),
        new ArrayList<>(List.of("srcid", "destid")),
        stmt );

filterEdges

Creates a filtered edges table by selecting edges that match a predicate.

Syntax

    filterEdges(
        inputSchema, inputEdgesTable,
        resultSchema, resultEdgesTable,
        edgeFilter,
        [ resultEdgesIndexes [, ...] ],
        stmt )

| Argument | Data Type | Description |
| --- | --- | --- |
| `inputSchema` | `String` | A non-empty schema containing the input tables. |
| `inputEdgesTable` | `String` | Input edges table (must have `srcid` and `destid` columns). |
| `resultSchema` | `String` | A writable schema to create the result tables. |
| `resultEdgesTable` | `String` | Name of the edges table to create. |
| `edgeFilter` | `String` | SQL predicate on edges (without the `WHERE` keyword). Example: `weight > 0.5 AND type = 'active'` |
| `resultEdgesIndexes` | `ArrayList` | Optional columns to index in the result edges table (e.g., the `srcid` and `destid` columns). Specify an empty list for none. |
| `stmt` | `java.sql.Statement` | JDBC statement for executing SQL. You must have a valid statement and a database connection. |

Example

Create a filtered edges table from an existing purchases edge set by keeping only edges that meet a business rule (`weight > 0.5` and `type = 'active'`). The `filterEdges` method then indexes the result on the `srcid` and `destid` columns for faster lookups.

    OCGraph.filterEdges(
        "sales", "purchases",
        "sales", "purchases_filtered",
        "weight > 0.5 AND type = 'active'",
        new ArrayList<>(List.of("srcid", "destid")),
        stmt );

mask

Creates a masked subgraph by intersecting two graphs. Vertices intersect if the vertex identifier is present in both graphs. Edges intersect when the `srcid` and `destid` values are present in both graphs. The method copies rows that intersect from the graph defined by the arguments `inputVerticesTable` and `inputEdgesTable`, including any attributes. You can optionally create indexes on the result subgraph tables.

Syntax

    mask(
        inputSchema, inputVerticesTable, inputEdgesTable,
        otherSchema, otherVerticesTable, otherEdgesTable,
        resultSchema, resultVerticesTable, resultEdgesTable,
        [ resultVerticesIndexes [, ...] ],
        [ resultEdgesIndexes [, ...] ],
        stmt )

| Argument | Data Type | Description |
| --- | --- | --- |
| `inputSchema` | `String` | A non-empty schema containing the primary input tables. |
| `inputVerticesTable` | `String` | The primary vertices table to intersect (must have an `id` column). The masked subgraph created by this method copies rows that intersect from this table. |
| `inputEdgesTable` | `String` | The primary edges table to intersect (must have `srcid` and `destid` columns). The masked subgraph created by this method copies rows that intersect from this table. |
| `otherSchema` | `String` | Schema containing the second graph. |
| `otherVerticesTable` | `String` | The second vertices table to intersect. |
| `otherEdgesTable` | `String` | The second edges table to intersect. |
| `resultSchema` | `String` | A writable schema to create the result tables. |
| `resultVerticesTable` | `String` | Name of the vertices table to create. |
| `resultEdgesTable` | `String` | Name of the edges table to create. |
| `resultVerticesIndexes` | `ArrayList` | Optional columns to index in the result vertices table (e.g., `id`). Specify an empty list for none. |
| `resultEdgesIndexes` | `ArrayList` | Optional columns to index in the result edges table (e.g., the `srcid` and `destid` columns). Specify an empty list for none. |
| `stmt` | `java.sql.Statement` | JDBC statement for executing SQL. You must have a valid statement and a database connection. |

Example

Create a masked subgraph by intersecting two graphs. The example copies vertices and edges that are present in both graphs, along with the remaining endpoints.

    OCGraph.mask(
        "sales", "customers", "purchases",
        "ref", "customers_ref", "purchases_ref",
        "sales", "customers_masked", "purchases_masked",
        new ArrayList<>(List.of("id")),
        new ArrayList<>(List.of("srcid", "destid")),
        stmt );

Transformations

Construct new vertex or edge tables by computing derived columns, reversing direction, or aggregating duplicates. These methods do not change the original inputs. Instead, the methods materialize new results.

mapVertices

Creates a new vertices table with the identifier `id` and computed columns. Use the `resultColumnExpressions` argument to calculate additional columns. This method can also add indexes before inserting data.

Syntax

    mapVertices(
        inputSchema, inputVerticesTable,
        resultSchema, resultVerticesTable,
        resultColumnExpressions [, ...],
        [ resultVerticesIndexes [, ...] ],
        stmt )

| Argument | Data Type | Description |
| --- | --- | --- |
| `inputSchema` | `String` | A non-empty schema containing the input tables. |
| `inputVerticesTable` | `String` | Input vertices table (must have an `id` column). |
| `resultSchema` | `String` | A writable schema to create the result tables. |
| `resultVerticesTable` | `String` | Name of the vertices table to create. |
| `resultColumnExpressions` | `ArrayList` | One or more SQL expressions defining result columns beyond `id`. Use the `AS alias_name` keyword for stable names. |
| `resultVerticesIndexes` | `ArrayList` | Optional columns to index in the result vertices table (e.g., `id`). Specify an empty list for none. |
| `stmt` | `java.sql.Statement` | JDBC statement for executing SQL. You must have a valid statement and a database connection. |

Example

Create vertices by adding derived columns, `name_upper` and `is_vip`, and generate indexes for the `id` and `name_upper` columns.

    OCGraph.mapVertices(
        "sales", "customers",
        "sales", "customers_enriched",
        new ArrayList<>(List.of(
            "UPPER(name) AS name_upper",
            "CASE WHEN score > 1000 THEN true ELSE false END AS is_vip"
        )),
        new ArrayList<>(List.of("id", "name_upper")),
        stmt );

mapEdges

Creates a new edges table with `srcid`, `destid`, and computed columns. Expressions should refer to input edge columns by their original names, and each computed expression should include an `AS alias` keyword.

Syntax

    mapEdges(
        inputSchema, inputEdgesTable,
        resultSchema, resultEdgesTable,
        [ resultColumnExpressions [, ...] ],
        [ resultEdgesIndexes [, ...] ],
        stmt )

| Argument | Data Type | Description |
| --- | --- | --- |
| `inputSchema` | `String` | A non-empty schema containing the input tables. |
| `inputEdgesTable` | `String` | Input edges table (must have `srcid` and `destid` columns). |
| `resultSchema` | `String` | A writable schema to create the result tables. |
| `resultEdgesTable` | `String` | Name of the edges table to create. |
| `resultColumnExpressions` | `ArrayList` | Optional SQL expressions for additional edge columns (besides the `srcid` and `destid` columns). Use the `AS alias_name` keyword for stable names. |
| `resultEdgesIndexes` | `ArrayList` | Optional columns to index in the result edges table (e.g., the `srcid` and `destid` columns). Specify an empty list for none. |
| `stmt` | `java.sql.Statement` | JDBC statement for executing SQL. You must have a valid statement and a database connection. |

Example

Create a new edges table with two computed columns, `discounted_amount` and `big_txn`, and generate indexes for the `srcid` and `destid` columns.

    OCGraph.mapEdges(
        "sales", "purchases",
        "sales", "purchases_enriched",
        new ArrayList<>(List.of(
            "amount * 0.9 AS discounted_amount",
            "CASE WHEN amount > 100 THEN 1 ELSE 0 END AS big_txn"
        )),
        new ArrayList<>(List.of("srcid", "destid")),
        stmt );

mapTriplets

Creates a new edges table with computed columns that reference a triplet made of `a` (source vertex), `b` (edge), and `c` (destination vertex). The output automatically includes the `b.srcid` and `b.destid` columns.

Syntax

    mapTriplets(
        inputSchema, inputVerticesTable, inputEdgesTable,
        resultSchema, resultEdgesTable,
        [ resultColumnExpressions [, ...] ],
        [ resultEdgesIndexes [, ...] ],
        stmt )

| Argument | Data Type | Description |
| --- | --- | --- |
| `inputSchema` | `String` | A non-empty schema containing the input tables. |
| `inputVerticesTable` | `String` | Input vertices table (must have an `id` column). Used as the `a` and `c` vertices. |
| `inputEdgesTable` | `String` | Input edges table (must have the `srcid` and `destid` columns). This method uses this argument as the `b` edge. |
| `resultSchema` | `String` | A writable schema to create the result edges table. |
| `resultEdgesTable` | `String` | Name of the edges table to create. |
| `resultColumnExpressions` | `ArrayList` | Optional SQL expressions that can reference `a`, `b`, or `c`. Use the `AS alias_name` keyword for stable names. |
| `resultEdgesIndexes` | `ArrayList` | Optional columns to index in the result edges table (e.g., the `srcid` and `destid` columns). Specify an empty list for none. |
| `stmt` | `java.sql.Statement` | JDBC statement for executing SQL. You must have a valid statement and a database connection. |

Example

Create a new triplet table from the vertices and edges tables with the `amount` and `same_country` columns, and generate indexes for the `srcid` and `destid` columns.

    OCGraph.mapTriplets(
        "sales", "customers", "purchases",
        "sales", "purchases_triplets",
        new ArrayList<>(List.of(
            "b.amount AS amount",
            "CASE WHEN a.country = c.country THEN 1 ELSE 0 END AS same_country"
        )),
        new ArrayList<>(List.of("srcid", "destid")),
        stmt );

reverseEdges

Creates a new edges table with the `srcid` and `destid` columns reversed, preserving other columns. Use this method to traverse a graph in the opposite direction.

Syntax

    reverseEdges(
        inputSchema, inputEdgesTable,
        resultSchema, resultEdgesTable,
        [ resultEdgesIndexes [, ...] ],
        stmt )

| Argument | Data Type | Description |
| --- | --- | --- |
| `inputSchema` | `String` | A non-empty schema containing the input tables. |
| `inputEdgesTable` | `String` | Input edges table (must have the `srcid` and `destid` columns). |
| `resultSchema` | `String` | A writable schema to create the result tables. |
| `resultEdgesTable` | `String` | Name of the reversed edges table to create. |
| `resultEdgesIndexes` | `ArrayList` | Optional columns to index in the result edges table (e.g., the `srcid` and `destid` columns). Specify an empty list for none. |
| `stmt` | `java.sql.Statement` | JDBC statement for executing SQL. You must have a valid statement and a database connection. |

Example

Transform edge direction by reversing the `srcid` and `destid` columns. The example also creates indexes for these columns.

    OCGraph.reverseEdges(
        "sales", "purchases",
        "sales", "purchases_reversed",
        new ArrayList<>(List.of("srcid", "destid")),
        stmt );

groupEdges

Groups rows that share the same `srcid` and `destid` values, producing one row for each unique pair of values in a new edges table. This method performs aggregations based on one or more SQL expressions.

Syntax

    groupEdges(
        inputSchema, inputEdgesTable,
        resultSchema, resultEdgesTable,
        resultColumnExpressions [, ...],
        [ resultEdgesIndexes [, ...] ],
        stmt )

| Argument | Data Type | Description |
| --- | --- | --- |
| `inputSchema` | `String` | A non-empty schema containing the input tables. |
| `inputEdgesTable` | `String` | Input edges table (must have the `srcid` and `destid` columns). |
| `resultSchema` | `String` | A writable schema to create the grouped tables. |
| `resultEdgesTable` | `String` | Name of the grouped edges table to create. |
| `resultColumnExpressions` | `ArrayList` | One or more aggregate expressions that the method evaluates for each group of `srcid` and `destid` values. For details, see the SQL aggregation documentation. Use the `AS alias_name` keyword for stable names. |
| `resultEdgesIndexes` | `ArrayList` | Optional columns to index in the result edges table (e.g., the `srcid` and `destid` columns). Specify an empty list for none. |
| `stmt` | `java.sql.Statement` | JDBC statement for executing SQL. You must have a valid statement and a database connection. |

Example

Create a new edges table that includes SQL aggregations for the number of transactions (`txn_count`) and the summed amount (`total_amount`) for each pair. The example also generates indexes for the `srcid` and `destid` columns.

    OCGraph.groupEdges(
        "sales", "purchases",
        "sales", "purchases_grouped",
        new ArrayList<>(List.of(
            "COUNT(*) AS txn_count",
            "SUM(amount) AS total_amount"
        )),
        new ArrayList<>(List.of("srcid", "destid")),
        stmt );

Triplets

Produce triplet representations that are made of `a` (source vertex), `b` (edge), and `c` (destination vertex), either as a logical view or a materialized table for downstream queries.

createTripletsView

Creates a view that combines the edge table with the source and destination vertex attributes. This view is useful for analyzing relationships without having to repeatedly join tables.

The view includes these columns:

- All original edge columns (including the `srcid` and `destid` columns).
- All source vertex columns except `id`. Source vertex column names have the `src_` prefix.
- All destination vertex columns except `id`. Destination vertex column names have the `dest_` prefix.

Use the `createTripletsTable` method if you want to create a materialized table with indexes instead of a view.

Syntax

    createTripletsView(
        inputSchema, inputVerticesTable, inputEdgesTable,
        resultSchema, resultTripletsView,
        stmt )

| Argument | Data Type | Description |
| --- | --- | --- |
| `inputSchema` | `String` | A non-empty schema containing the input tables. |
| `inputVerticesTable` | `String` | Input vertices table (must have an `id` column). |
| `inputEdgesTable` | `String` | Input edges table (must have the `srcid` and `destid` columns). |
| `resultSchema` | `String` | A schema to create the view. |
| `resultTripletsView` | `String` | Name of the triplets view to create. The name must not conflict with the names of input tables. |
| `stmt` | `java.sql.Statement` | JDBC statement for executing SQL. You must have a valid statement and a database connection. |

Example

Create a triplets view to inspect edges with joined source and destination vertex attributes.

    OCGraph.createTripletsView(
        "sales", "customers", "purchases",
        "sales", "triplets_v",
        stmt );

createTripletsTable

Creates a materialized table that combines the edge table with the source and destination vertex attributes. This table is useful for analyzing relationships without having to repeatedly join tables.

The created table includes these columns:

- All original edge columns (including the `srcid` and `destid` columns).
- All source vertex columns except `id`. Source vertex column names have the `src_` prefix.
- All destination vertex columns except `id`. Destination vertex column names have the `dest_` prefix.

Use the `createTripletsView` method if you want to create a view instead of a new table.

Syntax

    createTripletsTable(
        inputSchema, inputVerticesTable, inputEdgesTable,
        resultSchema, resultTripletsTable,
        [ resultTripletsIndexes [, ...] ],
        stmt )

| Argument | Data Type | Description |
| --- | --- | --- |
| `inputSchema` | `String` | A non-empty schema containing the input tables. |
| `inputVerticesTable` | `String` | Input vertices table (must have an `id` column). |
| `inputEdgesTable` | `String` | Input edges table (must have the `srcid` and `destid` columns). |
| `resultSchema` | `String` | A schema to create the triplets table. |
| `resultTripletsTable` | `String` | Name of the triplets table to create. The name must not conflict with the names of input tables. |
| `resultTripletsIndexes` | `ArrayList` | Optional columns to index in the result table. Specify an empty list for none. |
| `stmt` | `java.sql.Statement` | JDBC statement for executing SQL. You must have a valid statement and a database connection. |

Example

Create a new table for triplets. Generate indexes for the `src_id` and `dest_id` columns.

    OCGraph.createTripletsTable(
        "sales", "customers", "purchases",
        "sales", "triplets_t",
        new ArrayList<>(List.of("src_id", "dest_id")),
        stmt );

Degrees

Compute degree metrics for each vertex from the edges table. These methods produce small vertex tables suitable for joins and analytics.

inDegrees

Computes how many edges point to each vertex in an edge table by counting how many times each unique `destid` value appears. The result table has two columns: `id` (the destination vertex) and `in_degree` (the count).

Syntax

    inDegrees(
        inputSchema, inputEdgesTable,
        resultSchema, resultVerticesTable,
        [ resultVerticesIndexes [, ...] ],
        stmt )

| Argument | Data Type | Description |
| --- | --- | --- |
| `inputSchema` | `String` | A non-empty schema containing the input table. |
| `inputEdgesTable` | `String` | Input edges table (must have the `srcid` and `destid` columns). |
| `resultSchema` | `String` | A schema to create the in-degree table. |
| `resultVerticesTable` | `String` | Name of the vertices table to create (columns include `id` and `in_degree`). |
| `resultVerticesIndexes` | `ArrayList` | Optional columns to index in the result vertices table (e.g., `id`). Specify an empty list for none. |
| `stmt` | `java.sql.Statement` | JDBC statement for executing SQL. You must have a valid statement and a database connection. |

Example

Compute in-degrees per vertex and generate an index on the `id` column.

    OCGraph.inDegrees(
        "sales", "purchases",
        "sales", "customers_in_degree",
        new ArrayList<>(List.of("id")),
        stmt );

outDegrees

Computes how many edges originate from each vertex in an edge table by counting how many times each unique `srcid` value appears. The result table has two columns: `id` (the source vertex) and `out_degree` (the count).

Syntax

    outDegrees(
        inputSchema, inputEdgesTable,
        resultSchema, resultVerticesTable,
        [ resultVerticesIndexes [, ...] ],
        stmt )

| Argument | Data Type | Description |
| --- | --- | --- |
| `inputSchema` | `String` | A non-empty schema containing the input table. |
| `inputEdgesTable` | `String` | Input edges table (must have the `srcid` and `destid` columns). |
| `resultSchema` | `String` | A schema to create the out-degree table. |
| `resultVerticesTable` | `String` | Name of the vertices table to create (columns include `id` and `out_degree`). |
| `resultVerticesIndexes` | `ArrayList` | Optional columns to index in the result vertices table (e.g., `id`). Specify an empty list for none. |
| `stmt` | `java.sql.Statement` | JDBC statement for executing SQL. You must have a valid statement and a database connection. |

Example

Compute the out-degree count for each vertex and generate an index on the `id` column.

    OCGraph.outDegrees(
        "sales", "purchases",
        "sales", "customers_out_degree",
        new ArrayList<>(List.of("id")),
        stmt );

degrees

Computes the total degree (in-degree plus out-degree) for each vertex in an edge table by counting how many times each unique identifier appears as a `srcid` or `destid` value. The result table has two columns: `id` (the source or destination vertex) and `degree` (the count).

Syntax

    degrees(
        inputSchema, inputEdgesTable,
        resultSchema, resultVerticesTable,
        [ resultVerticesIndexes [, ...] ],
        stmt )

| Argument | Data Type | Description |
| --- | --- | --- |
| `inputSchema` | `String` | A non-empty schema containing the input table. |
| `inputEdgesTable` | `String` | Input edges table (must have the `srcid` and `destid` columns). |
| `resultSchema` | `String` | A schema to create the degree table. |
| `resultVerticesTable` | `String` | Name of the vertices table to create (columns include `id` and `degree`). |
| `resultVerticesIndexes` | `ArrayList` | Optional columns to index in the result vertices table (e.g., `id`). Specify an empty list for none. |
| `stmt` | `java.sql.Statement` | JDBC statement for executing SQL. You must have a valid statement and a database connection. |

Example

Compute total degrees for each vertex and generate an index on the `id` column.

    OCGraph.degrees(
        "sales", "purchases",
        "sales", "customers_degree",
        new ArrayList<>(List.of("id")),
        stmt );

Vertex Extraction and Joins

Build vertex sets from edges and combine vertex attributes across tables. These methods are useful for shaping vertex properties and consolidating features.

fromEdges

Builds a vertices table from an edges table by extracting the unique source and destination identifiers. This method can optionally compute additional columns using SQL expressions by referencing the unique identifier as `ids.id`. The created
table always contains the `id` column with one additional column per expression.

Syntax

    fromEdges(
        inputSchema, inputEdgesTable,
        resultSchema, resultVerticesTable,
        [ resultColumnExpressions [, ...] ],
        [ resultVerticesIndexes [, ...] ],
        stmt )

| Argument | Data Type | Description |
| --- | --- | --- |
| `inputSchema` | `String` | A non-empty schema containing the input table. |
| `inputEdgesTable` | `String` | Input edges table (must have the `srcid` and `destid` columns). |
| `resultSchema` | `String` | A schema to create the result vertices table. |
| `resultVerticesTable` | `String` | Name of the result vertices table (always includes `id`). |
| `resultColumnExpressions` | `ArrayList` | Optional. Specify a list of SQL expressions that reference `ids.id` to add additional columns. |
| `resultVerticesIndexes` | `ArrayList` | Optional columns to index in the result vertices table (e.g., `id`). Specify an empty list for none. |
| `stmt` | `java.sql.Statement` | JDBC statement for executing SQL. You must have a valid statement and a database connection. |

Example

Create a vertices table from edge endpoints and add a `bucket` column that assigns each vertex to one of 10 buckets. Generate an index for the `id` and `bucket` columns.

    OCGraph.fromEdges(
        "sales", "purchases",
        "sales", "customers_from_edges",
        new ArrayList<>(List.of(
            "ids.id % 10 AS bucket"
        )),
        new ArrayList<>(List.of("id", "bucket")),
        stmt );

joinVertices

Merges two vertices tables by retaining every row from a primary table (`inputVerticesTable`) and selectively updating rows that also appear in the modification table (`modificationVerticesTable`). The merged table includes all vertices from the primary table that do not appear in the modification table.

For vertices that appear in both tables, you must supply a list of expressions (`resultAttributeExpressions`), in the same column order, for every non-identifier column in the merged result table. These SQL expressions can add computations to columns or simply add aliases if no changes are needed. Each expression can reference columns from the primary table (using alias `a`) or from the
modification table (using alias b ) syntax joinvertices( inputschema, inputverticestable, modificationschema, modificationverticestable, resultschema, resultverticestable, resultattributeexpressions \[ , ], \[ resultverticesindexes \[ , ] ], stmt ) argument data type description inputschema string a non empty schema containing the primary input table inputverticestable string input vertices table (with the alias a ) this table must have an id column modificationschema string a schema for the modification vertices table modificationverticestable string modification vertices table (with the alias b ) this table must have an id column resultschema string a writable schema to create the result tables resultverticestable string name of the vertices table to create resultattributeexpressions arraylist a list of sql expressions that define the non identifier columns of the joined result each expression can reference the left ( inputverticestable ) vertex as a and the right ( modificationverticestable ) vertex as b you must end every expression with an explicit alias using as alias name so the output column names are stable resultverticesindexes arraylist optional columns to index in the result vertices table (e g , id ) specify an empty list for none stmt java sql statement jdbc statement for executing sql you must have a valid statement and a database connection example merge vertex attributes and generate indexes for the id and status columns this example includes two sql expressions to update the status and score columns based on the modification vertex table using the coalesce sql reference function ocgraph joinvertices( "sales", "customers", "sales", "customers updates", "sales", "customers merged", new arraylist<>(list of( "coalesce(b new status, a status) as status", "coalesce(b score delta + a score, a score) as score" )), new arraylist<>(list of("id","status")), stmt ); innerjoinvertices performs an inner join on two vertex tables using an equality comparison a 
id = b id the result table automatically includes the id column from the first table the method must include a list of sql expressions ( resultattributeexpressions ) in the same column order for every non identifier column in the merged result table these sql expressions can add computations to columns, or simply add aliases if no changes are needed each expression can reference columns from the primary table using the alias a or from the modification table using the alias b syntax innerjoinvertices( inputschema, inputverticestable, otherschema, otherverticestable, resultschema, resultverticestable, resultattributeexpressions \[ , ], \[ resultverticesindexes \[ , ] ], stmt ) argument data type description inputschema string a schema for the left vertices table inputverticestable string input vertices table for the left side of the join (must have an id column) otherschema string a schema for the right vertices table otherverticestable string the other vertices table for the right side of the join (must have an id column) resultschema string a writable schema to create the result tables resultverticestable string name of the vertices table to create resultattributeexpressions arraylist a list of sql expressions that define the non identifier columns of the joined result each expression can reference the left ( inputverticestable ) vertex as a and the right ( otherverticestable ) vertex as b you must end every expression with an explicit alias using as alias name to ensure the output column names are stable resultverticesindexes arraylist optional columns to index in the result vertices table (e g , id ) specify an empty list for none stmt java sql statement jdbc statement for executing sql you must have a valid statement and a database connection example create an inner join between two vertex tables and generate an index on the id column ocgraph innerjoinvertices( "sales", "customers", "sales", "profiles", "sales", "customers joined", new arraylist<>(list of( "a id 
as id", "a status as status", "b tier as tier" )), new arraylist<>(list of("id")), stmt ); outerjoinvertices performs a left outer join between two vertices tables using an equality comparison a id = b id the result table includes all rows from the left table for left table rows that have no match in the right table, any expression that reads columns from the right table with the alias b evaluates to null (while expressions that only read the table with the alias a remain non null as usual) the method must include a list of sql expressions ( resultattributeexpressions ) in the same column order for every non identifier column in the merged result table these sql expressions can add computations to columns or simply add aliases if no changes are needed each expression can reference columns from the primary table using the alias a or from the modification table using the alias b syntax outerjoinvertices( inputschema, inputverticestable, otherschema, otherverticestable, resultschema, resultverticestable, resultattributeexpressions \[ , ], \[ resultverticesindexes \[ , ] ], stmt ) argument data type description inputschema string a non empty schema containing the input tables (aliased as a ) inputverticestable string left vertices table (must have an id column) otherschema string schema of the right vertices table (with the alias b ) otherverticestable string right vertices table (must have an id column) resultschema string a writable schema to create the result tables resultverticestable string name of the vertices table to create resultattributeexpressions arraylist a list of sql expressions that define the non id columns of the joined result each expression can reference the left ( inputverticestable ) vertex as a and the right ( otherverticestable ) vertex as b you must end every expression with an explicit alias using as alias name to ensure the output column names are stable resultverticesindexes arraylist optional columns to index in the result vertices table (e 
g , id ) specify an empty list for none stmt java sql statement jdbc statement for executing sql you must have a valid statement and a database connection example perform a left outer join on two vertices tables and generate an index on the id column ocgraph outerjoinvertices( "sales", "customers", "sales", "profiles", "sales", "customers joined", new arraylist<>(list of( "a id as id", "a status as status", "b tier as tier" )), new arraylist<>(list of("id")), stmt ); collectneighbors for each vertex in a table, this method collects information on neighbors (identifier and any attributes) as an array of tuples for a specified direction ( in , out , or both ), the method aggregates tuples representing each neighboring vertex into an array the direction types are in — neighbors with edges pointing to the vertex (edges where destid = id ) out — neighbors that the vertex points to (edges where srcid = id ) both — union of in and out with neighbors from incoming ( destid = id ) and outgoing ( srcid = id ) edges the result table has the columns id (the vertex identifier) and neighbors (an array of tuples representing each neighbor) if an error occurs after table creation, the method drops the result table syntax collectneighbors( inputschema, inputverticestable, inputedgestable, resultschema, resulttable, direction, \[ resultindexes \[ , ] ], stmt ) argument data type description inputschema string a non empty schema containing the input tables inputverticestable string input vertices table (must have an id column) inputedgestable string input edges table (must have the srcid and destid columns) resultschema string a writable schema to create the result tables resulttable string name of the neighbors collection table (with the id and neighbors columns) direction edgedirection specifies the direction of traversal supported values are in — neighbors that have edges pointing to the vertex (edges where destid = id ) for example, if an edge 5 points to 10 , then for id=10 , 
neighbor 5 is included out - neighbors that the vertex points to (edges where srcid = id ) for example, if an edge 5 points to 10 , then for id=5 , neighbor 10 is included both - the union of in and out this traversal includes neighbors from edges pointing to id and edges from id resultindexes arraylist optional columns to index in the result table (e g , id ) specify an empty list for none stmt java sql statement jdbc statement for executing sql you must have a valid statement and a database connection example collect incoming neighbors for each vertex and generate an index on the id column setting the direction argument to in collects neighbors pointing to id ocgraph collectneighbors( "sales", "customers", "purchases", "sales", "neighbors in", ocgraph edgedirection in, new arraylist<>(list of("id")), stmt ); collectedges for each vertex in a table, this method collects an array of adjacent edge rows based on the specified direction the result table has two columns id (the vertex identifier) and edges (an array of tuples, each tuple containing all columns from the edges table for a connected edge) the direction types are in - edges pointing to the vertex (edges where destid = id ) out - edges originating from the vertex (edges where srcid = id ) both - union of in and out that includes edges from incoming ( destid = id ) and outgoing ( srcid = id ) directions this direction retains duplicates if an error occurs after table creation, the method drops the result table syntax collectedges( inputschema, inputverticestable, inputedgestable, resultschema, resulttable, direction, \[ resultindexes \[ , ] ], stmt ) argument data type description inputschema string a non empty schema containing the input tables inputverticestable string input vertices table (must have an id column) inputedgestable string input edges table (must have the srcid and destid columns) resultschema string a writable schema to create the result tables resulttable string name of the edge 
collection table (with the id and edges columns) direction edgedirection specifies the direction of traversal supported values are in - edges pointing to the vertex (edges where destid = id ) for example, if an edge 5 points to 10 , then for id=10 , that edge is included out - edges originating from the vertex (edges where srcid = id ) for example, if an edge 5 points to 10 , then for id=5 , that edge is included both - the union of in and out this traversal includes edges pointing to id and edges originating from id resultindexes arraylist optional columns to index specify an empty list for none stmt java sql statement jdbc statement for executing sql you must have a valid statement and a database connection example collect outgoing edges for each vertex and generate an index for the id column the example sets the direction to out to collect edges from id ocgraph collectedges( "sales", "customers", "purchases", "sales", "outgoing edges", ocgraph edgedirection out, new arraylist<>(list of("id")), stmt ); algorithms these high level graph algorithms iterate over the graph structure to produce labels, components, or counts labelpropagation executes the label propagation algorithm (lpa) ( https://en.wikipedia.org/wiki/Label_propagation_algorithm ) to assign community labels to vertices each vertex starts with its own identifier as its label for the number of iterations set by the maxiterations argument, each vertex updates its label to the most frequent label among its neighbors the algorithm breaks ties by choosing the smallest label the algorithm uses temporary tables for intermediate results and drops these tables when the process completes or if it fails isolated vertices retain their initial label the final table stores id and label columns and can include indexes syntax labelpropagation( inputschema, inputverticestable, inputedgestable, resultschema, resultverticestable, maxiterations, \[ resultverticesindexes \[ , ] ], stmt ) argument data type description inputschema string a 
non empty schema containing the input tables inputverticestable string input vertices table (must have an id column) inputedgestable string input edges table (must have the srcid and destid columns) resultschema string a writable schema to create the result tables resultverticestable string name of the vertices table to create (columns include id and label ) maxiterations numeric maximum number of iterations (must be 1 or greater) resultverticesindexes arraylist optional columns to index in the result vertices table (e g , id ) specify an empty list for none stmt java sql statement jdbc statement for executing sql you must have a valid statement and a database connection example run label propagation for 10 iterations and assign labels to vertices generate an index on the id column ocgraph labelpropagation( "sales", "customers", "purchases", "sales", "lpa labels", 10, new arraylist<>(list of("id")), stmt ); connectedcomponents identifies the connected components of an undirected graph this algorithm configures a pregel computation in which each vertex initially sets its component label equal to its own identifier id in each iteration, vertices send their component label to neighbors each vertex updates based on the aggregated minimum value of its current component label and any received values the process repeats until no more updates occur or when it reaches the maximum specified number of iterations ( maxiterations ) the result table maps each vertex id to its final component label syntax connectedcomponents( inputschema, inputverticestable, inputedgestable, resultschema, resultverticestable, maxiterations, \[ resultverticesindexes \[ , ] ], stmt ) argument data type description inputschema string a non empty schema containing the input tables inputverticestable string input vertices table (must have an id column) inputedgestable string input edges table (must have the srcid and destid columns) resultschema string a writable schema to create the result tables 
resultverticestable string name of the vertices table to create maxiterations numeric maximum number of iterations (must be 1 or greater) resultverticesindexes arraylist optional columns to index in the result vertices table (e g , id ) specify an empty list for none stmt java sql statement jdbc statement for executing sql you must have a valid statement and a database connection example compute connected components for a maximum of 20 iterations and generate an index on the id column ocgraph connectedcomponents( "sales", "customers", "purchases", "sales", "components", 20, new arraylist<>(list of("id")), stmt ); stronglyconnectedcomponents computes strongly connected components (scc) in a directed graph this method runs a recursive partitioning algorithm that divides vertices into subsets where every vertex is reachable from every other vertex in the same subset the algorithm selects a pivot (typically the minimum identifier id ) and computes its predecessor set (vertices that can reach the pivot) and its descendant set (vertices reachable from the pivot) then, the method identifies the scc as the intersection of these two sets, removes that scc from the graph, and recurses on the remainder until all vertices have been assigned to an scc the method creates temporary tables in the result schema to store intermediate results and drops these tables when the computation completes or fails the final result table contains two columns id (vertex identifier) and component (the minimum vertex identifier in its scc subset) syntax stronglyconnectedcomponents( inputschema, inputverticestable, inputedgestable, resultschema, resultverticestable, \[ resultverticesindexes \[ , ] ], stmt ) argument data type description inputschema string a non empty schema containing the input tables inputverticestable string input vertices table (must have an id column) inputedgestable 
string input edges table (must have the srcid and destid columns) resultschema string a writable schema to create the result tables resultverticestable string name of the vertices table to create (columns include id and component ) resultverticesindexes arraylist optional columns to index in the result vertices table (e g , id ) specify an empty list for none stmt java sql statement jdbc statement for executing sql you must have a valid statement and a database connection example compute the scc and generate an index on the id column ocgraph stronglyconnectedcomponents( "sales", "customers", "purchases", "sales", "scc", new arraylist<>(list of("id")), stmt ); trianglecount trianglecount identifies all 3 cycles (triangles) in the graph and counts how many distinct triangles each vertex participates in the algorithm first builds a canonical, undirected edge set by ensuring srcid < destid and removing duplicates to prevent double counting if your input edges are already canonicalized and deduplicated, use trianglecount runprecanonicalized to skip preprocessing for faster performance the method then counts triangles ( a , b , c ) where a < b < c by intersecting neighbor lists and aggregates per vertex participation to produce a result table with the id and triangle count columns run syntax trianglecount run( inputschema, inputverticestable, inputedgestable, resultschema, resultverticestable, \[ resultverticesindexes \[ , ] ], stmt ) runprecanonicalized syntax trianglecount runprecanonicalized( inputschema, inputverticestable, canonicaledgestable, resultschema, resultverticestable, \[ resultverticesindexes \[ , ] ], stmt ) argument data type description inputschema string a non empty schema containing the input tables inputverticestable string input vertices table (must have an id column) inputedgestable string input edges table (must have the srcid and destid columns) for runprecanonicalized execution, the method assumes this table is already canonicalized (e g , srcid 
< destid ) and deduplicated resultschema string a writable schema to create the result tables resultverticestable string name of the vertices table to create (columns include id and triangle count ) resultverticesindexes arraylist optional columns to index in the result vertices table (e g , id ) specify an empty list for none stmt java sql statement jdbc statement for executing sql you must have a valid statement and a database connection examples count triangles using run canonicalize the raw edges internally, count unique triangles, and write per vertex triangle counts with an index on the id column ocgraph trianglecount run( "sales", "customers", "purchases", "sales", "triangle counts", new arraylist<>(list of("id")), stmt ); count triangles using runprecanonicalized use a pre canonicalized, deduplicated edge table to count triangles and write per vertex triangle counts with an index on the id column ocgraph trianglecount runprecanonicalized( "sales", "customers", "purchases canonical", "sales", "triangle counts", new arraylist<>(list of("id")), stmt ); pregel provides a generic vertex-centered iteration framework for custom graph algorithms, similar to the pregel system ( https://research.google/pubs/pregel-a-system-for-large-scale-graph-processing/ ) each iteration updates vertex states by sending messages along edges and then aggregating these messages to compute new states the algorithm continues iterating until it reaches convergence (no state changed or no messages produced) or a specified iteration cap you define the initialization, message, aggregation, and update steps with the sql expressions that you specify as arguments syntax pregel( inputschema, inputverticestable, inputedgestable, resultschema, resultverticestable, initializerexpr, \[ sendtosourceexpr ], \[ sendtodestexpr ], aggregateexpr, updaterexpr, maxiterations, \[ resultverticesindexes \[ , ] ], stmt ) argument data type description inputschema string a non empty schema containing the input tables inputverticestable string input vertices table (must have an id column) 
inputedgestable string input edges table (must have the srcid and destid columns) resultschema string a writable schema to create the result tables resultverticestable string name of the vertices table to create (columns include id and result ) initializerexpr string a sql expression to compute the initial state for each vertex (e g , case when type = 'seed' then 1 0 else 0 0 end ) sendtosourceexpr string optional defines the message sent to the source vertex of an edge, referencing the current vertex states a state (source) and c state (destination), and any edge attributes in b example case when a country = c country then 1 else 0 end sendtodestexpr string optional defines the message sent to the destination vertex of an edge, referencing the current vertex states a state (source) and c state (destination), and any edge attributes in b example case when a country = c country then 1 else 0 end aggregateexpr string a sql aggregation to combine messages per vertex (e g , sum(msg) ) for a list of supported aggregations, see docid\ roka1ck6hndmod1smej1s updaterexpr string a sql expression to compute the next state from the current state and aggregated messages for example, this code updates the state to the minimum of the current state and the aggregated message (similar to the docid\ hoxig9l4m5f2kisrve3ge method) least(a state, coalesce(m aggregated message, a state)) as state maxiterations numeric maximum number of iterations (must be 1 or greater) resultverticesindexes arraylist optional columns to index in the result vertices table (e g , id ) specify an empty list for none stmt java sql statement jdbc statement for executing sql you must have a valid statement and a database connection example run a simple pregel computation that adds the amount of each outgoing edge to the state of its source vertex (the edge amount is passed as sendtosourceexpr and sendtodestexpr is null ) for 10 iterations at most, and generate an index on the id column ocgraph pregel( "sales", "customers", "purchases", "sales", "pregel result", "0 as state", "b amount", null, "sum(msg) as 
aggregated message", "state + coalesce(aggregated message, 0) as state", 10, new arraylist<>(list of("id")), stmt ); paths and ranking these methods include the shortest path and pagerank algorithms shortestpaths computes the shortest distance from every vertex to each vertex in a set of landmark vertices using an iterative relaxation algorithm the algorithm resembles the bellman-ford algorithm ( https://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm ) but simultaneously handles multiple destinations each landmark starts at distance 0 from itself, and all other distances start at positive infinity on each iteration, the algorithm examines every edge and checks whether traveling through the connected neighbor would yield a shorter route to a landmark if a shorter route exists, the algorithm updates the distance of the source vertex the process stops when no distances improve, or the algorithm reaches the maximum number of iterations after the process finishes, the algorithm writes a result table with the srcid , destid , and distance columns syntax shortestpaths( inputschema, inputverticestable, inputedgestable, resultschema, resulttable, landmarks \[ , ], \[ edgeweightcolumn ], maxiterations, \[ resultindexes \[ , ] ], stmt ) argument data type description inputschema string a non empty schema containing the input tables inputverticestable string input vertices table (must have an id column) inputedgestable string input edges table (must have the srcid and destid columns) resultschema string a writable schema to create the result table resulttable string name of the distances result table this table has the srcid , destid , and distance columns landmarks arraylist one or more landmark vertex identifiers this list must be non empty and contain no nulls edgeweightcolumn string optional the name of an edge weight column if you do not specify this argument, every edge uses the default weight of 1 0 maxiterations numeric maximum number of relaxation iterations this value must be 1 or greater resultindexes arraylist optional the list of columns to 
index in the result table (any of the srcid , destid , and distance columns) specify an empty list for none stmt java sql statement jdbc statement for executing sql you must have a valid statement and a database connection example compute unweighted distances to the landmark vertices 1 and 42 for 10 iterations at most, and generate indexes on the srcid and destid columns ocgraph shortestpaths( "sales", "customers", "purchases", "sales", "distances", new arraylist<>(list of(1l, 42l)), null, // unweighted 10, new arraylist<>(list of("srcid","destid")), stmt ); staticpagerank computes pagerank ( https://en.wikipedia.org/wiki/PageRank ) scores for each vertex over a fixed number of iterations the algorithm follows the standard pagerank formula with a reset probability ( resetprob ) and uses common table expressions to calculate contributions from incoming edges and redistribute rank from dangling nodes the algorithm supports two variants standard pagerank - all vertices start with rank 1 0/n , where n is the number of vertices specify this variant if personalizationsrcid is null personalized pagerank - the specified vertex starts with a rank of 1 0 , while others start with a rank of 0 0 specify this variant if personalizationsrcid is a vertex identifier after running pagerank for a fixed number of iterations, the method writes a result vertices table containing all original vertex columns with a new pagerank scoring column syntax staticpagerank( inputschema, inputverticestable, inputedgestable, resultschema, resultverticestable, numiterations, resetprob, \[ resultverticesindexes \[ , ] ], \[ personalizationsrcid ], stmt ) argument data type description inputschema string a non empty schema containing the input tables inputverticestable string input vertices table (must have an id column) inputedgestable string input edges table (must have the srcid and destid columns) resultschema string a writable schema to create the result tables resultverticestable string name of the vertices table to create (columns include all vertex columns and a pagerank 
column) numiterations numeric number of iterations to run this value must be 1 or greater resetprob numeric a value between 0 0 and 1 0 that controls the damping factor for how the pagerank random surfer moves across vertices a high value (e g , 0 85 ) puts more emphasis on link structure, encouraging the algorithm to move from each vertex to a neighbor a high value means pagerank scores tend to concentrate around well linked regions a lower value (e g , 0 50 ) makes the algorithm more likely to ignore edges and instead jump to a vertex chosen from a base distribution for standard pagerank, this base is uniform over all vertices for personalized pagerank, the base is biased toward the specified vertex a lower value creates more uniform scoring with less sensitivity to link topology resultverticesindexes arraylist optional columns to index in the result vertices table (e g , id ) specify an empty list for none personalizationsrcid long determines whether pagerank uses the standard or the personalized variant for personalized mode, specify a vertex identifier pagerank scoring starts with this vertex set at 1 0 , while all other vertices start at 0 0 for standard mode, specify null all vertices start with rank 1 0/n , where n is the number of vertices stmt java sql statement jdbc statement for executing sql you must have a valid statement and a database connection example run fixed iteration pagerank for 10 iterations and generate an index on the id column this example uses a high resetprob value of 0 85 so that the ranking concentrates on highly linked regions ocgraph staticpagerank( "sales", "customers", "purchases", "sales", "pagerank static", 10, 0 85, new arraylist<>(list of("id")), null, stmt ); dynamicpagerank computes pagerank scores until convergence based on a specified threshold value ( tolerance ) unlike the staticpagerank method, this algorithm runs iterations until the sum of absolute differences between ranks in successive iterations is less than or equal to 
the tolerance value the algorithm handles personalization similarly to staticpagerank at each iteration, the method uses the pagerank formula, collects rank values, and redistributes them the algorithm supports two variants standard pagerank — all vertices start with rank 1 0/n , where n is the number of vertices specify this variant if personalizationsrcid is null personalized pagerank — the specified vertex starts with a rank of 1 0 , while others start with a rank of 0 0 specify this variant if personalizationsrcid is a vertex identifier after running pagerank until the system reaches the tolerance threshold, the method writes a vertices table containing all the original vertex columns with a new pagerank scoring column syntax dynamicpagerank( inputschema, inputverticestable, inputedgestable, resultschema, resultverticestable, tolerance, resetprob, \[ resultverticesindexes \[ , ] ], \[ personalizationsrcid ], stmt ) argument data type description inputschema string a non empty schema containing the input tables inputverticestable string input vertices table (must have an id column) inputedgestable string input edges table (must have the srcid and destid columns) resultschema string a writable schema to create the result tables resultverticestable string name of the vertices table to create (columns include all vertex columns and pagerank ) tolerance numeric the convergence threshold that stops iterations when the rank changes become negligible after each iteration, the algorithm measures the maximum change in any rank of a vertex if that change is below the specified tolerance, the algorithm considers the computation converged and stops early a high tolerance value (e g , 0 001 ) is good for quick exploratory runs that generate an approximate ranking a low tolerance value (e g , 0 000001 ) generates a precise ranking, but requires a higher compute cost resetprob numeric a value between 0 0 and 1 0 that controls the damping factor for how the pagerank random 
surfer moves across vertices a high value (e g , 0 85 ) puts more emphasis on link structure, encouraging the algorithm to move from each vertex to a neighbor the high value means pagerank scores tend to concentrate around well linked regions a lower value (e g , 0 50 ) makes the algorithm more likely to ignore edges and instead jump to a vertex chosen from a base distribution for standard pagerank, this base is uniform over all vertices for personalized pagerank, the base is biased toward the specified vertex a lower value creates more uniform scoring with less sensitivity to link topology resultverticesindexes arraylist optional columns to index in the result vertices table (e g , id ) specify an empty list for none personalizationsrcid long determines whether pagerank uses the standard or the personalized variant for personalized mode, specify a vertex identifier pagerank scoring starts with this vertex set at 1 0 , while all other vertices start at 0 0 for standard mode, specify null all vertices start with rank 1 0/n , where n is the number of vertices stmt java sql statement jdbc statement for executing sql you must have a valid statement and a database connection example run dynamic pagerank to convergence and generate an index on the id column this example uses a low tolerance value of 1 0e-6 , which generates high precision rankings but requires more computing resources ocgraph dynamicpagerank( "sales", "customers", "purchases", "sales", "pagerank dynamic", 1 0e-6, 0 85, new arraylist<>(list of("id")), null, stmt ); bibliography “pregel a system for large scale graph processing” accessed november 18, 2025 https://research.google/pubs/pregel-a-system-for-large-scale-graph-processing/ related links docid apnndn tjqmjdd5oqdvd docid\ hnemo4f1y3sslallr03vp
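supplementary sketch for the dynamicpagerank description the following plain java program is a minimal, in memory illustration of a pagerank loop that stops on a tolerance it does not call the ocgraph library and is not the library implementation several points are assumptions made only for this sketch: the class PageRankSketch and the parameter linkWeight are hypothetical names, and linkWeight is interpreted the way this page describes resetprob (higher values favor link structure); rank from dangling vertices is redistributed according to the base distribution; the stopping rule uses the sum of absolute rank changes; maxIterations is a hypothetical safety cap; and vertices are numbered from 0 to n - 1

```java
import java.util.Arrays;

class PageRankSketch {

    /**
     * Illustrative PageRank loop (not the ocgraph implementation).
     *
     * @param n                 number of vertices, numbered 0..n-1 (assumption)
     * @param edges             directed edges as {srcid, destid} pairs
     * @param linkWeight        weight given to link structure (hypothetical name)
     * @param tolerance         stop when the summed absolute rank change is <= this
     * @param personalizationId vertex for the personalized variant, or -1 for standard
     * @param maxIterations     hypothetical safety cap on the number of iterations
     */
    static double[] pageRank(int n, int[][] edges, double linkWeight,
                             double tolerance, int personalizationId,
                             int maxIterations) {
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }

        // Base distribution: uniform (standard) or all mass on one vertex (personalized).
        double[] base = new double[n];
        if (personalizationId < 0) {
            Arrays.fill(base, 1.0 / n);
        } else {
            base[personalizationId] = 1.0;
        }

        // Initial ranks equal the base distribution: 1/n each, or 1.0 and 0.0.
        double[] rank = base.clone();

        for (int iter = 0; iter < maxIterations; iter++) {
            // Rank held by vertices without outgoing edges.
            double dangling = 0.0;
            for (int v = 0; v < n; v++) {
                if (outDegree[v] == 0) {
                    dangling += rank[v];
                }
            }

            // Contributions received over incoming edges.
            double[] next = new double[n];
            for (int[] e : edges) {
                next[e[1]] += rank[e[0]] / outDegree[e[0]];
            }

            // Combine link contributions, dangling rank, and the base distribution.
            double delta = 0.0;
            for (int v = 0; v < n; v++) {
                next[v] = (1.0 - linkWeight) * base[v]
                        + linkWeight * (next[v] + dangling * base[v]);
                delta += Math.abs(next[v] - rank[v]);
            }

            rank = next;
            if (delta <= tolerance) {
                break; // converged under the summed-change rule
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {1, 2}, {2, 0}, {0, 2}};
        double[] ranks = pageRank(3, edges, 0.85, 1.0e-6, -1, 100);
        System.out.println(Arrays.toString(ranks));
    }
}
```

in this sketch the ranks always sum to 1, so a smaller tolerance only changes how many iterations run before the loop stops, which mirrors the trade off between precision and compute cost described for the tolerance argument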