Load Data

Load Data from External Sources in Data Pipelines

the data pipeline functionality in the {{ocient}} system enables you to load data from external sources, such as other databases, using the lookup function and the lookup keyword of the corresponding create pipeline sql statement you must use the function and the lookup keyword together to join data from an external source you can use the lookup function with multiple external sources by specifying each source with its lookup function and corresponding lookup keyword syntax syntax the lookup function returns a value from an external source table based on a join between a specified value and another column in the source table the function creates and executes this sql statement from the specified function arguments select return column name from lookup source name where value = join column name; the source table should only have unique values for the join column name column for the join operation lookup(lookup source name, value, join column name, return column name) argument data type description lookup source name varchar the name of the external source defined by the lookup keyword in the create pipeline sql statement value varchar, boolean, byte, smallint, int, bigint, float, double, date, time, timestamp, hash, binary, decimal, uuid the value for the lookup join column name varchar the name of a column in the external source table for the join operation return column name varchar, boolean, byte, smallint, int, bigint, float, double, date, time, timestamp, hash, binary, decimal, uuid the name of the column in the external source table that contains the data to load if the data type of the returned value of the lookup function is not one of the data types specified for the return column name argument, then the function transforms the value to the nearest compatible type, typically varchar example create a data pipeline lookup data pipeline that loads data from the csv file data csv use an external source existing all types with the table tablename in the schema schemaname using a jdbc connection with the connection string jdbc\ host //111 1 1 1 4200/databasename;user=username\@databasename;password=testpassword for your connection string, substitute these variables with the values specific to your database and credentials host — hostname 111 1 1 1 — ip address 4200 — port number databasename — database name username — username testpassword — password look up data in the col binary column of the tablename table by joining the second column of the csv file to the col bigint column in the tablename table load the data in the col binary column based on the result of the lookup function create pipeline lookup data pipeline source filesystem filter '/tmp/folder/data csv' lookup existing all types connection type 'jdbc' connection string 'jdbc\ host //111 1 1 1 4200/databasename;user=username\@databasename;password=testpassword' lookup schema 'schemaname' lookup table 'tablename' extract format csv into public destination table select $1 as col pk, lookup('existing all types', $2, 'col bigint', 'col binary') as col binary; performance considerations the performance of the lookup operation depends on the size of the external source if the external source is large, the lookup operation is slower the ocient system recommends keeping the size to less than 5 million rows for the total size of all external source tables used by actively running data pipelines in the system related links docid\ zncvnrhsf6fg1yvqk6mxt docid\ pbyszqvu5wonpgoso qto docid\ ti3mdibvgmuudmlqu9xpl