Connection Driver Reference
JDBC Manual
JDBC Spark Connector
The {{ocient}} {{spark}} connector is a DataSourceV2 (https://downloads.apache.org/spark/docs/2.3.1/api/java/index.html?org/apache/spark/sql/sources/v2/DataSourceV2.html) implementation that adapts an Ocient system to operate as a first-class source and sink for Spark workloads. Built on top of the Ocient JDBC driver, the connector allows Spark to read from and write to Ocient tables using Spark APIs and SQL statements. The connector implements Spark catalog and table interfaces, so you can register Ocient as a catalog (for CREATE TABLE, INSERT, SELECT, and SHOW TABLES SQL statements) or use it for ad-hoc reads and writes.

Key Features

The Ocient Spark Connector includes these key features:

- Read pushdown: The connector accelerates reads by pushing column selection, filters (including on nested fields), aggregations, and queries that only need the first N rows down to the Ocient system, while still letting Spark validate the final results.
- Read partitioning: The connector parallelizes reads by splitting data into multiple Spark partitions based on a partition column. For details, see docid\ pp91aew4 1hy1pft3f4zs and https://spark.apache.org/docs/latest/sql-programming-guide.html.
- Write behavior and save modes: The connector controls how it writes DataFrames to Ocient tables by honoring Spark save modes to append (Append), truncate and replace (Overwrite), or fail on existing tables (ErrorIfExists).
- Catalog support: The connector exposes Ocient as a Spark catalog, so you can use standard Spark SQL directly on an Ocient system.

Prerequisites

To use the Ocient Spark Connector, your system must meet these software requirements.
Additionally, you must have the SELECT, INSERT, CREATE, and DELETE user privileges for the specified database. For details, see docid\ f55ngxtki0f7kkmyatvug.

Ocient Spark Connector Setup and Initial Use

To start working with the Ocient Spark Connector, register the connector. Then, you can start executing SQL statements.

Connector Registration

For best results, first register the connector as a catalog in Spark. To register the connector, edit the spark-defaults.conf file in your Spark installation to include these lines. Replace the username and password fields with your Ocient system credentials.

```
spark.sql.catalog.ocient_cat=com.ocient.spark.v2.DefaultSource
spark.sql.catalog.ocient_cat.url=jdbc:ocient://host:port/db
spark.sql.catalog.ocient_cat.user=<username>
spark.sql.catalog.ocient_cat.password=<password>
```

Use SQL Statements

After registration, the Spark connector lets you treat your Ocient system like any other Spark catalog. The connector routes SQL operations through the catalog implementation.

Execute the Spark USE command to switch to your Ocient catalog and schema for SQL statements. In this case, use the ocient_cat catalog and my_schema schema.

```sql
USE ocient_cat.my_schema;
```

Subsequent commands default to your Ocient catalog and schema, so you no longer need to reference them.

This example creates a new table my_new_table with identifier id, name name, event timestamp event_time, and nested data nested_data composed of an integer and a string.

```sql
CREATE TABLE my_new_table (
    id BIGINT,
    name VARCHAR,
    event_time TIMESTAMP,
    nested_data STRUCT<a INT, b STRING>
);
```

Insert a row into the new table.

```sql
INSERT INTO my_new_table VALUES (1, 'foo', '2025-01-01 12:00:00', (100, 'bar'));
```

Read the row from the table.

```sql
SELECT * FROM my_new_table WHERE id = 1;
```

List the table.

```sql
SHOW TABLES;
```

Drop the table.

```sql
DROP TABLE my_new_table;
```

Use Scala DataFrames

The Ocient Spark Connector integrates directly with the Spark DataFrame API, so you can read from and write to Ocient tables using familiar Spark patterns. After
you configure the Ocient catalog, you can reference fully qualified table names, and the connector handles all JDBC connectivity and type mapping. The examples in this section use Scala (https://www.scala-lang.org/) to interact with an Ocient catalog.

Examples

Write from Spark to Ocient

This example takes an existing Spark DataFrame df and writes its rows into an Ocient table my_table.

```scala
df.write.saveAsTable("ocient_cat.my_schema.my_table")
```

Write from Ocient to Spark

This example reads from the Ocient table my_table and writes its rows into a new Spark DataFrame df2.

```scala
val df2 = spark.table("ocient_cat.my_schema.my_table")
```

Ad Hoc Usage

The Ocient Spark Connector supports ad-hoc reads and writes using the Spark format("ocient") method. This method is useful for brief operations, but it cannot use the Spark catalog system to create, drop, or list tables.

For example, this Spark command reads an Ocient table and creates the DataFrame df from its contents. Substitute jdbc_connection with the JDBC connection string for the database, the username and pwd values with your Ocient username and password, and my_schema and my_table with the schema and table name for the table to read.

```scala
val df = spark.read
  .format("ocient")
  .option("url", "jdbc_connection")
  .option("user", "username")
  .option("password", "pwd")
  .option("dbtable", "my_schema.my_table")
  .load()
```

This command takes the DataFrame df and appends its contents into an Ocient table.

```scala
df.write
  .format("ocient")
  .option("url", "jdbc_connection")
  .option("user", "username")
  .option("password", "pwd")
  .option("dbtable", "my_schema.my_table")
  .mode("append")
  .save()
```

Bulk Loading Best Practices

Use these recommended OS and Spark settings to get reliable performance and avoid inconsistent writes when using the Ocient JDBC bulk loader with Spark. For details on bulk loading, see docid apnndn tjqmjdd5oqdvd.

Linux SSH Configuration

Increase the SSH connection capacity on loader nodes. Set MaxStartups 1024 in the OS sshd_config configuration
file on the loader/SSH endpoint hosts that accept SSH connections from the bulk loader. Restart the SSH service to apply the updated sshd_config configuration. For example, on an {{ubuntu}} system, run:

```
sudo systemctl restart ssh
```

Spark Configuration

Edit the spark-defaults.conf configuration file to include these settings:

- spark.task.maxFailures = 1 prevents Spark from retrying failed tasks and potentially duplicating writes.
- spark.speculation = false prevents Spark from launching speculative duplicate tasks that can re-run writes against Ocient.

Configuration Options

You can set specific configurations for the Ocient Spark Connector through standard Spark options.

- Set options globally using Spark configuration. Add options to your spark-defaults.conf file or your cluster Spark settings (for example, spark.sql.catalog.ocient_cat.url=).
- Set options per job or per operation. Use Spark ( .option() ) or command-line ( --conf ) statements to set options for one-time usage.

The connector passes most of these settings through to the underlying Ocient JDBC driver as connection properties, but the connector interprets a few directly to shape the generated SQL.

Connection Options

These options control how the connector establishes a JDBC connection to Ocient and identify which table or query Spark should use. All options apply to both read and write operations.
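As a sketch of per-job configuration (rather than editing spark-defaults.conf), the same catalog properties can be supplied when the Spark session is built. This is an illustration only: the application name, host, port, database, and credentials below are placeholders, and it assumes a standard SparkSession builder.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: per-job catalog registration with placeholder values,
// equivalent to the spark-defaults.conf entries shown earlier.
val spark = SparkSession.builder()
  .appName("ocient-example")
  .config("spark.sql.catalog.ocient_cat", "com.ocient.spark.v2.DefaultSource")
  .config("spark.sql.catalog.ocient_cat.url", "jdbc:ocient://host:port/db")
  .config("spark.sql.catalog.ocient_cat.user", "username")
  .config("spark.sql.catalog.ocient_cat.password", "pwd")
  .getOrCreate()
```

The same properties can also be passed on the command line with --conf for a single spark-submit or spark-shell invocation.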
Read Partitioning Options

These options control how Spark splits a read into multiple partitions based on a column range, affecting parallelism and data distribution during Ocient table scans. All options are for read operations only. If you do not specify any of these options, the read uses a single partition.

Read Performance Options

These options tune how efficiently the connector fetches rows from Ocient during reads, including the JDBC fetch size and Ocient system internal parallelism. All options are for read operations only.

Write Performance Options

This option tunes how efficiently Spark writes data
to Ocient, primarily by controlling the JDBC batch size used for inserts. This option is for write operations only.

Data Type Mapping

The Ocient Spark Connector supports all Ocient primitive types and complex types (such as array or tuple). When Spark creates a table, the connector writes the full Spark logical type into an Ocient type hint clause on each column. For example, for a column that uses an Ocient tuple type and maps to a Spark struct type, the connector generates column DDL that includes a type hint such as TYPE HINT 'struct<mycol string, another int>'. During a read operation, the connector parses this type hint field to reconstruct the original Spark schema, including nested field names. If the connector does not find the hint (for example, for a pre-existing table), the connector maps Ocient tuple types to Spark struct types with default field names (_1, _2, and so on).

Data Types

This table shows the Spark data types that correspond to the equivalent Ocient types. The table also lists whether each Spark type supports round trips, meaning you can write the type from Spark to Ocient (creating the table) and then read it back into Spark while preserving the original Spark type and structure.
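To make the type-hint round trip concrete, here is a hedged sketch: it assumes a SparkSession `spark` with the Ocient catalog already registered, and the table name hint_demo is hypothetical. Writing a DataFrame that contains a named struct should record the field names in the column's type hint, so a subsequent read reconstructs them instead of falling back to the _1, _2 defaults.

```scala
import org.apache.spark.sql.functions.struct
import spark.implicits._

// Sketch only: build a single-row DataFrame with a named struct column
// matching the 'struct<mycol string, another int>' hint example above.
val df = spark.range(1)
  .select(struct(
    $"id".cast("string").as("mycol"),
    $"id".cast("int").as("another")
  ).as("nested"))

// The write records the struct's field names in the column's TYPE HINT.
df.write.saveAsTable("ocient_cat.my_schema.hint_demo")

// The read parses the hint, so the printed schema should show the struct
// fields as "mycol" and "another" rather than "_1" and "_2".
spark.table("ocient_cat.my_schema.hint_demo").printSchema()
```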
Spark 4.0 Types

When you run the connector on Spark 4.0 or later, the connector detects these additional Spark types at runtime using reflection and maps them to the corresponding Ocient types and type hint values, without introducing a compile-time dependency on Spark 4.0 APIs.

Related Links

- docid 1 p8y vgpzkd8k 0hxqd7
- docid apnndn tjqmjdd5oqdvd
- docid\ vknnjxbrekwndt3kpt3ln

{{linux}} is the registered trademark of Linus Torvalds in the U.S. and other countries.