This tutorial details the core workflow from start to finish. You connect to the system, create a database and table, load sample data from a source, validate the loaded data, and execute analytic queries. The example uses a sample data set that covers every supported Ocient data type. By the end of this process, you have a working table with loaded data that you can query immediately.Documentation Index
Fetch the complete documentation index at: https://docs.ocient.com/llms.txt
Use this file to discover all available pages before exploring further.
Prerequisites
The tutorial requires:- Network access to the SQL Nodes in your Ocient System and the IP address or hostname of at least one SQL Node.
- A valid username and password with permissions to create databases, tables, and data pipelines.
- You have installed the Ocient JDBC driver and CLI on your local machine. For setup instructions, see Connect Using JDBC.
If you do not have the JDBC driver installed, you can also connect using
pyocient, the Ocient driver. For setup instructions, see Connect Using pyocient.Step 1: Connect to the Ocient System
Start the JDBC CLI and connect to a SQL Node. Replace the hostname<HOSTNAME_OR_IP> and port <PORT> placeholders with the values for your system. The default SQL Node port is 4050.
SQL
Shell
Step 2: Create a Database and Schema
Create a database to hold your sample data. Using a dedicated database keeps your work isolated from other applications on the same system.SQL
SQL
SQL
Step 3: Create a Table
Create a table that covers all supported Ocient data types. This table serves as the target for the data pipeline in the next step. TheCREATE TABLE SQL statement defines the column names, data types, and a on the col_timestamp column in the data_type_coverage table. The TimeKey partitions data by time so that the system can skip irrelevant segments during time-filtered queries.
SQL
- TimeKeys and Clustering Keys — Segment keys that partition and order data for faster queries.
- Secondary Indexes — Additional indexes for columns used in filters.
- Table Compression Options — Compression settings that reduce storage requirements.
- CREATE TABLE SQL Statement Examples — Examples of table definitions for different use cases.
Step 4: Create a Data Pipeline
Data pipelines are the way to load data into an Ocient System. Each pipeline is a SQL object that defines the source, data format, and transformations for loading rows into one or more tables. For a full overview of pipeline concepts, see Load Data.Use a Sample Data Set
The sample data is a set of gzip-compressed CSV files hosted in a public S3 bucket. Each file contains one header row followed by data rows with columns covering every Ocient data type.Create the Pipeline
This data pipeline reads CSV files from the S3 source, parses each field using named headers, applies type conversions where needed, and loads the results into thedata_type_coverage table.
SQL
SOURCE— Identifies the S3 bucket, file path prefix, and glob filter. TheCOMPRESSION_METHODparameter indicates that the source files are gzip-compressed.EXTRACT— Defines the CSV format options, including delimiters, quoting, and the header names that theSELECTclause references. TheOPEN_ARRAYandCLOSE_ARRAY, andOPEN_OBJECTandCLOSE_OBJECTparameters tell the parser how array and tuple values are encoded in the CSV data.SELECT— Maps each source field to a target column. Some fields require explicit type conversion. For example,DOUBLE($double)converts the extracted string to aDOUBLEvalue, and the nestedREPLACEfunctions strip theTandZcharacters from timestamps to match the OcientTIMESTAMPformat. ThisVARBINARYexpression pads odd-length hex strings with a leading zero to ensure all rows that are notNULLconvert to theVARBINARYdata type.
Step 5: Start and Monitor the Pipeline
Start the data pipeline to begin loading data using theSTART PIPELINE SQL statement.
SQL
SQL
Text
status column to show the COMPLETED status. You can execute the SHOW PIPELINE_STATUS statement again to check progress.
For larger data sets, the percent_complete, records_processed, and records_loaded columns update as the pipeline progresses.
For more detailed monitoring options, including system catalog tables and metrics endpoints, see Monitor Data Pipelines.
If the pipeline enters the FAILED status, see Manage Errors in Data Pipelines and Data Pipeline Loading Errors.
Step 6: Validate the Loaded Data
After the pipeline completes, verify that the data loaded correctly before you begin analysis.Verify the Row Count
Confirm that the number of loaded rows matches the pipeline metrics.SQL
records_loaded value from the SHOW PIPELINE_STATUS statement.
Check the Time Range
Verify that thecol_timestamp column covers the expected date range. Gaps in the range can indicate missing source files.
SQL
Inspect for NULL Values
Check for unexpected NULL values in columns that should be populated.SQL
SELECT transformations for parsing issues.
Sample Records
Review a small set of records to confirm that type conversions applied correctly using theLIMIT keyword.
SQL
Step 7: Query the Data
With validated data in the table, you can begin running analytic queries. These examples demonstrate common query patterns across different data types.Aggregate Numeric Data
Compute summary statistics on the numeric columns. In this case, retrieve the count of rows, the minimum number of thecol_int column, the maximum number of the col_int column, and the average of the numbers in the col_smallint column.
SQL
Filter by Time Range
Use thecol_timestamp TimeKey column to filter data efficiently. The Ocient System skips segments outside the specified time range of January 2024.
SQL
Query Geospatial Data
Inspect the geospatial columns loaded into the table for the first 10 rows using theLIMIT keyword.
SQL
Explore Data Type Distributions
Summarize the distinct values and NULL rates for each column type.SQL
Remove the Data Pipeline and Table
After you finish exploring the sample data, remove thedata_type_coverage_pipeline pipeline.
SQL
data_type_coverage table.
SQL

