Load Data
Load Metadata Values in Data Pipelines
The data pipeline loading infrastructure enables the loading of certain metadata values of a source record into the {{ocient}} system. Metadata represents information about a record that is not represented in the record data itself, e.g., the filename of the source record or the timestamp when a record was created in {{kafka}}.

Syntax

The metadata function can load a variety of metadata along with each record of the data pipeline, or it can use key-value pairs to return the value in the filename metadata.

To load a metadata value into a column, use the metadata function in the SELECT SQL statement and replace key with the key you want to load. You can use transformation functions to achieve the final data type, or allow the pipeline to apply automatic casting.

To retrieve a value from a key-value pair in the filename metadata, specify the first argument as 'hive_partition' and provide a search string for the key. In this case, the filename must follow {{hive}} naming standards. The standard embeds field names and key-value pairs in path segments, such as /year=2019/month=2/data.parquet. If there are duplicate key-value pairs in the filename, the function uses the last pair. The function returns a string with the value of the associated key, and returns NULL if the filename does not contain the search string or does not follow Hive naming standards.

```sql
metadata(key)
metadata('hive_partition', search_string)
```

Metadata fields cannot be combined in a transformation function with other source data.

Supported Values for the Metadata Key

The metadata values you can load differ based on the source type. This table lists the available values for the metadata keys specified using the key argument, by source type, and their returned data type. Metadata key values are not case-sensitive.

[Table: supported metadata key values by source type and their returned data types]

Examples

Load Filename Metadata

This example snippet uses the CREATE PIPELINE SQL statement to load data and the source filename with each record in a pipeline. For details about this pipeline definition and setup, see Data Pipeline Load of CSV Data from S3 docid 5xxtimfhjnlyxs48wxsxs.

```sql
CREATE PIPELINE orders_pipeline
SOURCE S3
  ENDPOINT 'https://s3.us-east-1.amazonaws.com'
  BUCKET 'ocient-examples'
  FILTER 'metabase_samples/csv/orders.csv'
EXTRACT
  FORMAT CSV
  NUM_HEADER_LINES 1
INTO public.orders SELECT
  $1 AS id,
  $2 AS user_id,
  $3 AS product_id,
  $4 AS subtotal,
  $5 AS tax,
  $6 AS total,
  $7 AS discount,
  $8 AS created_at,
  $9 AS quantity,
  metadata('filename') AS filename_column;
```

Load Specified Metadata from Filename

This example uses the CREATE PIPELINE SQL statement to load data and the value of the search string 'year' of the filename with each record in a pipeline. For details about this pipeline definition and setup, see Data Pipeline Load of CSV Data from S3 docid 5xxtimfhjnlyxs48wxsxs. The example assumes the filename follows Hive-style naming standards.

```sql
CREATE PIPELINE orders_pipeline
SOURCE S3
  ENDPOINT 'https://s3.us-east-1.amazonaws.com'
  BUCKET 'ocient-examples'
  FILTER 'metabase_samples/csv/orders.csv'
EXTRACT
  FORMAT CSV
  NUM_HEADER_LINES 1
INTO public.orders SELECT
  $1 AS id,
  $2 AS user_id,
  $3 AS product_id,
  $4 AS subtotal,
  $5 AS tax,
  $6 AS total,
  $7 AS discount,
  $8 AS created_at,
  $9 AS quantity,
  metadata('hive_partition', 'year') AS hive;
```

Related Links

- Load Data docid\ xq0tg7yph vn62uwufibu
- Data Pipelines docid\ l8tdfpfzzvzeyabc2h7bq
- Transform Data in Data Pipelines docid\ aimcafoydn2xf fgqssys
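The hive_partition lookup described above has two notable behaviors: when a key appears in more than one path segment, the last pair wins, and when the key is absent (or the path has no Hive-style segments), the result is NULL. The following Python sketch illustrates those semantics under stated assumptions; the function name and regex are hypothetical illustrations, not Ocient code.

```python
import re

def hive_partition_value(filename, key):
    """Illustrative approximation of metadata('hive_partition', key).

    Scans a Hive-style path such as /year=2019/month=2/data.parquet for
    key=value path segments. Returns the value for `key` as a string,
    or None (standing in for SQL NULL) when the key is not found.
    """
    # Collect key=value pairs embedded in path segments,
    # e.g. /year=2019/month=2/data.parquet -> [("year", "2019"), ("month", "2")]
    pairs = re.findall(r'/([^/=]+)=([^/=]+)', filename)
    value = None
    for k, v in pairs:
        if k == key:
            value = v  # keep overwriting so the last duplicate pair wins
    return value
```

For example, `hive_partition_value('/year=2019/year=2020/data.parquet', 'year')` returns `'2020'`, matching the last-pair rule, while a lookup for a missing key such as `'day'` returns `None`.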