
LAT Client Command Line Interface

The LAT Client can be used to interact with a running LAT instance. It supports subcommands for interacting with pipelines and previewing transformations.

The LAT Client is distributed in the form of a wheel file. Contact Support for the wheel that corresponds to the LAT version.

Prerequisites

  • Python >= 3.8
  • pip3 >= 20.2.3
    • If pip install fails, try upgrading pip.
  • Wheel Python package
    • pip install wheel
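
To confirm that your Python and pip versions meet the prerequisites:

Shell

python3 --version
pip3 --version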

Install

It is recommended to install the wheel in a Python virtual environment to avoid conflicts with globally installed Python packages. For the install command, replace $VERSION with the latest version of the LAT client.

Steps:

  1. Create the virtual environment. python3 -m venv venv
  2. Activate the virtual environment. source venv/bin/activate
  3. Install the wheel. pip install lat_client-$VERSION-py3-none-any.whl
  4. Run commands with lat_client COMMAND ARGS
  5. When a new terminal is opened, repeat step 2 to activate the environment and gain access to the lat_client command.
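
For reference, the full install sequence, assuming the wheel file is in the current working directory and $VERSION matches your client version:

Shell

python3 -m venv venv
source venv/bin/activate
pip install lat_client-$VERSION-py3-none-any.whl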

Usage

Get help on the command line:

Shell

lat_client --help

Common Arguments

Some arguments are available on all subcommands. For convenience, most of them can also be set using an environment variable.

--no-verify

Skip certificate validation when connecting to LAT. Ignored if using the http scheme.

Example:

--no-verify

--hosts (LAT_HOSTS)

One or more LAT hosts to orchestrate. Valid domain names or IP addresses can be used.

Example:

--hosts http://192.168.0.1:8080 http://192.168.0.2:8081

Environment:

export LAT_HOSTS="http://10.4.0.1:8080,http://10.4.0.2:8081"

--oauth-domain (LAT_OAUTH_DOMAIN)

Okta OAuth domain to use for token acquisition.

Example:

--oauth-domain https://dev-12345678.okta.com

Environment:

export LAT_OAUTH_DOMAIN="https://dev-12345678.okta.com"

--oauth-server (LAT_OAUTH_SERVER)

Okta OAuth authorization server to use for token acquisition.

Example:

--oauth-server abcdef000ghijklm111

Environment:

export LAT_OAUTH_SERVER="abcdef000ghijklm111"

--client-id (LAT_CLIENT_ID)

Okta client id to use for token acquisition.

Example:

--client-id 12345678

Environment:

export LAT_CLIENT_ID="12345678"

--client-secret (LAT_CLIENT_SECRET)

Okta client secret to use for token acquisition.

Example:

--client-secret abc123

Environment:

export LAT_CLIENT_SECRET="abc123"

--oauth-http-proxy (LAT_OAUTH_HTTP_PROXY)

HTTP proxy URL to use for token acquisition. Authentication credentials can be passed in proxy URL.

Example:

--oauth-http-proxy http://user:[email protected]

Environment:

export LAT_OAUTH_HTTP_PROXY="http://user:[email protected]"
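
For example, all of the common arguments can be set through environment variables before running a subcommand. The values here are the placeholder values from the examples above:

Shell

export LAT_HOSTS="http://10.4.0.1:8080,http://10.4.0.2:8081"
export LAT_OAUTH_DOMAIN="https://dev-12345678.okta.com"
export LAT_OAUTH_SERVER="abcdef000ghijklm111"
export LAT_CLIENT_ID="12345678"
export LAT_CLIENT_SECRET="abc123"

lat_client pipeline status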

Subcommands

pipeline create

Create a new pipeline.

For most use cases, it is advisable to leave pipeline_id unset when creating a pipeline; the client will set it to a random UUID to prevent deduplication across different pipelines. If you do want deduplication between pipelines, copy the pipeline_id from the previous pipeline and include it in the new pipeline. The Transform Configuration must also be the same to ensure deduplication.
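
As a sketch, assuming pipeline_id is a top-level field of the pipeline configuration file, reusing an ID might look like this (the UUID shown is illustrative):

{ "pipeline_id": "123e4567-e89b-12d3-a456-426614174000", ... }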

When the client is used to create a pipeline with a file source, the client will make adjustments to the source configuration such that partitions are assigned evenly across nodes.

First, the client will get the number of workers from the pipeline. If one is not set, it will use the minimum configured lat.default.workers instead. Then, it will set partitions = workers * num_nodes. Finally, it will set partitions_assigned = [workers * node_index, workers * (node_index + 1) - 1] for each node.
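
For example, with workers = 2 and three nodes, the client sets partitions = 6 and assigns partitions_assigned = [0, 1] to node 0, [2, 3] to node 1, and [4, 5] to node 2.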

Arguments: 

--pipeline: path to the pipeline configuration .json file 

Example: 

lat_client pipeline create --pipeline /home/user/my_new_pipeline.json 

 

pipeline get

Get the configuration for an existing pipeline.

If the pipeline configurations on all hosts are identical, the client prints the pipeline; otherwise, it provides an explanation of the inconsistency.

For pipelines with a file source, partitions_assigned is ignored when checking if pipelines are identical. Additionally, the client will validate that all partitions are assigned, and that no partition is assigned more than once.

Example:

lat_client pipeline get

pipeline update

Update the configuration for an existing pipeline.

The new pipeline can only make changes to subfields in transform, and cannot change any topic or file_group names. All other subfields of transform are allowed to change, including the table and column fields.

For pipelines with a file source, partition assignments will be copied from the existing pipeline.

If the pipeline was running prior to the update, successful completion of this command will automatically restart the pipeline.

Arguments: 

--pipeline: path to the pipeline configuration .json file 

Example: 

lat_client pipeline update --pipeline /home/user/my_new_pipeline.json 

pipeline delete

Delete an existing pipeline.

Unless the --force flag is used, the pipeline must be stopped before deletion, or the deletion will fail.

Arguments: 

- --force: Force a pipeline to delete regardless of running status 

- --skip-validation: Delete a pipeline regardless of cluster consistency 

Example: 

lat_client pipeline delete --skip-validation
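
To force deletion of a pipeline regardless of its running status:

Shell

lat_client pipeline delete --force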

pipeline start

Start the configured pipeline.

Prior to starting the pipeline, the pipeline start subcommand validates that all specified hosts are configured with an identical pipeline. The exception is pipelines with a file source, which must instead have different partitions_assigned values such that each partition is assigned exactly once across all hosts.

Example:

lat_client pipeline start

pipeline stop

Stop the configured pipeline.

Example:

lat_client pipeline stop

pipeline status

Retrieve status of the pipeline. The valid pipeline statuses are STOPPED, RUNNING, COMPLETED, and FAILED. When the pipeline is FAILED, the file statuses will remain in processing.

Arguments: 

- --list-files: Lists selected files in their sorted order for each file group, along with file statuses (completed, processing, not_started). Output is summarized to be human readable. The system truncates large file lists. 

- --list-all-files: Lists all selected files in their sorted order for each file group, along with file statuses (completed, processing, not_started). 

Example: 

lat_client pipeline status 
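
To include per-file statuses in the output:

Shell

lat_client pipeline status --list-files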

pipeline errors

Retrieve errors that occur while the current pipeline runs.

Arguments: 

- --json: output errors as lines of JSON rather than in the default human-readable format 

- --max-errors MAX_ERRORS: an upper limit on the number of errors to retrieve (default is 100) 

- --only-records: only show records (not error messages or other information) 

- --only-error-messages: only show error messages (not records or other information) 

- --no-records: show all information except records 

Example: 

lat_client pipeline errors --no-records --max-errors 10 

pipeline rebalance

Rebalances partitions evenly to all provided LAT Nodes.

This subcommand only applies to pipelines running file sources.

This command is meant to be used in the case of a node outage during a file load. LAT file loading does not support automatic partition rebalancing, so manual intervention is required. The flow is as follows:

  1. LAT Node goes offline.
  2. LAT operator rebalances the partitions from the offline node onto the online nodes using the client. The operator should use the rebalance command and omit the offline node from the hosts argument.
  3. LAT Node comes back online.
  4. LAT operator rebalances partitions using the client to include all nodes including the newly online node. The operator should use the rebalance command and include all online nodes in the hosts argument.

Example:

lat_client pipeline rebalance
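
For instance, with hypothetical nodes 10.4.0.1, 10.4.0.2, and 10.4.0.3, where 10.4.0.3 has gone offline, and assuming the common --hosts argument precedes the subcommand, the two rebalance steps might look like this:

Shell

# Step 2: rebalance onto the nodes that remain online
lat_client --hosts http://10.4.0.1:8080 http://10.4.0.2:8080 pipeline rebalance

# Step 4: after the node recovers, rebalance across all nodes
lat_client --hosts http://10.4.0.1:8080 http://10.4.0.2:8080 http://10.4.0.3:8080 pipeline rebalance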

sink create

Create a new sink configuration.

The sink configuration file for this subcommand should match the same format as the Sink Configuration. For example:

{ "type": "ocient", "remotes": ["1.2.3.4:5050"]}

Arguments: 

- --sink: path to the sink configuration .json file 

- --name: Name of the sink to create 

- --default: set this as the default sink 

Example: 

lat_client sink create --sink /home/user/my_sink_config.json --name my-sink-name-1 --default 

sink delete

Delete a sink configuration. The sink configuration must not be part of a created or running pipeline.

Arguments: 

--name: Name of the sink to delete 

Example: 

lat_client sink delete --name my-sink-name-1 

sink list

List all configured sinks.

Example:

lat_client sink list

sink get

Get a sink configuration by name.

Arguments: 

--name: Name of the sink to get configuration for.

Example: 

lat_client sink get --name my-sink-name-1 

preview

Preview a transformation.

At most one of --transform or --pipeline can be provided. If neither is provided, the host will attempt to use the transformation configured in its pipeline.

You can specify the --extract or the --pipeline option. If you specify neither, the host uses the JSON Extractor by default. For details about record and extractor types, see the Extract Configuration.

You must specify the --topic or --file-group option, which should match the topic or file_group key in the specified transform section.

Arguments: 

- --topic name of the topic the records are associated with 

- --file-group name of the file group the records are associated with 

- --records path to a file of records to transform. Record formats can be Delimited Records (e.g., CSV, TSV), JSON Records, or Fixed Width Binary Records. 

- --extract [Optional] path to a .json file containing the extract section of a pipeline definition to use for extraction. 

- --transform [Optional] path to a .json file containing the transform section of a pipeline definition to use for transformation. 

- --pipeline [Optional] path to a .json file containing a pipeline to use for transformation and extraction, if present. 

Example without pipeline: 

lat_client preview --topic test_topic --records ./data/my_records --extract /home/user/my_extract.json --transform /home/user/my_transform.json 

Example using pipeline: 

lat_client preview --topic test_topic --records ./data/my_records --pipeline /home/user/my_pipeline.json 
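
A pipeline with a file source is previewed the same way, using --file-group in place of --topic (the file group name here is hypothetical):

Shell

lat_client preview --file-group my_file_group --records ./data/my_records --pipeline /home/user/my_pipeline.json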

Common Workflows

Check on Status of LAT Pipelines

Shell

lat_client pipeline status

Updating an Existing LAT Pipeline

Shell

lat_client pipeline update --pipeline /home/user/my_new_pipeline.json

If the pipeline was running prior to the update, successful completion of this command will automatically restart the pipeline.

If unsuccessful, the CLI reports an error with an explanation of what is wrong with the command. Common issues include invalid JSON or a missing required column. An unsuccessful update of the pipeline config does not impact actively running pipelines.

Restart the Pipeline

Shell

lat_client pipeline stop
lat_client pipeline start

Check Multiple LAT Nodes to See If the Pipeline Configurations Are Compatible

Shell

export LAT_HOSTS="http://10.4.0.1:8080,http://10.4.0.2:8080"
lat_client pipeline get

The CLI compares the MD5 hash of the pipeline configurations on all nodes and reports whether the pipelines all match or are inconsistent.

LAT Client Command Line Interface Troubleshooting

If the certificate authorities on the system running the Python client (LAT Client) need to be updated, an error can occur. An error message similar to the following might appear:

Shell

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate

You can often resolve the error by updating the certificate bundle that Python uses, for example by upgrading the certifi package:

Shell

pip install --upgrade certifi

The root cause of this error can be either the SSL connection to the LAT Server or the SSL connection to Okta when obtaining an access token. You can run the same command in both cases to resolve the issue.

When the SSL certificate on the LAT Server is self-signed, you can use the --no-verify flag to connect to the LAT Server without verifying the SSL certificate.
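
For example, assuming the common --no-verify argument precedes the subcommand:

Shell

lat_client --no-verify pipeline status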

Related Links

Load Data