> ## Documentation Index
> Fetch the complete documentation index at: https://docs.ocient.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Load Data from External Sources in Data Pipelines

export const Ocient = "Ocient®";

The data pipeline functionality in the {Ocient} System enables you to load data from external sources, such as other databases, using the `LOOKUP` function and the `LOOKUP` keyword of the corresponding `CREATE PIPELINE` SQL statement. You must use the function and the `LOOKUP` keyword together to join data from an external source.

You can use the `LOOKUP` function with multiple external sources by specifying each source with its `LOOKUP` function and corresponding `LOOKUP` keyword syntax.

**Syntax**

The `LOOKUP` function returns a value from an external source table based on a join between a specified value and another column in the source table. The function creates and executes this SQL statement from the specified function arguments.

```sql SQL theme={null}
SELECT return_column_name
FROM lookup_source_name
WHERE value = join_column_name;
```

The source table should only have unique values for the `join_column_name` column for the join operation.

```sql SQL theme={null}
LOOKUP(lookup_source_name, value, join_column_name, return_column_name)
```

| **Argument**         | **Data** **Type**                                                                                                | **Description**                                                                                         |
| -------------------- | ---------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| `lookup_source_name` | VARCHAR                                                                                                          | The name of the external source defined by the `LOOKUP` keyword in the `CREATE PIPELINE` SQL statement. |
| `value`              | VARCHAR, BOOLEAN, BYTE, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DATE, TIME, TIMESTAMP, HASH, BINARY, DECIMAL, UUID | The value for the lookup.                                                                               |
| `join_column_name`   | VARCHAR                                                                                                          | The name of a column in the external source table for the join operation.                               |
| `return_column_name` | VARCHAR, BOOLEAN, BYTE, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DATE, TIME, TIMESTAMP, HASH, BINARY, DECIMAL, UUID | The name of the column in the external source table that contains the data to load.                     |

<Info>
  If the data type of the returned value of the `LOOKUP` function is not one of the data types specified for the `return_column_name` argument, then the function transforms the value to the nearest compatible type, typically `VARCHAR`.
</Info>

**Example**

Create a data pipeline `lookup_data_pipeline` that loads data from the CSV file `data.csv`. Use an external source `existing_all_types` with the table `tablename` in the schema `schemaname` using a JDBC connection with the connection string: `jdbc:host://111.1.1.1:4200/databasename;user=username@databasename;password=testpassword`. For your connection string, substitute these variables with the values specific to your database and credentials:

* `host` — Hostname
* `111.1.1.1` — IP address
* `4200` — Port number
* `databasename` — Database name
* `username` — Username
* `testpassword` — Password

Look up data in the `col_binary` column of the `tablename` table by joining the second column of the CSV file to the `col_bigint` column in the `tablename` table. Load the data in the `col_binary` column based on the result of the `LOOKUP` function.

```sql SQL theme={null}
CREATE PIPELINE lookup_data_pipeline
SOURCE FILESYSTEM
    FILTER '/tmp/folder/data.csv'
LOOKUP existing_all_types
    CONNECTION_TYPE 'jdbc'
    CONNECTION_STRING 'jdbc:host://111.1.1.1:4200/databasename;user=username@databasename;password=testpassword'
    LOOKUP_SCHEMA 'schemaname'
    LOOKUP_TABLE 'tablename'
EXTRACT
    FORMAT CSV
INTO public.destination_table
SELECT $1 AS col_pk,
    LOOKUP('existing_all_types', $2, 'col_bigint', 'col_binary') AS col_binary;
```

## Performance Considerations

The performance of the lookup operation depends on the size of the external source. If the external source is large, the lookup operation is slower. The Ocient System recommends keeping the size to less than 5 million rows for the total size of all external source tables used by actively running data pipelines in the system.

## Related Links

[Load Data](/load-data)

[CREATE PIPELINE](/data-pipelines#create-pipeline)

[Transform Data in Data Pipelines](/transform-data-in-data-pipelines)
