Redshift connector
The Redshift connector allows querying and creating tables in an external Amazon Redshift cluster. This can be used to join data between different systems like Redshift and Hive, or between two different Redshift clusters.
Requirements
To connect to Redshift, you need:
- Network access from the Trino coordinator and workers to Redshift. Port 5439 is the default port.
Configuration
To configure the Redshift connector, create a catalog properties file in etc/catalog named, for example, redshift.properties, to mount the Redshift connector as the redshift catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup:
connector.name=redshift
connection-url=jdbc:redshift://example.net:5439/database
connection-user=root
connection-password=secret
Connection security
If you have TLS configured with a globally-trusted certificate installed on your data source, you can enable TLS between your cluster and the data source by appending a parameter to the JDBC connection string set in the connection-url catalog configuration property.
For example, on version 2.1 of the Redshift JDBC driver, TLS/SSL is enabled by default with the SSL parameter. You can disable or further configure TLS by appending parameters to the connection-url configuration property:
connection-url=jdbc:redshift://example.net:5439/database;SSL=TRUE;
For more information on TLS configuration options, see the Redshift JDBC driver documentation.
Multiple Redshift databases or clusters
The Redshift connector can only access a single database within a Redshift cluster. Thus, if you have multiple Redshift databases, or want to connect to multiple Redshift clusters, you must configure multiple instances of the Redshift connector.
To add another catalog, add another properties file to etc/catalog with a different name, making sure it ends in .properties. For example, if you name the property file sales.properties, Trino creates a catalog named sales using the configured connector.
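With two catalogs configured, a single query can read from both. The following is a minimal sketch that assumes the redshift and sales catalogs described above; the schema, table, and column names (web.clicks, public.customers, user_id, customer_id) are hypothetical and only illustrate the shape of a cross-catalog join:

```sql
-- Hypothetical tables and columns; adjust to your own schemas.
-- Each table is addressed as catalog.schema.table.
SELECT c.user_id, count(*) AS click_count
FROM redshift.web.clicks AS c
JOIN sales.public.customers AS s
  ON c.user_id = s.customer_id
GROUP BY c.user_id;
```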
General configuration properties
The following table describes general catalog configuration properties for the connector:
Property name | Description | Default value |
---|---|---|
case-insensitive-name-matching | Support case insensitive schema and table names. | false |
case-insensitive-name-matching.cache-ttl | | 1m |
case-insensitive-name-matching.config-file | Path to a name mapping configuration file in JSON format that allows Trino to disambiguate between schemas and tables with similar names in different cases. | null |
case-insensitive-name-matching.refresh-period | Frequency with which Trino checks the name matching configuration file for changes. | 0 (refresh disabled) |
metadata.cache-ttl | Duration for which metadata, including table and column statistics, is cached. | 0 (caching disabled) |
metadata.cache-missing | Cache the fact that metadata, including table and column statistics, is not available. | false |
metadata.cache-maximum-size | Maximum number of objects stored in the metadata cache. | 10000 |
write.batch-size | Maximum number of statements in a batched execution. Do not change this setting from the default. Non-default values may negatively impact performance. | 1000 |
Procedures
system.flush_metadata_cache()
Flush JDBC metadata caches. For example, the following system call flushes the metadata caches for all schemas in the example catalog:
USE example.myschema;
CALL system.flush_metadata_cache();
Case insensitive matching
When case-insensitive-name-matching is set to true, Trino is able to query non-lowercase schemas and tables by maintaining a mapping of the lowercase name to the actual name in the remote system. However, if two schemas and/or tables have names that differ only in case (such as "customers" and "Customers") then Trino fails to query them due to ambiguity.
In these cases, use the case-insensitive-name-matching.config-file catalog configuration property to specify a configuration file that maps these remote schemas/tables to their respective Trino schemas/tables:
{
  "schemas": [
    {
      "remoteSchema": "CaseSensitiveName",
      "mapping": "case_insensitive_1"
    },
    {
      "remoteSchema": "cASEsENSITIVEnAME",
      "mapping": "case_insensitive_2"
    }
  ],
  "tables": [
    {
      "remoteSchema": "CaseSensitiveName",
      "remoteTable": "tablex",
      "mapping": "table_1"
    },
    {
      "remoteSchema": "CaseSensitiveName",
      "remoteTable": "TABLEX",
      "mapping": "table_2"
    }
  ]
}
Queries against one of the tables or schemas defined in the mapping attributes are run against the corresponding remote entity. For example, a query against tables in the case_insensitive_1 schema is forwarded to the CaseSensitiveName schema, and a query against case_insensitive_2 is forwarded to the cASEsENSITIVEnAME schema.
At the table mapping level, a query on case_insensitive_1.table_1 as configured above is forwarded to CaseSensitiveName.tablex, and a query on case_insensitive_1.table_2 is forwarded to CaseSensitiveName.TABLEX.
By default, when a change is made to the mapping configuration file, Trino must be restarted to load the changes. Optionally, you can set the case-insensitive-name-matching.refresh-period catalog configuration property to have Trino reload the file without requiring a restart:
case-insensitive-name-matching.refresh-period=30s
Non-transactional INSERT
The connector supports adding rows using INSERT statements. By default, data insertion is performed by writing data to a temporary table. You can skip this step to improve performance and write directly to the target table. Set the insert.non-transactional-insert.enabled catalog property or the corresponding non_transactional_insert catalog session property to true.
Note that with this property enabled, data can be corrupted in rare cases where exceptions occur during the insert operation. With transactions disabled, no rollback can be performed.
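As a minimal sketch, the session property can be set for the current session before running an insert. This assumes the catalog is named redshift; the clicks table and its columns are hypothetical:

```sql
-- Enable direct writes for this session only (catalog name assumed to be redshift).
SET SESSION redshift.non_transactional_insert = true;

-- Hypothetical target table and columns.
-- Rows are written straight to the target table, so a failure
-- partway through cannot be rolled back.
INSERT INTO redshift.web.clicks (user_id, url)
VALUES (42, 'https://example.com/');
```

Setting the property per session rather than in the catalog file keeps the default transactional behavior for every other query.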
Querying Redshift
The Redshift connector provides a schema for every Redshift schema. You can see the available Redshift schemas by running SHOW SCHEMAS:
SHOW SCHEMAS FROM redshift;
If you have a Redshift schema named web, you can view the tables in this schema by running SHOW TABLES:
SHOW TABLES FROM redshift.web;
You can see a list of the columns in the clicks table in the web schema using either of the following:
DESCRIBE redshift.web.clicks;
SHOW COLUMNS FROM redshift.web.clicks;
Finally, you can access the clicks table in the web schema:
SELECT * FROM redshift.web.clicks;
If you used a different name for your catalog properties file, use that catalog name instead of redshift in the above examples.
Type mapping
Type mapping configuration properties
The following properties can be used to configure how data types from the connected data source are mapped to Trino data types and how the metadata is cached in Trino.
Property name | Description | Default value |
---|---|---|
| Configure how unsupported column data types are handled: The respective catalog session property is | |
jdbc-types-mapped-to-varchar | Allow forced mapping of comma separated lists of data types to convert to unbounded VARCHAR | |
SQL support
The connector provides read access and write access to data and metadata in Redshift. In addition to the globally available and read operation statements, the connector supports the following features:
- INSERT
- DELETE
- TRUNCATE
- Schema and table management
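The write operations listed above might be combined as in the following sketch. The catalog name redshift follows the earlier examples; the table clicks_staging and its columns are hypothetical:

```sql
-- Hypothetical staging table in the web schema.
CREATE TABLE redshift.web.clicks_staging (
    user_id bigint,
    url varchar
);

INSERT INTO redshift.web.clicks_staging (user_id, url)
VALUES (1, 'https://example.com/a'), (2, 'https://example.com/b');

-- Remove all rows but keep the table definition.
TRUNCATE TABLE redshift.web.clicks_staging;

DROP TABLE redshift.web.clicks_staging;
```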
SQL DELETE
If a WHERE clause is specified, the DELETE operation only works if the predicate in the clause can be fully pushed down to the data source.
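For illustration, a simple comparison on a numeric column is the kind of predicate that can usually be pushed down. The table and column below are hypothetical, and whether a given predicate qualifies depends on the column type and the connector version, so treat this as a sketch only:

```sql
-- Hypothetical table and column; a plain equality on a bigint column
-- is assumed here to be fully pushed down to Redshift.
DELETE FROM redshift.web.clicks
WHERE user_id = 42;
```

If the predicate cannot be fully pushed down, the statement fails rather than deleting a partial set of rows.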
ALTER TABLE
The connector does not support renaming tables across multiple schemas. For example, the following statement is supported:
ALTER TABLE catalog.schema_one.table_one RENAME TO catalog.schema_one.table_two
The following statement attempts to rename a table across schemas, and therefore is not supported:
ALTER TABLE catalog.schema_one.table_one RENAME TO catalog.schema_two.table_two
ALTER SCHEMA
The connector supports renaming a schema with the ALTER SCHEMA RENAME statement. ALTER SCHEMA SET AUTHORIZATION is not supported.
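A schema rename might look like the following sketch, which assumes a catalog named redshift and a schema named web as in the earlier examples; the new name web_archive is hypothetical:

```sql
-- Rename the web schema within the same catalog (new name is illustrative).
ALTER SCHEMA redshift.web RENAME TO web_archive;
```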