Cassandra connector
The Cassandra connector allows querying data stored in Apache Cassandra.
Requirements
To connect to Cassandra, you need:
- Cassandra version 2.2 or higher.
- Network access from the Trino coordinator and workers to Cassandra. Port 9042 is the default port.
Configuration
To configure the Cassandra connector, create a catalog properties file
etc/catalog/cassandra.properties
with the following contents,
replacing host1,host2
with a comma-separated list of the Cassandra
nodes, used to discovery the cluster topology:
connector.name=cassandra
cassandra.contact-points=host1,host2
cassandra.load-policy.dc-aware.local-dc=datacenter1
You also need to set cassandra.native-protocol-port
, if your Cassandra
nodes are not using the default port 9042.
Multiple Cassandra clusters
You can have as many catalogs as you need, so if you have additional
Cassandra clusters, simply add another properties file to etc/catalog
with a different name, making sure it ends in .properties
. For
example, if you name the property file sales.properties
, Trino creates
a catalog named sales
using the configured connector.
Configuration properties
The following configuration properties are available:
Property Name | Description |
---|---|
| Comma-separated list of hosts in a Cassandra cluster. The Cassandra driver uses these contact points to discover cluster topology. At least one Cassandra host is required. |
| The Cassandra server port running the native client protocol, defaults to |
| Consistency levels in Cassandra refer to the level of consistency to be used for both read and write operations. More information about consistency levels can be found in the Cassandra consistency documentation. This property defaults to a consistency level of |
| Enables DROP TABLE operations. Defaults to |
| Username used for authentication to the Cassandra cluster. This is a global setting used for all connections, regardless of the user connected to Trino. |
| Password used for authentication to the Cassandra cluster. This is a global setting used for all connections, regardless of the user connected to Trino. |
| It is possible to override the protocol version for older Cassandra clusters. By default, the value corresponds to the default protocol version used in the underlying Cassandra java driver. Possible values include |
If authorization is enabled, cassandra.username
must have enough
permissions to perform SELECT
queries on the system.size_estimates
table.
The following advanced configuration properties are available:
Property Name | Description |
---|---|
| Number of rows fetched at a time in a Cassandra query. |
| Number of partitions batched together into a single select for a single partion key column table. |
| Number of keys per split when querying Cassandra. |
| Number of splits per node. By default, the values from the |
| Maximum number of statements to execute in one batch. |
| Maximum time the Cassandra driver waits for an answer to a query from one Cassandra node. Note that the underlying Cassandra driver may retry a query against more than one node in the event of a read timeout. Increasing this may help with queries that use an index. |
| Maximum time the Cassandra driver waits to establish a connection to a Cassandra node. Increasing this may help with heavily loaded Cassandra clusters. |
| Number of seconds to linger on close if unsent data is queued. If set to zero, the socket will be closed immediately. When this option is non-zero, a socket lingers that many seconds for an acknowledgement that all data was written to a peer. This option can be used to avoid consuming sockets on a Cassandra server by immediately closing connections when they are no longer needed. |
| Policy used to retry failed requests to Cassandra. This property defaults to |
| Set to |
| The name of the datacenter considered "local". |
| Uses the provided number of host per remote datacenter as failover for the local hosts for |
| Set to |
| Set to |
| Set to |
| Comma-separated list of hosts to allow. |
| Retry timeout for |
| The number of speculative executions. This is disabled by default. |
| The delay between each speculative execution, defaults to |
| Whether TLS security is enabled, defaults to |
| Path to the PEM or JKS key store. |
| Path to the PEM or JKS trust store. |
| Password for the key store. |
cassandra.tls.truststore-password | Password for the trust store. |
Querying Cassandra tables
The users
table is an example Cassandra table from the Cassandra
Getting
Started
guide. It can be created along with the mykeyspace
keyspace using
Cassandra's cqlsh (CQL interactive terminal):
cqlsh> CREATE KEYSPACE mykeyspace
... WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> USE mykeyspace;
cqlsh:mykeyspace> CREATE TABLE users (
... user_id int PRIMARY KEY,
... fname text,
... lname text
... );
This table can be described in Trino:
DESCRIBE cassandra.mykeyspace.users;
Column | Type | Extra | Comment
---------+---------+-------+---------
user_id | bigint | |
fname | varchar | |
lname | varchar | |
(3 rows)
This table can then be queried in Trino:
SELECT * FROM cassandra.mykeyspace.users;
Data types
The data types mappings are as follows:
Cassandra | Trino |
---|---|
ASCII | VARCHAR |
BIGINT | BIGINT |
BLOB | VARBINARY |
BOOLEAN | BOOLEAN |
DATE | DATE |
DECIMAL | DOUBLE |
DOUBLE | DOUBLE |
FLOAT | REAL |
INET | VARCHAR(45) |
INT | INTEGER |
LIST<?> | VARCHAR |
MAP<?, ?> | VARCHAR |
SET<?> | VARCHAR |
SMALLINT | SMALLINT |
TEXT | VARCHAR |
TIMESTAMP | TIMESTAMP(3) WITH TIME ZONE |
TIMEUUID | UUID |
TINYINT | TINYINT |
TUPLE | ROW with anonymous fields |
UUID | UUID |
UDT | ROW with field names |
VARCHAR | VARCHAR |
VARINT | VARCHAR |
Any collection (LIST/MAP/SET) can be designated as FROZEN, and the value is mapped to VARCHAR. Additionally, blobs have the limitation that they cannot be empty.
Types not mentioned in the table above are not supported.
Partition keys can only be of the following types:
- ASCII
- TEXT
- VARCHAR
- BIGINT
- BOOLEAN
- DOUBLE
- INET
- INT
- FLOAT
- DECIMAL
- TIMESTAMP
- UUID
- TIMEUUID
Limitations
- Queries without filters containing the partition key result in fetching all partitions. This causes a full scan of the entire data set, and is therefore much slower compared to a similar query with a partition key as a filter.
IN
list filters are only allowed on index (that is, partition key or clustering key) columns.- Range (
<
or>
andBETWEEN
) filters can be applied only to the partition keys.
SQL support
The connector provides read and write access to data and metadata in the Cassandra database. In addition to the globally available and read operation statements, the connector supports the following features:
- INSERT
- DELETE see DELETE
- TRUNCATE
- CREATE TABLE
- CREATE TABLE
- DROP TABLE
DROP TABLE
By default, DROP TABLE
operations are disabled on Cassandra catalogs.
To enable DROP TABLE
, set the cassandra.allow-drop-table
catalog
configuration property to true
:
cassandra.allow-drop-table=true
SQL delete limitation
DELETE
is only supported if the WHERE
clause matches entire
partitions.