Documentation

Quick start

1. Register for a Tablespace account at https://cloud.tablespace.io
2. Create a Postgres database using the web console. Various shapes suitable for development, staging and production workloads are available. The database created is called csdb with a default schema csschema and a default user csuser.
3. Login to the database using your favourite Postgres client e.g. psql and the credentials provided.
‍

psql 'host=<hostname> user=csuser password=<password> dbname=csdb port=5432 sslmode=require'

4. Use the CREATE TABLE statement to create a table.
‍

CREATE TABLE orders (
      O_ORDERKEY BIGINT NOT NULL,
      O_CUSTKEY BIGINT NOT NULL,
      O_ORDERSTATUS VARCHAR(1) NOT NULL,
      O_TOTALPRICE DECIMAL(15,2) NOT NULL,
      O_ORDERDATE DATE NOT NULL,
      O_ORDERPRIORITY VARCHAR(15) NOT NULL,
      O_CLERK VARCHAR(15) NOT NULL,
      O_SHIPPRIORITY INTEGER NOT NULL,
      O_COMMENT VARCHAR(79) NOT NULL
);

5. Load data into the table using the COPY command. Note it is quicker to load data into the table first before creating columnstore indexes on the table. This also results in faster queries when querying large datasets.
‍

COPY orders FROM './tpch-load/orders.tbl' DELIMITER '|';

6. Create a primary key on the table. Columnstore indexes can only be created on tables that have primary keys.
‍

ALTER TABLE orders ADD CONSTRAINT PK_orders PRIMARY KEY (o_orderkey);

7. Use the CREATE INDEX ... USING columnstore statement to create a columnstore index which accelerates analytical queries. The pg_columnstore extension is automatically enabled on the database when it is created so there is no need to explicitly enable it.
‍

create index orders_col_idx on orders using columnstore
(
      O_ORDERKEY,
      O_CUSTKEY,
      O_ORDERSTATUS,
      O_TOTALPRICE,
      O_ORDERDATE,
      O_ORDERPRIORITY,
      O_CLERK,
      O_SHIPPRIORITY,
      O_COMMENT
)
WITH (
      cluster_by = 'O_ORDERDATE',
      segment_cluster_by = 'O_ORDERPRIORITY'
);

The WITH clause is optional but can be used to optimize the columnstore index and improve query performance further:

A columnstore index is divided into segments which are conceptually similar to table partitions. The cluster_by parameter sorts the data in all of the segments by the specified columns. If optimal columns are chosen for cluster_by it enables the Vectorized Columnar Engine to skip a large number of segments when executing queries. This reduces I/O and can increase query performance dramatically. For optimal performance, columns in cluster_by should be specified as a comma delimited list from the highest to the lowest cardinality. It is optimal to choose columns that are used in the filter clauses of your analytical queries. Although you can specify multiple columns in cluster_by, choosing the single highest cardinality column is usually enough to improve performance.

The segment_cluster_by parameter sorts the data within each segment of the columnstore index. If optimal columns are chosen for segment_cluster_by it improves compression of the columnstore index which results in reduced I/O and increases query performance. For optimal performance, columns in segment_cluster_by should be specified as a comma delimited list from the lowest to the highest cardinality.
‍

Start Tracking - Technology Webflow Template