Skip to main content

Overview

The first step to using a database system is to insert data into that system. SereneDB can directly connect to many popular data sources — file formats, cloud storage, and database systems — and offers several data ingestion methods that allow you to easily and efficiently fill up the database. On this page, we provide an overview of these methods so you can select which one is best suited for your use case.

When importing data from another system into SereneDB, we recommend the following order:

  1. If the source system has a bulk export feature, export the data to Parquet or CSV, then load it with SereneDB's Parquet or CSV reader.
  2. If that is not possible, stream the rows over the PostgreSQL COPY ... FROM STDIN protocol. Every PostgreSQL driver exposes it — for example libpq's PQputCopyData, psycopg's copy, JDBC's CopyManager or Go's pq.CopyIn — and it is far faster than row-by-row INSERTs.

Avoid looping row-by-row (tuple-at-a-time): performing row-by-row inserts — even with prepared statements — is detrimental to performance and results in slow load times. Prefer bulk operations instead.

INSERT Statements

INSERT statements are the standard way of loading data into a database system. They are suitable for quick prototyping, but should be avoided for bulk loading as they have significant per-row overhead.

Query
INSERT INTO people VALUES (1, 'Mark');

For a more detailed description, see the page on the INSERT statement.

File Loading: Relative Paths

Use the configuration option file_search_path to configure to which “root directories” relative paths are expanded on. If file_search_path is not set, the working directory is used as the basis for relative paths.

File Formats

CSV Loading

Data can be efficiently loaded from CSV files using several methods. The simplest is to use the CSV file's name:

Query
SELECT * FROM 'test.csv';
Result
 id | name  | value----+-------+-------  1 | alpha |    10  2 | beta  |    20

Alternatively, use the read_csv function to pass along options:

Query
SELECT * FROM read_csv('test.csv', header = false);
Result
 column0 | column1 | column2---------+---------+---------       1 | alpha   |      10       2 | beta    |      20

Or use the COPY statement:

Query
COPY tbl FROM 'test.csv' (HEADER false);

It is also possible to read data directly from compressed CSV files (e.g., compressed with gzip):

Query
SELECT * FROM 'test.csv.gz';
Result
 id | name  | value----+-------+-------  1 | alpha |    10  2 | beta  |    20

SereneDB can create a table from the loaded data using the CREATE TABLE ... AS SELECT statement:

Query
CREATE TABLE test AS    SELECT * FROM 'test.csv';

For more details, see the page on CSV loading.

Parquet Loading

Parquet files can be efficiently loaded and queried using their filename:

Query
SELECT * FROM 'test.parquet';
Result
 id | name  | value----+-------+-------  1 | alpha |    10  2 | beta  |    20

Alternatively, use the read_parquet function:

Query
SELECT * FROM read_parquet('test.parquet');
Result
 id | name  | value----+-------+-------  1 | alpha |    10  2 | beta  |    20

Or use the COPY statement:

Query
COPY tbl FROM 'test.parquet';

For more details, see the page on Parquet loading.

JSON Loading

JSON files can be efficiently loaded and queried using their filename:

Query
SELECT * FROM 'test.json';
Result
 id | name  | value----+-------+-------  1 | alpha |    10  2 | beta  |    20

Alternatively, use the read_json_auto function:

Query
SELECT * FROM read_json_auto('test.json');
Result
 id | name  | value----+-------+-------  1 | alpha |    10  2 | beta  |    20

Or use the COPY statement:

Query
COPY tbl FROM 'test.json';

For more details, see the page on JSON loading.

Returning the Filename

In SereneDB the CSV, JSON and Parquet readers support the filename virtual column:

Query
COPY (FROM (VALUES (42), (43)) t(x)) TO 'test.parquet';
SELECT *, filename FROM 'test.parquet';
Result
 x  | filename----+-------------- 42 | test.parquet 43 | test.parquet