Edit this page

Parquet Import

To read data from a Parquet file, use the read_parquet function in the FROM clause of a query:

Query

SELECT * FROM read_parquet('input.parquet');

Result

 id | name  | value----+-------+-------  1 | alpha |    10  2 | beta  |    20

Alternatively, you can omit the read_parquet function and let SereneDB infer the format from the file name:

Query

SELECT * FROM 'input.parquet';

Result

 id | name  | value----+-------+-------  1 | alpha |    10  2 | beta  |    20

To create a new table using the result from a query, use the CREATE TABLE ... AS SELECT statement:

Query

CREATE TABLE new_tbl AS    SELECT * FROM read_parquet('input.parquet');

To load data into an existing table from a query, use INSERT INTO from a SELECT statement:

Query

INSERT INTO tbl    SELECT * FROM read_parquet('input.parquet');

Alternatively, use the COPY statement to load data from a Parquet file into an existing table:

Query

COPY tbl FROM 'input.parquet' (FORMAT parquet);

Adjusting the Schema on the Fly

You can load a Parquet file into a slightly different schema (e.g., different number of columns, more relaxed types) using the following trick.

Suppose you have a Parquet file with two columns, c1 and c2:

Query

COPY (FROM (VALUES (42, 43)) t(c1, c2))TO 'f.parquet';

To add another column c3 that is not present in the file, run:

Query

FROM (VALUES (NULL::VARCHAR, NULL, NULL)) t(c1, c2, c3)WHERE falseUNION ALL BY NAMEFROM 'f.parquet';

Result

 c1 | c2 | c3----+----+------ 42 | 43 | NULL

The first FROM clause generates an empty table with three columns where c1 is a VARCHAR. UNION ALL BY NAME then appends the rows from the Parquet file, matching them by column name and leaving c3 as NULL since it is absent there.

For additional options, see the Parquet loading reference.

Adjusting the Schema on the Fly​

Adjusting the Schema on the Fly