Skip to main content

Directly Read DuckDB Databases

SereneDB allows directly reading DuckDB files through the read_duckdb function:

Query
FROM read_duckdb('path_to_database', table_name = 'table_to_read');
Result
 r_regionkey | r_name-------------+-------------           0 | AFRICA           1 | AMERICA           2 | ASIA           3 | EUROPE           4 | MIDDLE EAST

Using this function is equivalent to performing the following steps:

  • Attaching to the database using a read-only connection.
  • Querying the table specified through the table_name argument.
  • Closing the connection to the database.

Examples

Reading a Specific Table

To read the region table from the TPC-H dataset, run:

Query
SELECT r_regionkey, r_nameFROM read_duckdb('https://blobs.duckdb.org/data/tpch-sf10.db', table_name = 'region');
Result
 r_regionkey | r_name-------------+-------------           0 | AFRICA           1 | AMERICA           2 | ASIA           3 | EUROPE           4 | MIDDLE EAST

Reading from Multiple Databases

You can use globbing to read from multiple databases. To illustrate this, let's create two tables:

serened shell my-1.duckdb \
-c "CREATE TABLE numbers AS SELECT 42 AS x;" \
-c "CREATE TABLE letters AS SELECT 'm' AS a;"

serened shell my-2.duckdb \
-c "CREATE TABLE numbers AS SELECT 43 AS x;"

Then, in SereneDB, you can run:

Query
SELECT x FROM read_duckdb('my-*.duckdb', table_name = 'numbers');
Result
 x---- 42 43

Reading from Databases with a Single Table

If all databases in read_duckdb's argument have a single table, the table_name argument is optional:

Query
FROM read_duckdb('my-2.duckdb');
Result
 x---- 43

If the extension is .db or .duckdb, you can also omit the read_duckdb call (similarly to how you can omit read_csv and read_parquet):

Query
FROM 'my-2.duckdb';
Result
 my-2------   43

Limitations

read_duckdb currently only supports reading from tables. Reading from views is not yet supported.