
Encryption

SereneDB supports reading and writing encrypted Parquet files. SereneDB broadly follows the Parquet Modular Encryption specification with some limitations.

Reading and Writing Encrypted Files

Named encryption keys of 128, 192, or 256 bits can be added to a session with PRAGMA add_parquet_key. These keys are stored in memory:

Query
PRAGMA add_parquet_key('key128', '0123456789112345');
PRAGMA add_parquet_key('key192', '012345678911234501234567');
PRAGMA add_parquet_key('key256', '01234567891123450123456789112345');
PRAGMA add_parquet_key('key256base64', 'MDEyMzQ1Njc4OTExMjM0NTAxMjM0NTY3ODkxMTIzNDU=');
Result
Success
Success
Success
Success
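The key sizes above can be checked before a key is registered. The following is a minimal Python sketch, not part of SereneDB: the helper name key_bits and its is_base64 flag are hypothetical, and it assumes that a plain key string contributes one byte per character while a base64 key is decoded first (as the key256base64 example suggests). How SereneDB itself distinguishes the two forms is not stated here.

```python
import base64

# Key sizes (in bits) that the documentation lists as accepted.
VALID_KEY_BITS = (128, 192, 256)


def key_bits(key: str, is_base64: bool = False) -> int:
    """Return the key size in bits, or raise ValueError if it is not accepted.

    Hypothetical helper for illustration; `is_base64` is an assumption about
    how a caller would signal the encoding.
    """
    if is_base64:
        raw = base64.b64decode(key, validate=True)
    else:
        raw = key.encode("utf-8")
    bits = len(raw) * 8
    if bits not in VALID_KEY_BITS:
        raise ValueError(f"key is {bits} bits; expected one of {VALID_KEY_BITS}")
    return bits


print(key_bits("0123456789112345"))                  # 128
print(key_bits("012345678911234501234567"))          # 192
print(key_bits("01234567891123450123456789112345"))  # 256
print(key_bits("MDEyMzQ1Njc4OTExMjM0NTAxMjM0NTY3ODkxMTIzNDU=", is_base64=True))  # 256
```

The base64 string in the last call decodes to the same 32 bytes as the plain key256 value, so both describe the same 256-bit key.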

Writing Encrypted Parquet Files

After specifying the key (e.g., key256), files can be encrypted as follows:

Query
COPY tbl TO 'tbl.parquet' (ENCRYPTION_CONFIG {footer_key: 'key256'});

Reading Encrypted Parquet Files

A Parquet file encrypted with a specific key (e.g., key256) can then be read as follows:

Query
COPY tbl FROM 'tbl.parquet' (ENCRYPTION_CONFIG {footer_key: 'key256'});

Or:

Query
SELECT * FROM read_parquet('tbl.parquet', encryption_config = {footer_key: 'key256'});
Result
 id | name  | value
----+-------+-------
  1 | alpha |    10
  2 | beta  |    20

Interoperability

SereneDB can read uniformly encrypted Parquet files written by the Arrow C++ API (e.g., via PyArrow), as long as the same encryption key is used for both the footer and all columns.

Limitations

SereneDB's Parquet encryption currently has the following limitations.

SereneDB encrypts the footer and all columns using the footer_key. The Parquet specification allows encryption of individual columns with different keys, e.g.:

Query
COPY tbl TO 'tbl.parquet'
    (ENCRYPTION_CONFIG {
        footer_key: 'key256',
        column_keys: {key256: ['col0', 'col1']}
    });
Result
db error: ERROR: Parquet encryption_config column_keys not yet implemented

However, per-column keys are not yet supported: specifying column_keys raises the error shown above.
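Client code that generates COPY statements can reject column_keys early instead of waiting for the database error. The following is a minimal Python sketch under stated assumptions: encryption_clause is a hypothetical helper, not a SereneDB API, and restricting key names to identifier-like strings is an assumption made here to keep the string building safe.

```python
import re

# Assumption: key names are simple identifiers such as 'key256'.
_KEY_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def encryption_clause(footer_key, column_keys=None):
    """Build an ENCRYPTION_CONFIG clause that uses only footer_key.

    Hypothetical helper: it mirrors the documented limitation by refusing
    any per-column key mapping.
    """
    if column_keys:
        raise ValueError("column_keys is not supported; use footer_key only")
    if not _KEY_NAME.match(footer_key):
        raise ValueError(f"unexpected key name: {footer_key!r}")
    return f"ENCRYPTION_CONFIG {{footer_key: '{footer_key}'}}"


clause = encryption_clause("key256")
print(f"COPY tbl TO 'tbl.parquet' ({clause});")
```

The printed statement matches the supported form shown under Writing Encrypted Parquet Files, in which the footer and all columns share one key.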

Performance Implications

Encryption adds overhead: reading and writing encrypted Parquet files is slower than reading and writing their unencrypted equivalents.