Skip to main content

Working with CSV Files

SereneDB supports importing data from CSV files, exporting query results to CSV and working with tab-delimited text formats.

Sample data

This tutorial uses the IMDB Movie Dataset — 1,000 movies with titles, genres, descriptions, ratings and more.

Download it:

curl -L -o movies.csv 'https://raw.githubusercontent.com/LearnDataSci/articles/master/Python%20Pandas%20Tutorial%20A%20Complete%20Introduction%20for%20Beginners/IMDB-Movie-Data.csv'

The file has the following columns:

ColumnType
RankINTEGER
TitleTEXT
GenreTEXT
DescriptionTEXT
DirectorTEXT
ActorsTEXT
YearINTEGER
Runtime (Minutes)INTEGER
RatingFLOAT
VotesINTEGER
Revenue (Millions)FLOAT
MetascoreINTEGER

Importing CSV data

Create a table

Create a table matching the CSV schema:

CREATE TABLE movies (
rank INTEGER,
title TEXT,
genre TEXT,
description TEXT,
director TEXT,
actors TEXT,
year INTEGER,
runtime INTEGER,
rating FLOAT,
votes INTEGER,
revenue FLOAT,
metascore INTEGER
);

Load from CSV

Use COPY FROM to load data. The file path is relative to the server:

COPY movies FROM '/path/to/movies.csv' WITH (FORMAT CSV, HEADER TRUE);

From psql, use \copy to reference a file on the client machine:

\copy movies FROM 'movies.csv' WITH (FORMAT CSV, HEADER TRUE);

Query the data

SELECT title, rating, revenue
FROM movies
WHERE rating > 8.0
ORDER BY revenue DESC
LIMIT 5;
SELECT genre, COUNT(*) AS count, AVG(rating) AS avg_rating
FROM movies
GROUP BY genre
ORDER BY avg_rating DESC
LIMIT 10;

Custom delimiters and null handling

COPY movies FROM '/path/to/movies.csv' WITH (
FORMAT CSV,
DELIMITER '|',
NULL '',
HEADER TRUE
);

Tab-delimited text format

The default TEXT format uses tab as the delimiter:

COPY movies FROM '/path/to/movies.tsv' WITH (FORMAT TEXT, HEADER TRUE);

Importing specific columns

Load only a subset of columns by specifying them explicitly:

COPY movies(title, year) FROM '/path/to/partial.csv' WITH (FORMAT CSV, HEADER TRUE);

Error handling

Skip malformed rows instead of aborting the import:

COPY movies FROM '/path/to/movies.csv' WITH (
FORMAT CSV,
HEADER TRUE,
ON_ERROR IGNORE,
REJECT_LIMIT 10
);

This skips up to 10 bad rows before stopping.

Filtering during import

Use a WHERE clause to load only matching rows:

COPY movies FROM '/path/to/movies.csv' WHERE year > 2010;

Exporting to CSV

Export an entire table:

COPY movies TO '/path/to/output.csv' WITH (FORMAT CSV, HEADER TRUE);

Export a filtered subset:

COPY (SELECT title, year, rating FROM movies WHERE rating > 8.0)
TO '/path/to/top_rated.csv' WITH (FORMAT CSV, HEADER TRUE);

Reading and writing CSV on S3

SereneDB can read and write CSV files directly from S3-compatible storage:

COPY movies FROM 's3://my-bucket/data/movies.csv' WITH (
FORMAT CSV,
HEADER TRUE,
S3_ACCESS_KEY 'your-access-key',
S3_SECRET_KEY 'your-secret-key',
S3_REGION 'us-east-1'
);
COPY movies TO 's3://my-bucket/data/movies.csv' WITH (
FORMAT CSV,
HEADER TRUE,
S3_ACCESS_KEY 'your-access-key',
S3_SECRET_KEY 'your-secret-key',
S3_REGION 'us-east-1'
);

See Working with Parquet — S3 authentication options for the full list of S3 connection and authentication parameters.

Parquet format

SereneDB also supports importing and exporting in Parquet format. See Working with Parquet for details.

COPY movies TO '/path/to/movies.parquet' WITH (FORMAT PARQUET);
COPY movies FROM '/path/to/movies.parquet' WITH (FORMAT PARQUET);