Working with CSV Files
SereneDB supports importing data from CSV files, exporting query results to CSV and working with tab-delimited text formats.
Sample data
This tutorial uses the IMDB Movie Dataset — 1,000 movies with titles, genres, descriptions, ratings and more.
Download it:
curl -L -o movies.csv 'https://raw.githubusercontent.com/LearnDataSci/articles/master/Python%20Pandas%20Tutorial%20A%20Complete%20Introduction%20for%20Beginners/IMDB-Movie-Data.csv'
The file has the following columns:
| Column | Type |
|---|---|
| Rank | INTEGER |
| Title | TEXT |
| Genre | TEXT |
| Description | TEXT |
| Director | TEXT |
| Actors | TEXT |
| Year | INTEGER |
| Runtime (Minutes) | INTEGER |
| Rating | FLOAT |
| Votes | INTEGER |
| Revenue (Millions) | FLOAT |
| Metascore | INTEGER |
Importing CSV data
Create a table
Create a table matching the CSV schema:
CREATE TABLE movies (
rank INTEGER,
title TEXT,
genre TEXT,
description TEXT,
director TEXT,
actors TEXT,
year INTEGER,
runtime INTEGER,
rating FLOAT,
votes INTEGER,
revenue FLOAT,
metascore INTEGER
);
Load from CSV
Use COPY FROM to load data. The file path is relative to the server:
COPY movies FROM '/path/to/movies.csv' WITH (FORMAT CSV, HEADER TRUE);
From psql, use \copy to reference a file on the client machine:
\copy movies FROM 'movies.csv' WITH (FORMAT CSV, HEADER TRUE);
Query the data
SELECT title, rating, revenue
FROM movies
WHERE rating > 8.0
ORDER BY revenue DESC
LIMIT 5;
SELECT genre, COUNT(*) AS count, AVG(rating) AS avg_rating
FROM movies
GROUP BY genre
ORDER BY avg_rating DESC
LIMIT 10;
Custom delimiters and null handling
COPY movies FROM '/path/to/movies.csv' WITH (
FORMAT CSV,
DELIMITER '|',
NULL '',
HEADER TRUE
);
Tab-delimited text format
The default TEXT format uses tab as the delimiter:
COPY movies FROM '/path/to/movies.tsv' WITH (FORMAT TEXT, HEADER TRUE);
Importing specific columns
Load only a subset of columns by specifying them explicitly:
COPY movies(title, year) FROM '/path/to/partial.csv' WITH (FORMAT CSV, HEADER TRUE);
Error handling
Skip malformed rows instead of aborting the import:
COPY movies FROM '/path/to/movies.csv' WITH (
FORMAT CSV,
HEADER TRUE,
ON_ERROR IGNORE,
REJECT_LIMIT 10
);
This skips up to 10 bad rows before stopping.
Filtering during import
Use a WHERE clause to load only matching rows:
COPY movies FROM '/path/to/movies.csv' WHERE year > 2010;
Exporting to CSV
Export an entire table:
COPY movies TO '/path/to/output.csv' WITH (FORMAT CSV, HEADER TRUE);
Export a filtered subset:
COPY (SELECT title, year, rating FROM movies WHERE rating > 8.0)
TO '/path/to/top_rated.csv' WITH (FORMAT CSV, HEADER TRUE);
Reading and writing CSV on S3
SereneDB can read and write CSV files directly from S3-compatible storage:
COPY movies FROM 's3://my-bucket/data/movies.csv' WITH (
FORMAT CSV,
HEADER TRUE,
S3_ACCESS_KEY 'your-access-key',
S3_SECRET_KEY 'your-secret-key',
S3_REGION 'us-east-1'
);
COPY movies TO 's3://my-bucket/data/movies.csv' WITH (
FORMAT CSV,
HEADER TRUE,
S3_ACCESS_KEY 'your-access-key',
S3_SECRET_KEY 'your-secret-key',
S3_REGION 'us-east-1'
);
See Working with Parquet — S3 authentication options for the full list of S3 connection and authentication parameters.
Parquet format
SereneDB also supports importing and exporting in Parquet format. See Working with Parquet for details.
COPY movies TO '/path/to/movies.parquet' WITH (FORMAT PARQUET);
COPY movies FROM '/path/to/movies.parquet' WITH (FORMAT PARQUET);