Edit this page

Text Functions

Text Functions and Operators

This section describes functions and operators for examining and manipulating STRING values.

Function	Description
`string[index]`	Extracts a single character using a (1-based) `index`.
`string[begin:end]`	Extracts a string using slice conventions similar to Python. Missing `begin` or `end` arguments are interpreted as the beginning or end of the list respectively. Negative values are accepted.
`string LIKE target`	Returns `true` if the `string` matches the like specifier (see Pattern Matching).
`string SIMILAR TO regex`	Returns `true` if the `string` matches the `regex` (see Pattern Matching).
`string ^@ search_string`	Alias for `starts_with`.
`arg1 \|\| arg2`	Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also `concat(arg1, arg2, ...)` and `list_concat(list1, list2, ...)`.
`array_extract(string, index)`	Extracts a single character from a `string` using a (1-based) `index`.
`array_slice(list, begin, end)`	Extracts a sublist or substring using slice conventions. Negative values are accepted.
`ascii(string)`	Returns an integer that represents the Unicode code point of the first character of the `string`.
`bar(x, min, max[, width])`	Draws a band whose width is proportional to (`x - min`) and equal to `width` characters when `x` = `max`. `width` defaults to 80.
`base64(blob)`	Alias for `to_base64`.
`bin(string)`	Converts the `string` to binary representation.
`bit_length(string)`	Number of bits in a `string`.
`char_length(string)`	Alias for `length`.
`character_length(string)`	Alias for `length`.
`chr(code_point)`	Returns a character which is corresponding the ASCII code value or Unicode code point.
`concat(value, ...)`	Concatenates multiple strings or lists. `NULL` inputs are skipped. See also operator `\|\|`.
`concat_ws(separator, string, ...)`	Concatenates many strings, separated by `separator`. `NULL` inputs are skipped.
`contains(string, search_string)`	Returns `true` if `search_string` is found within `string`. Note that collations are not supported.
`ends_with(string, search_string)`	Alias for `suffix`.
`format(format, ...)`	Formats a string using PostgreSQL `format` specifiers (`%s`, `%I`, `%L`).
`formatReadableDecimalSize(integer)`	Converts `integer` to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.).
`format_bytes(integer)`	Converts `integer` to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.).
`from_base64(string)`	Converts a base64 encoded `string` to a character string (`BLOB`).
`from_binary(value)`	Alias for `unbin`.
`from_hex(value)`	Alias for `unhex`.
`greatest(arg1, ...)`	Returns the largest value in lexicographical order. Note that lowercase characters are considered larger than uppercase characters and collations are not supported.
`hash(value, ...)`	Returns a `UBIGINT` with the hash of the `value`. Note that this is not a cryptographic hash.
`hex(string)`	Converts the `string` to hexadecimal representation.
`ilike_escape(string, like_specifier, escape_character)`	Returns `true` if the `string` matches the `like_specifier` (see Pattern Matching) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`.
`instr(string, search_string)`	Returns location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found.
`lcase(string)`	Alias for `lower`.
`least(arg1, ...)`	Returns the smallest value in lexicographical order. Note that uppercase characters are considered smaller than lowercase characters and collations are not supported.
`left(string, count)`	Extracts the left-most count characters.
`left_grapheme(string, count)`	Extracts the left-most count grapheme clusters.
`len(string)`	Alias for `length`.
`length(string)`	Number of characters in `string`.
`length_grapheme(string)`	Number of grapheme clusters in `string`.
`like_escape(string, like_specifier, escape_character)`	Returns `true` if the `string` matches the `like_specifier` (see Pattern Matching) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`.
`lower(string)`	Converts `string` to lower case.
`lpad(string, count, character)`	Pads the `string` with the `character` on the left until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters.
`ltrim(string[, characters])`	Removes any occurrences of any of the `characters` from the left side of the `string`. `characters` defaults to `space`.
`md5(string)`	Returns the MD5 hash of the `string` as a `VARCHAR`.
`md5_number(string)`	Returns the MD5 hash of the `string` as a `HUGEINT`.
`md5_number_lower(string)`	Returns the lower 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`.
`md5_number_upper(string)`	Returns the upper 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`.
`nfc_normalize(string)`	Converts `string` to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not.
`not_ilike_escape(string, like_specifier, escape_character)`	Returns `false` if the `string` matches the `like_specifier` (see Pattern Matching) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`.
`not_like_escape(string, like_specifier, escape_character)`	Returns `false` if the `string` matches the `like_specifier` (see Pattern Matching) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`.
`ord(string)`	Alias for `unicode`.
`parse_dirname(path[, separator])`	Returns the top-level directory name from the given `path`. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.
`parse_dirpath(path[, separator])`	Returns the head of the `path` (the pathname until the last slash) similarly to Python's `os.path.dirname`. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.
`parse_filename(string[, trim_extension][, separator])`	Returns the last component of the `path` similarly to Python's `os.path.basename` function. If `trim_extension` is `true`, the file extension will be removed (defaults to `false`). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.
`parse_path(path[, separator])`	Returns a list of the components (directories and filename) in the `path` similarly to Python's `pathlib.parts` function. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`.
`position(search_string IN string)`	Return location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found.
`position(string, search_string)`	Alias for `instr`.
`prefix(string, search_string)`	Returns `true` if `string` starts with `search_string`.
`printf(format, ...)`	Formats a `string` using printf syntax.
`read_text(source)`	Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `VARCHAR`. The file content is first validated to be valid UTF-8. If `read_text` attempts to read a file with invalid UTF-8 an error is thrown suggesting to use `read_blob` instead. See the `read_text` guide for more details.
`regexp_escape(string)`	Escapes special patterns to turn `string` into a regular expression similarly to Python's `re.escape` function.
`regexp_extract(string, regex[, group][, options])`	If `string` contains the `regex` pattern, returns the capturing group specified by optional parameter `group`; otherwise, returns the empty string. The `group` must be a constant value. If no `group` is given, it defaults to 0. A set of optional regex `options` can be set.
`regexp_extract(string, regex, name_list[, options])`	If `string` contains the `regex` pattern, returns the capturing groups as a struct with corresponding names from `name_list`; otherwise, returns a struct with the same keys and empty strings as values. A set of optional regex `options` can be set.
`regexp_extract_all(string, regex[, group][, options])`	Finds non-overlapping occurrences of the `regex` in the `string` and returns the corresponding values of the capturing `group`. A set of optional regex `options` can be set.
`regexp_extract_all(string, regex, name_list[, options])`	Finds non-overlapping occurrences of `regex` in `string` and returns the capturing groups as a list of structs with corresponding names from `name_list`. A set of optional regex `options` can be set.
`regexp_full_match(string, regex[, col2])`	Returns `true` if the entire `string` matches the `regex`. A set of optional regex `options` can be set.
`regexp_matches(string, regex[, options])`	Returns `true` if `string` contains the `regex`, `false` otherwise. A set of optional regex `options` can be set.
`regexp_replace(string, regex, replacement[, options])`	If `string` contains the `regex`, replaces the matching part with `replacement`. A set of optional regex `options` can be set.
`regexp_split_to_array(string, regex[, options])`	Alias for `string_split_regex`.
`regexp_split_to_table(string, regex)`	Splits the `string` along the `regex` and returns a row for each part.
`repeat(string, count)`	Repeats the `string` `count` number of times.
`replace(string, source, target)`	Replaces any occurrences of the `source` with `target` in `string`.
`reverse(string)`	Reverses the `string`.
`right(string, count)`	Extract the right-most `count` characters.
`right_grapheme(string, count)`	Extracts the right-most `count` grapheme clusters.
`rpad(string, count, character)`	Pads the `string` with the `character` on the right until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters.
`rtrim(string[, characters])`	Removes any occurrences of any of the `characters` from the right side of the `string`. `characters` defaults to `space`.
`sha1(value)`	Returns a `VARCHAR` with the SHA-1 hash of the `value`.
`sha256(value)`	Returns a `VARCHAR` with the SHA-256 hash of the `value`.
`split(string, separator)`	Alias for `string_split`.
`split_part(string, separator, index)`	Splits the `string` along the `separator` and returns the data at the (1-based) `index` of the list. If the `index` is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior).
`starts_with(string, search_string)`	Returns `true` if `string` begins with `search_string`.
`str_split(string, separator)`	Alias for `string_split`.
`str_split_regex(string, regex[, options])`	Alias for `string_split_regex`.
`string_split(string, separator)`	Splits the `string` along the `separator`.
`string_split_regex(string, regex[, options])`	Splits the `string` along the `regex`. A set of optional regex `options` can be set.
`string_to_array(string, separator)`	Alias for `string_split`.
`strip_accents(string)`	Strips accents from `string`.
`strlen(string)`	Number of bytes in `string`.
`strpos(string, search_string)`	Alias for `instr`.
`substr(string, start[, length])`	Alias for `substring`.
`substring(string, start[, length])`	Extracts substring starting from character `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` characters instead. Note that a `start` value of `1` refers to the first character of the `string`.
`substring_grapheme(string, start[, length])`	Extracts substring starting from grapheme clusters `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` grapheme clusters instead. Note that a `start` value of `1` refers to the `first` character of the `string`.
`suffix(string, search_string)`	Returns `true` if `string` ends with `search_string`. Note that collations are not supported.
`to_base(number, radix[, min_length])`	Converts `number` to a string in the given base `radix`, optionally padding with leading zeros to `min_length`.
`to_base64(blob)`	Converts a `blob` to a base64 encoded string.
`to_binary(string)`	Alias for `bin`.
`to_hex(string)`	Alias for `hex`.
`translate(string, from, to)`	Replaces each character in `string` that matches a character in the `from` set with the corresponding character in the `to` set. If `from` is longer than `to`, occurrences of the extra characters in `from` are deleted.
`trim(string[, characters])`	Removes any occurrences of any of the `characters` from either side of the `string`. `characters` defaults to `space`.
`ucase(string)`	Alias for `upper`.
`unbin(value)`	Converts a `value` from binary representation to a blob.
`unhex(value)`	Converts a `value` from hexadecimal representation to a blob.
`unicode(string)`	Returns an `INTEGER` representing the `unicode` codepoint of the first character in the `string`.
`upper(string)`	Converts `string` to upper case.
`url_decode(string)`	Decodes a URL from a representation using Percent-Encoding.
`url_encode(string)`	Encodes a URL to a representation using Percent-Encoding.

`string[index]`

Extracts a single character using a (1-based) index. Alias: array_extract.

Query

SELECT 'SereneDB'[4] AS char;

Result

 char------ e

`string[begin:end]`

Extracts a string using slice conventions similar to Python. Missing begin or end arguments are interpreted as the beginning or end of the list respectively. Negative values are accepted. Alias: array_slice.

Query

SELECT 'SereneDB'[:4] AS slice;

Result

 slice------- Sere

`string LIKE target`

Returns true if the string matches the like specifier (see Pattern Matching).

Query

SELECT 'hello' LIKE '%lo' AS matches;

Result

 matches--------- t

`string SIMILAR TO regex`

Returns true if the string matches the regex (see Pattern Matching). Alias: regexp_full_match.

Query

SELECT 'hello' SIMILAR TO 'l+' AS matches;

Result

 matches--------- f

`arg1 || arg2`

Concatenates two strings, lists, or blobs. Any NULL input results in NULL. See also concat(arg1, arg2, ...) and list_concat(list1, list2, ...).

Query

SELECT 'Serene' || 'DB' AS strings;
SELECT [1, 2, 3] || [4, 5, 6] AS lists;
SELECT '\xAA'::BLOB || '\xBB'::BLOB AS blobs;

Result

 strings---------- SereneDB
 lists--------------- {1,2,3,4,5,6}
 blobs--------- \\xaabb

`array_extract(string, index)`

Extracts a single character from a string using a (1-based) index.

Query

SELECT array_extract('SereneDB', 2) AS char;

Result

 char------ e

`array_slice(list, begin, end)`

Extracts a sublist or substring using slice conventions. Negative values are accepted. Alias: list_slice.

Query

SELECT array_slice('SereneDB', 3, 4) AS slice;
SELECT array_slice('SereneDB', 3, NULL) AS slice;
SELECT array_slice('SereneDB', 0, -3) AS slice;

Result

 slice------- re
 slice------- NULL
 slice-------- Serene

`ascii(string)`

Returns an integer that represents the Unicode code point of the first character of the string.

Query

SELECT ascii('Ω') AS code_point;

Result

 code_point------------        937

`bar(x, min, max[, width])`

Draws a band whose width is proportional to (x - min) and equal to width characters when x = max. width defaults to 80.

Query

SELECT bar(5, 0, 20, 10) AS band;

Result

 band------ ██▌

`bin(string)`

Converts the string to binary representation. Alias: to_binary.

Query

SELECT bin('Aa') AS binary;

Result

 binary------------------ 0100000101100001

`bit_length(string)`

Number of bits in a string.

Query

SELECT bit_length('abc') AS bits;

Result

 bits------   24

`chr(code_point)`

Returns a character which is corresponding the ASCII code value or Unicode code point.

Query

SELECT chr(65) AS char;

Result

 char------ A

`concat(value, ...)`

Concatenates multiple strings or lists. NULL inputs are skipped. See also operator ||.

Query

SELECT concat('Hello', ' ', 'World') AS result;
SELECT concat([1, 2, 3], NULL, [4, 5, 6]) AS result;

Result

 result------------- Hello World
 result--------------- {1,2,3,4,5,6}

`concat_ws(separator, string, ...)`

Concatenates many strings, separated by separator. NULL inputs are skipped.

Query

SELECT concat_ws(', ', 'Banana', 'Apple', 'Melon') AS result;

Result

 result---------------------- Banana, Apple, Melon

`contains(string, search_string)`

Returns true if search_string is found within string.

Query

SELECT contains('abc', 'a') AS result;

Result

 result-------- t

`format(format, ...)`

Formats a string using PostgreSQL-style format specifiers (%s, %I, %L, %%).

Query

SELECT format('Hello %s, you are %s', 'World', 42) AS greeting;

Result

 greeting------------------------- Hello World, you are 42

`formatReadableDecimalSize(integer)`

Converts integer to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.).

Query

SELECT formatReadableDecimalSize(16000) AS result;

Result

 result--------- 16.0 kB

`format_bytes(integer)`

Converts integer to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.). Aliases: formatReadableSize, pg_size_pretty.

Query

SELECT format_bytes(16_000) AS result;

Result

 result---------- 15.6 KiB

`from_base64(string)`

Converts a base64 encoded string to a character string (BLOB).

Query

SELECT from_base64('QQ==') AS result;

Result

 result-------- \\x41

`greatest(arg1, ...)`

Returns the largest value in lexicographical order. Note that lowercase characters are considered larger than uppercase characters and collations are not supported.

Query

SELECT greatest(42, 84) AS result;
SELECT greatest('abc', 'bcd', 'cde', 'EFG') AS result;

Result

 result--------     84
 result-------- cde

`hash(value, ...)`

Returns a UBIGINT with the hash of the value. Note that this is not a cryptographic hash.

Query

SELECT hash('🌊') AS result;

Result

 result---------------------- 12255790945742965276

`hex(string)`

Converts the string to hexadecimal representation. Alias: to_hex.

Query

SELECT hex('Hello') AS result;

Result

 result------------ 48656C6C6F

`ilike_escape(string, like_specifier, escape_character)`

Returns true if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string.

Query

SELECT ilike_escape('A%c', 'a$%C', '$') AS result;

Result

 result-------- t

`instr(string, search_string)`

Returns location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found. Aliases: position, strpos.

Query

SELECT instr('test test', 'es') AS result;

Result

 result--------      2

`least(arg1, ...)`

Returns the smallest value in lexicographical order. Note that uppercase characters are considered smaller than lowercase characters and collations are not supported.

Query

SELECT least(42, 84) AS result;
SELECT least('abc', 'bcd', 'cde', 'EFG') AS result;

Result

 result--------     42
 result-------- EFG

`left(string, count)`

Extracts the left-most count characters.

Query

SELECT left('Hello🌊', 2) AS result;

Result

 result-------- He

`left_grapheme(string, count)`

Extracts the left-most count grapheme clusters.

Query

SELECT left_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1) AS result;

Result

 result--------- 🤦🏼‍♂️

`length(string)`

Number of characters in string. Aliases: char_length, character_length, len.

Query

SELECT length('Hello🌊') AS result;

Result

 result--------      6

`length_grapheme(string)`

Number of grapheme clusters in string.

Query

SELECT length_grapheme('🤦🏼‍♂️🤦🏽‍♀️') AS result;

Result

 result--------      2

`like_escape(string, like_specifier, escape_character)`

Returns true if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string.

Query

SELECT like_escape('a%c', 'a$%c', '$') AS result;

Result

 result-------- t

`lower(string)`

Converts string to lower case. Alias: lcase.

Query

SELECT lower('Hello') AS result;

Result

 result-------- hello

`lpad(string, count, character)`

Pads the string with the character on the left until it has count characters. Truncates the string on the right if it has more than count characters.

Query

SELECT lpad('hello', 8, '>') AS result;

Result

 result---------- >>>hello

`ltrim(string[, characters])`

Removes any occurrences of any of the characters from the left side of the string. characters defaults to space.

Query

SELECT ltrim('    test  ') AS result;
SELECT ltrim('>>>>test<<', '><') AS result;

Result

 result-------- test
 result-------- test<<

`md5(string)`

Returns the MD5 hash of the string as a VARCHAR.

Query

SELECT md5('abc') AS result;

Result

 result---------------------------------- 900150983cd24fb0d6963f7d28e17f72

`md5_number(string)`

Returns the MD5 hash of the string as a HUGEINT.

Query

`md5_number_lower(string)`

Returns the lower 64-bit segment of the MD5 hash of the string as a UBIGINT.

Query

SELECT md5_number_lower('abc') AS result;

Result

 result--------------------- 8250560606382298838

`md5_number_upper(string)`

Returns the upper 64-bit segment of the MD5 hash of the string as a UBIGINT.

Query

SELECT md5_number_upper('abc') AS result;

Result

 result---------------------- 12704604231530709392

`nfc_normalize(string)`

Converts string to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not.

Query

SELECT nfc_normalize('ardèch') AS result;

Result

 result-------- ardèch

`not_ilike_escape(string, like_specifier, escape_character)`

Returns false if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string.

Query

SELECT not_ilike_escape('A%c', 'a$%C', '$') AS result;

Result

 result-------- f

`not_like_escape(string, like_specifier, escape_character)`

Returns false if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string.

Query

SELECT not_like_escape('a%c', 'a$%c', '$') AS result;

Result

 result-------- f

`parse_dirname(path[, separator])`

Returns the top-level directory name from the given path. separator options: system, both_slash (default), forward_slash, backslash.

Query

SELECT parse_dirname('path/to/file.csv', 'system') AS result;

Result

 result-------- path

`parse_dirpath(path[, separator])`

Returns the head of the path (the pathname until the last slash) similarly to Python's os.path.dirname. separator options: system, both_slash (default), forward_slash, backslash.

Query

SELECT parse_dirpath('path/to/file.csv', 'forward_slash') AS result;

Result

 result--------- path/to

`parse_filename(string[, trim_extension][, separator])`

Returns the last component of the path similarly to Python's os.path.basename function. If trim_extension is true, the file extension will be removed (defaults to false). separator options: system, both_slash (default), forward_slash, backslash.

Query

SELECT parse_filename('path/to/file.csv', true, 'forward_slash') AS result;

Result

 result-------- file

`parse_path(path[, separator])`

Returns a list of the components (directories and filename) in the path similarly to Python's pathlib.parts function. separator options: system, both_slash (default), forward_slash, backslash.

Query

SELECT parse_path('path/to/file.csv', 'system') AS result;

Result

 result-------------------- {path,to,file.csv}

`position(search_string IN string)`

Return location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found. Aliases: instr, strpos.

Query

SELECT position('b' IN 'abc') AS result;

Result

 result--------      2

`prefix(string, search_string)`

Returns true if string starts with search_string.

Query

SELECT prefix('abc', 'ab') AS result;

Result

 result-------- t

`printf(format, ...)`

Formats a string using printf syntax.

Query

SELECT printf('Benchmark "%s" took %d seconds', 'CSV', 42) AS result;

Result

 result--------------------------------- Benchmark "CSV" took 42 seconds

`read_text(source)`

A table function that returns the content from source (a filename, a list of filenames, or a glob pattern); query it with FROM. It yields filename, content, size and last_modified columns, with content as a VARCHAR. The file content is first validated to be valid UTF-8. If read_text attempts to read a file with invalid UTF-8 an error is thrown suggesting to use read_blob instead. See the read_text guide for more details.

Query

`regexp_escape(string)`

Escapes special patterns to turn string into a regular expression similarly to Python's re.escape function.

Query

SELECT regexp_escape('https://serenedb.com') AS result;

Result

 result------------------------------ https\\:\\/\\/serenedb\\.com

`regexp_extract(string, regex[, group][, options])`

If string contains the regex pattern, returns the capturing group specified by optional parameter group; otherwise, returns the empty string. The group must be a constant value. If no group is given, it defaults to 0. A set of optional regex options can be set.

Query

SELECT regexp_extract('ABC', '([a-z])(b)', 1, 'i') AS result;

Result

 result-------- A

`regexp_extract(string, regex, name_list[, options])`

If string contains the regex pattern, returns the capturing groups as a struct with corresponding names from name_list; otherwise, returns a struct with the same keys and empty strings as values. A set of optional regex options can be set.

Query

SELECT regexp_extract('John Doe', '([a-z]+) ([a-z]+)', ['first_name', 'last_name'], 'i') AS result;

Result

 result------------ (John,Doe)

`regexp_extract_all(string, regex[, group][, options])`

Finds non-overlapping occurrences of the regex in the string and returns the corresponding values of the capturing group. A set of optional regex options can be set.

Query

SELECT regexp_extract_all('Peter: 33, Paul:14', '(\w+):\s*(\d+)', 2) AS result;

Result

 result--------- {33,14}

`regexp_extract_all(string, regex, name_list[, options])`

Finds non-overlapping occurrences of regex in string and returns the capturing groups as a list of structs with corresponding names from name_list. A set of optional regex options can be set.

Query

SELECT regexp_extract_all('Peter: 33, Paul: 14', '(\w+):\s*(\d+)', ['name', 'age']) AS result;

Result

 result---------------------------- {"(Peter,33)","(Paul,14)"}

`regexp_full_match(string, regex[, col2])`

Returns true if the entire string matches the regex. A set of optional regex options can be set.

Query

SELECT regexp_full_match('anabanana', '(an)*') AS result;

Result

 result-------- f

`regexp_matches(string, regex[, options])`

Returns true if string contains the regex, false otherwise. A set of optional regex options can be set.

Query

SELECT regexp_matches('anabanana', '(an)*') AS result;

Result

 result-------- t

`regexp_replace(string, regex, replacement[, options])`

If string contains the regex, replaces the matching part with replacement. A set of optional regex options can be set.

Query

SELECT regexp_replace('hello', '[lo]', '-') AS result;

Result

 result-------- he-lo

`regexp_split_to_table(string, regex)`

Splits the string along the regex and returns a row for each part.

Query

SELECT * FROM regexp_split_to_table('hello world; 42', ';? ');

Result

 regexp_split_to_table----------------------- hello world 42

`repeat(string, count)`

Repeats the string count number of times.

Query

SELECT repeat('A', 5) AS result;

Result

 result-------- AAAAA

`replace(string, source, target)`

Replaces any occurrences of the source with target in string.

Query

SELECT replace('hello', 'l', '-') AS result;

Result

 result-------- he--o

`reverse(string)`

Reverses the string.

Query

SELECT reverse('hello') AS result;

Result

 result-------- olleh

`right(string, count)`

Extract the right-most count characters.

Query

SELECT right('Hello🌊', 3) AS result;

Result

 result-------- lo🌊

`right_grapheme(string, count)`

Extracts the right-most count grapheme clusters.

Query

SELECT right_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1) AS result;

Result

 result--------- 🤦🏽‍♀️

`rpad(string, count, character)`

Pads the string with the character on the right until it has count characters. Truncates the string on the right if it has more than count characters.

Query

SELECT rpad('hello', 10, '<') AS result;

Result

 result------------ hello<<<<<

`rtrim(string[, characters])`

Removes any occurrences of any of the characters from the right side of the string. characters defaults to space.

Query

SELECT rtrim('    test  ') AS result;
SELECT rtrim('>>>>test<<', '><') AS result;

Result

 result----------     test
 result---------- >>>>test

`sha1(value)`

Returns a VARCHAR with the SHA-1 hash of the value.

Query

SELECT sha1('🌊') AS result;

Result

 result------------------------------------------ be25fe9cf30cf5c01ae2054b60a5e5511e966d98

`sha256(value)`

Returns a VARCHAR with the SHA-256 hash of the value

Query

SELECT sha256('🌊') AS result;

Result

 result------------------------------------------------------------------ 8db6bed4743d5a6159e0742df0fded8377d1250420aeda292e41ffa20eba806e

`split_part(string, separator, index)`

Splits the string along the separator and returns the data at the (1-based) index of the list. If the index is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior).

Query

SELECT split_part('a;b;c', ';', 2) AS result;

Result

 result-------- b

`starts_with(string, search_string)`

Returns true if string begins with search_string. Alias: ^@.

Query

SELECT starts_with('abc', 'a') AS result;

Result

 result-------- t

`string_split(string, separator)`

Splits the string along the separator. Aliases: split, str_split, string_to_array.

Query

SELECT string_split('hello-world', '-') AS result;

Result

 result--------------- {hello,world}

`string_split_regex(string, regex[, options])`

Splits the string along the regex. A set of optional regex options can be set. Aliases: regexp_split_to_array, str_split_regex.

Query

SELECT string_split_regex('hello world; 42', ';? ') AS result;

Result

 result------------------ {hello,world,42}

`strip_accents(string)`

Strips accents from string.

Query

SELECT strip_accents('mühleisen') AS result;

Result

 result----------- muhleisen

`strlen(string)`

Number of bytes in string.

Query

SELECT strlen('🌊') AS result;

Result

 result--------      4

`substring(string, start[, length])`

Extracts substring starting from character start up to the end of the string. If optional argument length is set, extracts a substring of length characters instead. Note that a start value of 1 refers to the first character of the string. Alias: substr.

Query

SELECT substring('Hello', 2) AS result;
SELECT substring('Hello', 2, 2) AS result;

Result

 result-------- ello
 result-------- el

`substring_grapheme(string, start[, length])`

Extracts substring starting from grapheme clusters start up to the end of the string. If optional argument length is set, extracts a substring of length grapheme clusters instead. Note that a start value of 1 refers to the first character of the string.

Query

SELECT substring_grapheme('🌊🤦🏼‍♂️🤦🏽‍♀️🌊', 3) AS result;
SELECT substring_grapheme('🌊🤦🏼‍♂️🤦🏽‍♀️🌊', 3, 2) AS result;

Result

 result----------- 🤦🏽‍♀️🌊
 result----------- 🤦🏽‍♀️🌊

`suffix(string, search_string)`

Returns true if string ends with search_string. Note that collations are not supported. Alias: ends_with.

Query

SELECT suffix('abc', 'bc') AS result;

Result

 result-------- t

`to_base(number, radix[, min_length])`

Converts number to a string in the given base radix, optionally padding with leading zeros to min_length.

Query

SELECT to_base(42, 16, 5) AS result;

Result

 result-------- 0002A

`to_base64(blob)`

Converts a blob to a base64 encoded string. Alias: base64.

Query

SELECT to_base64('A'::BLOB) AS result;

Result

 result-------- QQ==

`translate(string, from, to)`

Replaces each character in string that matches a character in the from set with the corresponding character in the to set. If from is longer than to, occurrences of the extra characters in from are deleted.

Query

SELECT translate('12345', '143', 'ax') AS result;

Result

 result-------- a2x5

`trim(string[, characters])`

Removes any occurrences of any of the characters from either side of the string. characters defaults to space.

Query

SELECT trim('    test  ') AS result;
SELECT trim('>>>>test<<', '><') AS result;

Result

 result-------- test
 result-------- test

`unbin(value)`

Converts a value from binary representation to a blob. Alias: from_binary.

Query

SELECT unbin('0110') AS result;

Result

 result-------- \\x06

`unhex(value)`

Converts a value from hexadecimal representation to a blob. Alias: from_hex.

Query

SELECT unhex('2A') AS result;

Result

 result-------- \\x2a

`unicode(string)`

Returns an INTEGER representing the unicode codepoint of the first character in the string. Alias: ord.

Query

SELECT [unicode('âbcd'), unicode('â'), unicode(''), unicode(NULL)] AS result;

Result

 result------------------- {226,226,-1,NULL}

`upper(string)`

Converts string to upper case. Alias: ucase.

Query

SELECT upper('Hello') AS result;

Result

 result-------- HELLO

`url_decode(string)`

Decodes a URL from a representation using Percent-Encoding.

Query

SELECT url_decode('https%3A%2F%2Fserenedb.com%2Fwhy_serenedb%23portable') AS result;

Result

 result-------------------------------------------- https://serenedb.com/why_serenedb#portable

`url_encode(string)`

Encodes a URL to a representation using Percent-Encoding.

Query

SELECT url_encode('this string has/ special+ characters>') AS result;

Result

 result----------------------------------------------------- this%20string%20has%2F%20special%2B%20characters%3E

Text Similarity Functions

These functions are used to measure the similarity of two strings using various similarity measures.

Function	Description
`damerau_levenshtein(s1, s2)`	Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., `a` and `A`) are considered different.
`editdist3(s1, s2)`	Alias for `levenshtein`.
`hamming(s1, s2)`	The Hamming distance between two strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., `a` and `A`) are considered different.
`jaccard(s1, s2)`	The Jaccard similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1.
`jaro_similarity(s1, s2[, score_cutoff])`	The Jaro similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0.
`jaro_winkler_similarity(s1, s2[, score_cutoff])`	The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0.
`levenshtein(s1, s2)`	The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., `a` and `A`) are considered different.
`mismatches(s1, s2)`	Alias for `hamming`.

`damerau_levenshtein(s1, s2)`

Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., a and A) are considered different.

Query

SELECT damerau_levenshtein('serene', 'esrnee') AS result;

Result

 result--------      2

`hamming(s1, s2)`

The Hamming distance between two strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., a and A) are considered different. Alias: mismatches.

Query

SELECT hamming('serene', 'serine') AS result;

Result

 result--------      1

`jaccard(s1, s2)`

The Jaccard similarity between two strings. Characters of different cases (e.g., a and A) are considered different. Returns a number between 0 and 1.

Query

SELECT jaccard('serene', 'serine') AS result;

Result

 result--------    0.8

`jaro_similarity(s1, s2[, score_cutoff])`

The Jaro similarity between two strings. Characters of different cases (e.g., a and A) are considered different. Returns a number between 0 and 1. For similarity < score_cutoff, 0 is returned instead. score_cutoff defaults to 0.

Query

SELECT jaro_similarity('serene', 'serenedb') AS result;

Result

 result-------------------- 0.9166666666666666

`jaro_winkler_similarity(s1, s2[, score_cutoff])`

The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., a and A) are considered different. Returns a number between 0 and 1. For similarity < score_cutoff, 0 is returned instead. score_cutoff defaults to 0.

Query

SELECT jaro_winkler_similarity('serene', 'serenedb') AS result;

Result

 result--------   0.95

`levenshtein(s1, s2)`

The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., a and A) are considered different. Alias: editdist3.

Query

SELECT levenshtein('serene', 'serenedb') AS result;

Result

 result--------      2

Formatters

`format` Syntax

The format(format, ...) function formats a string using PostgreSQL-style format specifiers — each specifier in format is replaced by the corresponding argument. (This is PostgreSQL's format, not the {}-placeholder syntax of the {fmt} library.)

A specifier has the form %[position$][-][width]type, where type is one of:

Type	Description
`%s`	The argument as a string.
`%I`	The argument as a quoted SQL identifier (double-quoted when needed).
`%L`	The argument as a quoted SQL literal (single-quoted, embedded quotes escaped; `NULL` becomes `NULL`).
`%%`	A literal `%` (consumes no argument).

Only s, I and L are valid types — any other (e.g. %d) raises unrecognized format() type specifier.

Basic substitution with %s:

Query

SELECT format('Hello %s, you are %s', 'World', 42) AS greeting;

Result

 greeting------------------------- Hello World, you are 42

%I and %L safely quote identifiers and literals — useful for building SQL dynamically:

Query

SELECT format('INSERT INTO %I VALUES (%L)', 'my table', 'a''b') AS stmt;

Result

 stmt---------------------------------------- INSERT INTO "my table" VALUES ('a''b')

Arguments are consumed in order; an explicit position can be given as %n$ (e.g. %1$s), which also lets you reuse and reorder them:

Query

SELECT format('%1$s %1$s %2$s', 'a', 'b') AS repeated;

Result

 repeated---------- a a b

An optional minimum field width right-justifies the value (left-justifies with a leading -), padding with spaces:

Query

SELECT format('|%-10s|%10s|', 'left', 'right') AS padded;

Result

 padded------------------------- |left      |     right|

`printf` Syntax

The printf(format, parameters...) function formats strings using the printf syntax.

Format without additional parameters:

Query

SELECT printf('Hello world');

Result

 printf------------- Hello world

Format a string using arguments in a given order:

Query

SELECT printf('The answer to %s is %d', 'life', 42);

Result

 printf-------------------------- The answer to life is 42

Format a string using positional arguments %position$formatter, e.g., the second parameter as a string is encoded as %2$s:

Query

SELECT printf('I''d rather be %2$s than %1$s.', 'right', 'happy');

Result

 printf--------------------------------- I'd rather be happy than right.

Format Specifiers

Specifier	Description	Example
`%c`	character code to character	`a`
`%d`	integer	`654321`
`%E`	scientific notation	`3.141593E+00`
`%f`	float	`4.560000`
`%hd`	integer truncated to a C `short`	`-1039`
`%hhd`	integer truncated to a C `signed char`	`-15`
`%lld`	integer	`654321`
`%o`	octal	`2375761`
`%s`	string	`asd`
`%x`	hexadecimal	`9fbf1`

Formatting Types

Integers:

Query

SELECT printf('%d + %d = %d', 3, 5, 3 + 5);

Result

 printf----------- 3 + 5 = 8

Booleans are not accepted as %s arguments and raise invalid format specifier; cast them to text first:

Query

SELECT printf('%s != %s', true, false);

Result

db error: ERROR: invalid format specifier

Format datetime values:

Query

SELECT printf('%s', DATE '1992-01-01');
SELECT printf('%s', TIME '12:01:00');
SELECT printf('%s', TIMESTAMP '1992-01-01 12:01:00');

Result

 printf------------ 1992-01-01
 printf---------- 12:01:00
 printf--------------------- 1992-01-01 12:01:00

Format a BLOB (rendered as a hex string):

Query

SELECT printf('%s', 'hello'::BLOB) AS printf;

Result

 printf--------------- \\x68656c6c6f

Pad integers with 0s:

Query

SELECT printf('%04d', 33);

Result

 printf--------   0033

Create timestamps from integers:

Query

SELECT printf('%02d:%02d:%02d %s', 12, 3, 16, 'AM');

Result

 printf------------- 12:03:16 AM

Convert to hexadecimal:

Query

SELECT printf('%x', 123_456_789);

Result

 printf--------- 75bcd15

There is no binary (%b) specifier; it raises invalid format specifier. Use bin to obtain a binary representation:

Query

SELECT printf('%b', 123_456_789);

Result

db error: ERROR: invalid format specifier

Thousand Separators

printf does not support thousand-separator specifiers (such as %,d, %_d or %''d); they raise invalid format specifier:

Query

SELECT printf('%,d',  123_456_789);
SELECT printf('%.d',  123_456_789);
SELECT printf('%''d', 123_456_789);
SELECT printf('%_d',  123_456_789);

Result

db error: ERROR: invalid format specifier
 printf----------- 123456789
db error: ERROR: invalid format specifier
db error: ERROR: invalid format specifier

This also applies to floats and doubles:

Query

SELECT printf('%,f',   123456.789);
SELECT printf('%,.2f', 123456.789);

Result

db error: ERROR: invalid format specifier
db error: ERROR: invalid format specifier

Text Functions and Operators​

string[index]​

string[begin:end]​

string LIKE target​

string SIMILAR TO regex​

arg1 || arg2​

array_extract(string, index)​

array_slice(list, begin, end)​

ascii(string)​

bar(x, min, max[, width])​

bin(string)​

bit_length(string)​

chr(code_point)​

concat(value, ...)​

concat_ws(separator, string, ...)​

contains(string, search_string)​

format(format, ...)​

formatReadableDecimalSize(integer)​

format_bytes(integer)​

from_base64(string)​

greatest(arg1, ...)​

hash(value, ...)​

hex(string)​

ilike_escape(string, like_specifier, escape_character)​

instr(string, search_string)​

least(arg1, ...)​

left(string, count)​

left_grapheme(string, count)​

length(string)​

length_grapheme(string)​

like_escape(string, like_specifier, escape_character)​

lower(string)​

lpad(string, count, character)​

ltrim(string[, characters])​

md5(string)​

md5_number(string)​

md5_number_lower(string)​

md5_number_upper(string)​

nfc_normalize(string)​

not_ilike_escape(string, like_specifier, escape_character)​

not_like_escape(string, like_specifier, escape_character)​

parse_dirname(path[, separator])​

parse_dirpath(path[, separator])​

parse_filename(string[, trim_extension][, separator])​

parse_path(path[, separator])​

position(search_string IN string)​

prefix(string, search_string)​

printf(format, ...)​

read_text(source)​

regexp_escape(string)​

regexp_extract(string, regex[, group][, options])​

regexp_extract(string, regex, name_list[, options])​

regexp_extract_all(string, regex[, group][, options])​

regexp_extract_all(string, regex, name_list[, options])​

regexp_full_match(string, regex[, col2])​

regexp_matches(string, regex[, options])​

regexp_replace(string, regex, replacement[, options])​

regexp_split_to_table(string, regex)​

repeat(string, count)​

replace(string, source, target)​

reverse(string)​

right(string, count)​

right_grapheme(string, count)​

rpad(string, count, character)​

rtrim(string[, characters])​

sha1(value)​

sha256(value)​

split_part(string, separator, index)​

starts_with(string, search_string)​

string_split(string, separator)​

string_split_regex(string, regex[, options])​

strip_accents(string)​

strlen(string)​

substring(string, start[, length])​

substring_grapheme(string, start[, length])​

suffix(string, search_string)​

to_base(number, radix[, min_length])​

to_base64(blob)​

translate(string, from, to)​

trim(string[, characters])​

Text Functions and Operators

`string[index]`

`string[begin:end]`

`string LIKE target`

`string SIMILAR TO regex`

`arg1 || arg2`

`array_extract(string, index)`

`array_slice(list, begin, end)`

`ascii(string)`

`bar(x, min, max[, width])`

`bin(string)`

`bit_length(string)`

`chr(code_point)`

`concat(value, ...)`

`concat_ws(separator, string, ...)`

`contains(string, search_string)`

`format(format, ...)`

`formatReadableDecimalSize(integer)`

`format_bytes(integer)`

`from_base64(string)`

`greatest(arg1, ...)`

`hash(value, ...)`

`hex(string)`

`ilike_escape(string, like_specifier, escape_character)`

`instr(string, search_string)`

`least(arg1, ...)`

`left(string, count)`

`left_grapheme(string, count)`

`length(string)`

`length_grapheme(string)`

`like_escape(string, like_specifier, escape_character)`

`lower(string)`

`lpad(string, count, character)`

`ltrim(string[, characters])`

`md5(string)`

`md5_number(string)`

`md5_number_lower(string)`

`md5_number_upper(string)`

`nfc_normalize(string)`

`not_ilike_escape(string, like_specifier, escape_character)`

`not_like_escape(string, like_specifier, escape_character)`

`parse_dirname(path[, separator])`

`parse_dirpath(path[, separator])`

`parse_filename(string[, trim_extension][, separator])`

`parse_path(path[, separator])`

`position(search_string IN string)`

`prefix(string, search_string)`

`printf(format, ...)`

`read_text(source)`

`regexp_escape(string)`

`regexp_extract(string, regex[, group][, options])`

`regexp_extract(string, regex, name_list[, options])`

`regexp_extract_all(string, regex[, group][, options])`

`regexp_extract_all(string, regex, name_list[, options])`

`regexp_full_match(string, regex[, col2])`

`regexp_matches(string, regex[, options])`

`regexp_replace(string, regex, replacement[, options])`

`regexp_split_to_table(string, regex)`

`repeat(string, count)`

`replace(string, source, target)`

`reverse(string)`

`right(string, count)`

`right_grapheme(string, count)`

`rpad(string, count, character)`

`rtrim(string[, characters])`

`sha1(value)`

`sha256(value)`

`split_part(string, separator, index)`

`starts_with(string, search_string)`

`string_split(string, separator)`

`string_split_regex(string, regex[, options])`

`strip_accents(string)`

`strlen(string)`

`substring(string, start[, length])`

`substring_grapheme(string, start[, length])`

`suffix(string, search_string)`

`to_base(number, radix[, min_length])`

`to_base64(blob)`

`translate(string, from, to)`

`trim(string[, characters])`