Text Functions
Text Functions and Operators
This section describes functions and operators for examining and manipulating STRING values.
| Function | Description |
|---|---|
string[index] | Extracts a single character using a (1-based) index. |
string[begin:end] | Extracts a string using slice conventions similar to Python. Missing begin or end arguments are interpreted as the beginning or end of the list respectively. Negative values are accepted. |
string LIKE target | Returns true if the string matches the like specifier (see Pattern Matching). |
string SIMILAR TO regex | Returns true if the string matches the regex (see Pattern Matching). |
string ^@ search_string | Alias for starts_with. |
arg1 || arg2 | Concatenates two strings, lists, or blobs. Any NULL input results in NULL. See also concat(arg1, arg2, ...) and list_concat(list1, list2, ...). |
array_extract(string, index) | Extracts a single character from a string using a (1-based) index. |
array_slice(list, begin, end) | Extracts a sublist or substring using slice conventions. Negative values are accepted. |
ascii(string) | Returns an integer that represents the Unicode code point of the first character of the string. |
bar(x, min, max[, width]) | Draws a band whose width is proportional to (x - min) and equal to width characters when x = max. width defaults to 80. |
base64(blob) | Alias for to_base64. |
bin(string) | Converts the string to binary representation. |
bit_length(string) | Number of bits in a string. |
char_length(string) | Alias for length. |
character_length(string) | Alias for length. |
chr(code_point) | Returns a character which is corresponding the ASCII code value or Unicode code point. |
concat(value, ...) | Concatenates multiple strings or lists. NULL inputs are skipped. See also operator ||. |
concat_ws(separator, string, ...) | Concatenates many strings, separated by separator. NULL inputs are skipped. |
contains(string, search_string) | Returns true if search_string is found within string. Note that collations are not supported. |
ends_with(string, search_string) | Alias for suffix. |
format(format, ...) | Formats a string using PostgreSQL format specifiers (%s, %I, %L). |
formatReadableDecimalSize(integer) | Converts integer to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.). |
format_bytes(integer) | Converts integer to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.). |
from_base64(string) | Converts a base64 encoded string to a character string (BLOB). |
from_binary(value) | Alias for unbin. |
from_hex(value) | Alias for unhex. |
greatest(arg1, ...) | Returns the largest value in lexicographical order. Note that lowercase characters are considered larger than uppercase characters and collations are not supported. |
hash(value, ...) | Returns a UBIGINT with the hash of the value. Note that this is not a cryptographic hash. |
hex(string) | Converts the string to hexadecimal representation. |
ilike_escape(string, like_specifier, escape_character) | Returns true if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string. |
instr(string, search_string) | Returns location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found. |
lcase(string) | Alias for lower. |
least(arg1, ...) | Returns the smallest value in lexicographical order. Note that uppercase characters are considered smaller than lowercase characters and collations are not supported. |
left(string, count) | Extracts the left-most count characters. |
left_grapheme(string, count) | Extracts the left-most count grapheme clusters. |
len(string) | Alias for length. |
length(string) | Number of characters in string. |
length_grapheme(string) | Number of grapheme clusters in string. |
like_escape(string, like_specifier, escape_character) | Returns true if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string. |
lower(string) | Converts string to lower case. |
lpad(string, count, character) | Pads the string with the character on the left until it has count characters. Truncates the string on the right if it has more than count characters. |
ltrim(string[, characters]) | Removes any occurrences of any of the characters from the left side of the string. characters defaults to space. |
md5(string) | Returns the MD5 hash of the string as a VARCHAR. |
md5_number(string) | Returns the MD5 hash of the string as a HUGEINT. |
md5_number_lower(string) | Returns the lower 64-bit segment of the MD5 hash of the string as a UBIGINT. |
md5_number_upper(string) | Returns the upper 64-bit segment of the MD5 hash of the string as a UBIGINT. |
nfc_normalize(string) | Converts string to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not. |
not_ilike_escape(string, like_specifier, escape_character) | Returns false if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string. |
not_like_escape(string, like_specifier, escape_character) | Returns false if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string. |
ord(string) | Alias for unicode. |
parse_dirname(path[, separator]) | Returns the top-level directory name from the given path. separator options: system, both_slash (default), forward_slash, backslash. |
parse_dirpath(path[, separator]) | Returns the head of the path (the pathname until the last slash) similarly to Python's os.path.dirname. separator options: system, both_slash (default), forward_slash, backslash. |
parse_filename(string[, trim_extension][, separator]) | Returns the last component of the path similarly to Python's os.path.basename function. If trim_extension is true, the file extension will be removed (defaults to false). separator options: system, both_slash (default), forward_slash, backslash. |
parse_path(path[, separator]) | Returns a list of the components (directories and filename) in the path similarly to Python's pathlib.parts function. separator options: system, both_slash (default), forward_slash, backslash. |
position(search_string IN string) | Return location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found. |
position(string, search_string) | Alias for instr. |
prefix(string, search_string) | Returns true if string starts with search_string. |
printf(format, ...) | Formats a string using printf syntax. |
read_text(source) | Returns the content from source (a filename, a list of filenames, or a glob pattern) as a VARCHAR. The file content is first validated to be valid UTF-8. If read_text attempts to read a file with invalid UTF-8 an error is thrown suggesting to use read_blob instead. See the read_text guide for more details. |
regexp_escape(string) | Escapes special patterns to turn string into a regular expression similarly to Python's re.escape function. |
regexp_extract(string, regex[, group][, options]) | If string contains the regex pattern, returns the capturing group specified by optional parameter group; otherwise, returns the empty string. The group must be a constant value. If no group is given, it defaults to 0. A set of optional regex options can be set. |
regexp_extract(string, regex, name_list[, options]) | If string contains the regex pattern, returns the capturing groups as a struct with corresponding names from name_list; otherwise, returns a struct with the same keys and empty strings as values. A set of optional regex options can be set. |
regexp_extract_all(string, regex[, group][, options]) | Finds non-overlapping occurrences of the regex in the string and returns the corresponding values of the capturing group. A set of optional regex options can be set. |
regexp_extract_all(string, regex, name_list[, options]) | Finds non-overlapping occurrences of regex in string and returns the capturing groups as a list of structs with corresponding names from name_list. A set of optional regex options can be set. |
regexp_full_match(string, regex[, col2]) | Returns true if the entire string matches the regex. A set of optional regex options can be set. |
regexp_matches(string, regex[, options]) | Returns true if string contains the regex, false otherwise. A set of optional regex options can be set. |
regexp_replace(string, regex, replacement[, options]) | If string contains the regex, replaces the matching part with replacement. A set of optional regex options can be set. |
regexp_split_to_array(string, regex[, options]) | Alias for string_split_regex. |
regexp_split_to_table(string, regex) | Splits the string along the regex and returns a row for each part. |
repeat(string, count) | Repeats the string count number of times. |
replace(string, source, target) | Replaces any occurrences of the source with target in string. |
reverse(string) | Reverses the string. |
right(string, count) | Extract the right-most count characters. |
right_grapheme(string, count) | Extracts the right-most count grapheme clusters. |
rpad(string, count, character) | Pads the string with the character on the right until it has count characters. Truncates the string on the right if it has more than count characters. |
rtrim(string[, characters]) | Removes any occurrences of any of the characters from the right side of the string. characters defaults to space. |
sha1(value) | Returns a VARCHAR with the SHA-1 hash of the value. |
sha256(value) | Returns a VARCHAR with the SHA-256 hash of the value. |
split(string, separator) | Alias for string_split. |
split_part(string, separator, index) | Splits the string along the separator and returns the data at the (1-based) index of the list. If the index is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior). |
starts_with(string, search_string) | Returns true if string begins with search_string. |
str_split(string, separator) | Alias for string_split. |
str_split_regex(string, regex[, options]) | Alias for string_split_regex. |
string_split(string, separator) | Splits the string along the separator. |
string_split_regex(string, regex[, options]) | Splits the string along the regex. A set of optional regex options can be set. |
string_to_array(string, separator) | Alias for string_split. |
strip_accents(string) | Strips accents from string. |
strlen(string) | Number of bytes in string. |
strpos(string, search_string) | Alias for instr. |
substr(string, start[, length]) | Alias for substring. |
substring(string, start[, length]) | Extracts substring starting from character start up to the end of the string. If optional argument length is set, extracts a substring of length characters instead. Note that a start value of 1 refers to the first character of the string. |
substring_grapheme(string, start[, length]) | Extracts substring starting from grapheme clusters start up to the end of the string. If optional argument length is set, extracts a substring of length grapheme clusters instead. Note that a start value of 1 refers to the first character of the string. |
suffix(string, search_string) | Returns true if string ends with search_string. Note that collations are not supported. |
to_base(number, radix[, min_length]) | Converts number to a string in the given base radix, optionally padding with leading zeros to min_length. |
to_base64(blob) | Converts a blob to a base64 encoded string. |
to_binary(string) | Alias for bin. |
to_hex(string) | Alias for hex. |
translate(string, from, to) | Replaces each character in string that matches a character in the from set with the corresponding character in the to set. If from is longer than to, occurrences of the extra characters in from are deleted. |
trim(string[, characters]) | Removes any occurrences of any of the characters from either side of the string. characters defaults to space. |
ucase(string) | Alias for upper. |
unbin(value) | Converts a value from binary representation to a blob. |
unhex(value) | Converts a value from hexadecimal representation to a blob. |
unicode(string) | Returns an INTEGER representing the unicode codepoint of the first character in the string. |
upper(string) | Converts string to upper case. |
url_decode(string) | Decodes a URL from a representation using Percent-Encoding. |
url_encode(string) | Encodes a URL to a representation using Percent-Encoding. |
string[index]
Extracts a single character using a (1-based) index. Alias: array_extract.
SELECT 'SereneDB'[4] AS char; char------ estring[begin:end]
Extracts a string using slice conventions similar to Python. Missing begin or end arguments are interpreted as the beginning or end of the list respectively. Negative values are accepted. Alias: array_slice.
SELECT 'SereneDB'[:4] AS slice; slice------- Serestring LIKE target
Returns true if the string matches the like specifier (see Pattern Matching).
SELECT 'hello' LIKE '%lo' AS matches; matches--------- tstring SIMILAR TO regex
Returns true if the string matches the regex (see Pattern Matching). Alias: regexp_full_match.
SELECT 'hello' SIMILAR TO 'l+' AS matches; matches--------- farg1 || arg2
Concatenates two strings, lists, or blobs. Any NULL input results in NULL. See also concat(arg1, arg2, ...) and list_concat(list1, list2, ...).
SELECT 'Serene' || 'DB' AS strings;
SELECT [1, 2, 3] || [4, 5, 6] AS lists;
SELECT '\xAA'::BLOB || '\xBB'::BLOB AS blobs; strings---------- SereneDB
lists--------------- {1,2,3,4,5,6}
blobs--------- \\xaabbarray_extract(string, index)
Extracts a single character from a string using a (1-based) index.
SELECT array_extract('SereneDB', 2) AS char; char------ earray_slice(list, begin, end)
Extracts a sublist or substring using slice conventions. Negative values are accepted. Alias: list_slice.
SELECT array_slice('SereneDB', 3, 4) AS slice;
SELECT array_slice('SereneDB', 3, NULL) AS slice;
SELECT array_slice('SereneDB', 0, -3) AS slice; slice------- re
slice------- NULL
slice-------- Sereneascii(string)
Returns an integer that represents the Unicode code point of the first character of the string.
SELECT ascii('Ω') AS code_point; code_point------------ 937bar(x, min, max[, width])
Draws a band whose width is proportional to (x - min) and equal to width characters when x = max. width defaults to 80.
SELECT bar(5, 0, 20, 10) AS band; band------ ██▌bin(string)
Converts the string to binary representation. Alias: to_binary.
SELECT bin('Aa') AS binary; binary------------------ 0100000101100001bit_length(string)
Number of bits in a string.
SELECT bit_length('abc') AS bits; bits------ 24chr(code_point)
Returns a character which is corresponding the ASCII code value or Unicode code point.
SELECT chr(65) AS char; char------ Aconcat(value, ...)
Concatenates multiple strings or lists. NULL inputs are skipped. See also operator ||.
SELECT concat('Hello', ' ', 'World') AS result;
SELECT concat([1, 2, 3], NULL, [4, 5, 6]) AS result; result------------- Hello World
result--------------- {1,2,3,4,5,6}concat_ws(separator, string, ...)
Concatenates many strings, separated by separator. NULL inputs are skipped.
SELECT concat_ws(', ', 'Banana', 'Apple', 'Melon') AS result; result---------------------- Banana, Apple, Meloncontains(string, search_string)
Returns true if search_string is found within string.
SELECT contains('abc', 'a') AS result; result-------- tformat(format, ...)
Formats a string using PostgreSQL-style format specifiers (%s, %I, %L, %%).
SELECT format('Hello %s, you are %s', 'World', 42) AS greeting; greeting------------------------- Hello World, you are 42formatReadableDecimalSize(integer)
Converts integer to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.).
SELECT formatReadableDecimalSize(16000) AS result; result--------- 16.0 kBformat_bytes(integer)
Converts integer to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.). Aliases: formatReadableSize, pg_size_pretty.
SELECT format_bytes(16_000) AS result; result---------- 15.6 KiBfrom_base64(string)
Converts a base64 encoded string to a character string (BLOB).
SELECT from_base64('QQ==') AS result; result-------- \\x41greatest(arg1, ...)
Returns the largest value in lexicographical order. Note that lowercase characters are considered larger than uppercase characters and collations are not supported.
SELECT greatest(42, 84) AS result;
SELECT greatest('abc', 'bcd', 'cde', 'EFG') AS result; result-------- 84
result-------- cdehash(value, ...)
Returns a UBIGINT with the hash of the value. Note that this is not a cryptographic hash.
SELECT hash('🌊') AS result; result---------------------- 12255790945742965276hex(string)
Converts the string to hexadecimal representation. Alias: to_hex.
SELECT hex('Hello') AS result; result------------ 48656C6C6Filike_escape(string, like_specifier, escape_character)
Returns true if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string.
SELECT ilike_escape('A%c', 'a$%C', '$') AS result; result-------- tinstr(string, search_string)
Returns location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found. Aliases: position, strpos.
SELECT instr('test test', 'es') AS result; result-------- 2least(arg1, ...)
Returns the smallest value in lexicographical order. Note that uppercase characters are considered smaller than lowercase characters and collations are not supported.
SELECT least(42, 84) AS result;
SELECT least('abc', 'bcd', 'cde', 'EFG') AS result; result-------- 42
result-------- EFGleft(string, count)
Extracts the left-most count characters.
SELECT left('Hello🌊', 2) AS result; result-------- Heleft_grapheme(string, count)
Extracts the left-most count grapheme clusters.
SELECT left_grapheme('🤦🏼♂️🤦🏽♀️', 1) AS result; result--------- 🤦🏼♂️length(string)
Number of characters in string. Aliases: char_length, character_length, len.
SELECT length('Hello🌊') AS result; result-------- 6length_grapheme(string)
Number of grapheme clusters in string.
SELECT length_grapheme('🤦🏼♂️🤦🏽♀️') AS result; result-------- 2like_escape(string, like_specifier, escape_character)
Returns true if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string.
SELECT like_escape('a%c', 'a$%c', '$') AS result; result-------- tlower(string)
Converts string to lower case. Alias: lcase.
SELECT lower('Hello') AS result; result-------- hellolpad(string, count, character)
Pads the string with the character on the left until it has count characters. Truncates the string on the right if it has more than count characters.
SELECT lpad('hello', 8, '>') AS result; result---------- >>>helloltrim(string[, characters])
Removes any occurrences of any of the characters from the left side of the string. characters defaults to space.
SELECT ltrim(' test ') AS result;
SELECT ltrim('>>>>test<<', '><') AS result; result-------- test
result-------- test<<md5(string)
Returns the MD5 hash of the string as a VARCHAR.
SELECT md5('abc') AS result; result---------------------------------- 900150983cd24fb0d6963f7d28e17f72md5_number(string)
Returns the MD5 hash of the string as a HUGEINT.
md5_number_lower(string)
Returns the lower 64-bit segment of the MD5 hash of the string as a UBIGINT.
SELECT md5_number_lower('abc') AS result; result--------------------- 8250560606382298838md5_number_upper(string)
Returns the upper 64-bit segment of the MD5 hash of the string as a UBIGINT.
SELECT md5_number_upper('abc') AS result; result---------------------- 12704604231530709392nfc_normalize(string)
Converts string to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not.
SELECT nfc_normalize('ardèch') AS result; result-------- ardèchnot_ilike_escape(string, like_specifier, escape_character)
Returns false if the string matches the like_specifier (see Pattern Matching) using case-insensitive matching. escape_character is used to search for wildcard characters in the string.
SELECT not_ilike_escape('A%c', 'a$%C', '$') AS result; result-------- fnot_like_escape(string, like_specifier, escape_character)
Returns false if the string matches the like_specifier (see Pattern Matching) using case-sensitive matching. escape_character is used to search for wildcard characters in the string.
SELECT not_like_escape('a%c', 'a$%c', '$') AS result; result-------- fparse_dirname(path[, separator])
Returns the top-level directory name from the given path. separator options: system, both_slash (default), forward_slash, backslash.
SELECT parse_dirname('path/to/file.csv', 'system') AS result; result-------- pathparse_dirpath(path[, separator])
Returns the head of the path (the pathname until the last slash) similarly to Python's os.path.dirname. separator options: system, both_slash (default), forward_slash, backslash.
SELECT parse_dirpath('path/to/file.csv', 'forward_slash') AS result; result--------- path/toparse_filename(string[, trim_extension][, separator])
Returns the last component of the path similarly to Python's os.path.basename function. If trim_extension is true, the file extension will be removed (defaults to false). separator options: system, both_slash (default), forward_slash, backslash.
SELECT parse_filename('path/to/file.csv', true, 'forward_slash') AS result; result-------- fileparse_path(path[, separator])
Returns a list of the components (directories and filename) in the path similarly to Python's pathlib.parts function. separator options: system, both_slash (default), forward_slash, backslash.
SELECT parse_path('path/to/file.csv', 'system') AS result; result-------------------- {path,to,file.csv}position(search_string IN string)
Return location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found. Aliases: instr, strpos.
SELECT position('b' IN 'abc') AS result; result-------- 2prefix(string, search_string)
Returns true if string starts with search_string.
SELECT prefix('abc', 'ab') AS result; result-------- tprintf(format, ...)
Formats a string using printf syntax.
SELECT printf('Benchmark "%s" took %d seconds', 'CSV', 42) AS result; result--------------------------------- Benchmark "CSV" took 42 secondsread_text(source)
A table function that returns the content from source (a filename, a list of filenames, or a glob pattern); query it with FROM. It yields filename, content, size and last_modified columns, with content as a VARCHAR. The file content is first validated to be valid UTF-8. If read_text attempts to read a file with invalid UTF-8 an error is thrown suggesting to use read_blob instead. See the read_text guide for more details.
regexp_escape(string)
Escapes special patterns to turn string into a regular expression similarly to Python's re.escape function.
SELECT regexp_escape('https://serenedb.com') AS result; result------------------------------ https\\:\\/\\/serenedb\\.comregexp_extract(string, regex[, group][, options])
If string contains the regex pattern, returns the capturing group specified by optional parameter group; otherwise, returns the empty string. The group must be a constant value. If no group is given, it defaults to 0. A set of optional regex options can be set.
SELECT regexp_extract('ABC', '([a-z])(b)', 1, 'i') AS result; result-------- Aregexp_extract(string, regex, name_list[, options])
If string contains the regex pattern, returns the capturing groups as a struct with corresponding names from name_list; otherwise, returns a struct with the same keys and empty strings as values. A set of optional regex options can be set.
SELECT regexp_extract('John Doe', '([a-z]+) ([a-z]+)', ['first_name', 'last_name'], 'i') AS result; result------------ (John,Doe)regexp_extract_all(string, regex[, group][, options])
Finds non-overlapping occurrences of the regex in the string and returns the corresponding values of the capturing group. A set of optional regex options can be set.
SELECT regexp_extract_all('Peter: 33, Paul:14', '(\w+):\s*(\d+)', 2) AS result; result--------- {33,14}regexp_extract_all(string, regex, name_list[, options])
Finds non-overlapping occurrences of regex in string and returns the capturing groups as a list of structs with corresponding names from name_list. A set of optional regex options can be set.
SELECT regexp_extract_all('Peter: 33, Paul: 14', '(\w+):\s*(\d+)', ['name', 'age']) AS result; result---------------------------- {"(Peter,33)","(Paul,14)"}regexp_full_match(string, regex[, col2])
Returns true if the entire string matches the regex. A set of optional regex options can be set.
SELECT regexp_full_match('anabanana', '(an)*') AS result; result-------- fregexp_matches(string, regex[, options])
Returns true if string contains the regex, false otherwise. A set of optional regex options can be set.
SELECT regexp_matches('anabanana', '(an)*') AS result; result-------- tregexp_replace(string, regex, replacement[, options])
If string contains the regex, replaces the matching part with replacement. A set of optional regex options can be set.
SELECT regexp_replace('hello', '[lo]', '-') AS result; result-------- he-loregexp_split_to_table(string, regex)
Splits the string along the regex and returns a row for each part.
SELECT * FROM regexp_split_to_table('hello world; 42', ';? '); regexp_split_to_table----------------------- hello world 42repeat(string, count)
Repeats the string count number of times.
SELECT repeat('A', 5) AS result; result-------- AAAAAreplace(string, source, target)
Replaces any occurrences of the source with target in string.
SELECT replace('hello', 'l', '-') AS result; result-------- he--oreverse(string)
Reverses the string.
SELECT reverse('hello') AS result; result-------- ollehright(string, count)
Extract the right-most count characters.
SELECT right('Hello🌊', 3) AS result; result-------- lo🌊right_grapheme(string, count)
Extracts the right-most count grapheme clusters.
SELECT right_grapheme('🤦🏼♂️🤦🏽♀️', 1) AS result; result--------- 🤦🏽♀️rpad(string, count, character)
Pads the string with the character on the right until it has count characters. Truncates the string on the right if it has more than count characters.
SELECT rpad('hello', 10, '<') AS result; result------------ hello<<<<<rtrim(string[, characters])
Removes any occurrences of any of the characters from the right side of the string. characters defaults to space.
SELECT rtrim(' test ') AS result;
SELECT rtrim('>>>>test<<', '><') AS result; result---------- test
result---------- >>>>testsha1(value)
Returns a VARCHAR with the SHA-1 hash of the value.
SELECT sha1('🌊') AS result; result------------------------------------------ be25fe9cf30cf5c01ae2054b60a5e5511e966d98sha256(value)
Returns a VARCHAR with the SHA-256 hash of the value
SELECT sha256('🌊') AS result; result------------------------------------------------------------------ 8db6bed4743d5a6159e0742df0fded8377d1250420aeda292e41ffa20eba806esplit_part(string, separator, index)
Splits the string along the separator and returns the data at the (1-based) index of the list. If the index is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior).
SELECT split_part('a;b;c', ';', 2) AS result; result-------- bstarts_with(string, search_string)
Returns true if string begins with search_string. Alias: ^@.
SELECT starts_with('abc', 'a') AS result; result-------- tstring_split(string, separator)
Splits the string along the separator. Aliases: split, str_split, string_to_array.
SELECT string_split('hello-world', '-') AS result; result--------------- {hello,world}string_split_regex(string, regex[, options])
Splits the string along the regex. A set of optional regex options can be set. Aliases: regexp_split_to_array, str_split_regex.
SELECT string_split_regex('hello world; 42', ';? ') AS result; result------------------ {hello,world,42}strip_accents(string)
Strips accents from string.
SELECT strip_accents('mühleisen') AS result; result----------- muhleisenstrlen(string)
Number of bytes in string.
SELECT strlen('🌊') AS result; result-------- 4substring(string, start[, length])
Extracts substring starting from character start up to the end of the string. If optional argument length is set, extracts a substring of length characters instead. Note that a start value of 1 refers to the first character of the string. Alias: substr.
SELECT substring('Hello', 2) AS result;
SELECT substring('Hello', 2, 2) AS result; result-------- ello
result-------- elsubstring_grapheme(string, start[, length])
Extracts substring starting from grapheme clusters start up to the end of the string. If optional argument length is set, extracts a substring of length grapheme clusters instead. Note that a start value of 1 refers to the first character of the string.
SELECT substring_grapheme('🌊🤦🏼♂️🤦🏽♀️🌊', 3) AS result;
SELECT substring_grapheme('🌊🤦🏼♂️🤦🏽♀️🌊', 3, 2) AS result; result----------- 🤦🏽♀️🌊
result----------- 🤦🏽♀️🌊suffix(string, search_string)
Returns true if string ends with search_string. Note that collations are not supported. Alias: ends_with.
SELECT suffix('abc', 'bc') AS result; result-------- tto_base(number, radix[, min_length])
Converts number to a string in the given base radix, optionally padding with leading zeros to min_length.
SELECT to_base(42, 16, 5) AS result; result-------- 0002Ato_base64(blob)
Converts a blob to a base64 encoded string. Alias: base64.
SELECT to_base64('A'::BLOB) AS result; result-------- QQ==translate(string, from, to)
Replaces each character in string that matches a character in the from set with the corresponding character in the to set. If from is longer than to, occurrences of the extra characters in from are deleted.
SELECT translate('12345', '143', 'ax') AS result; result-------- a2x5trim(string[, characters])
Removes any occurrences of any of the characters from either side of the string. characters defaults to space.
SELECT trim(' test ') AS result;
SELECT trim('>>>>test<<', '><') AS result; result-------- test
result-------- testunbin(value)
Converts a value from binary representation to a blob. Alias: from_binary.
SELECT unbin('0110') AS result; result-------- \\x06unhex(value)
Converts a value from hexadecimal representation to a blob. Alias: from_hex.
SELECT unhex('2A') AS result; result-------- \\x2aunicode(string)
Returns an INTEGER representing the unicode codepoint of the first character in the string. Alias: ord.
SELECT [unicode('âbcd'), unicode('â'), unicode(''), unicode(NULL)] AS result; result------------------- {226,226,-1,NULL}upper(string)
Converts string to upper case. Alias: ucase.
SELECT upper('Hello') AS result; result-------- HELLOurl_decode(string)
Decodes a URL from a representation using Percent-Encoding.
SELECT url_decode('https%3A%2F%2Fserenedb.com%2Fwhy_serenedb%23portable') AS result; result-------------------------------------------- https://serenedb.com/why_serenedb#portableurl_encode(string)
Encodes a URL to a representation using Percent-Encoding.
SELECT url_encode('this string has/ special+ characters>') AS result; result----------------------------------------------------- this%20string%20has%2F%20special%2B%20characters%3EText Similarity Functions
These functions are used to measure the similarity of two strings using various similarity measures.
| Function | Description |
|---|---|
damerau_levenshtein(s1, s2) | Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., a and A) are considered different. |
editdist3(s1, s2) | Alias for levenshtein. |
hamming(s1, s2) | The Hamming distance between two strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., a and A) are considered different. |
jaccard(s1, s2) | The Jaccard similarity between two strings. Characters of different cases (e.g., a and A) are considered different. Returns a number between 0 and 1. |
jaro_similarity(s1, s2[, score_cutoff]) | The Jaro similarity between two strings. Characters of different cases (e.g., a and A) are considered different. Returns a number between 0 and 1. For similarity < score_cutoff, 0 is returned instead. score_cutoff defaults to 0. |
jaro_winkler_similarity(s1, s2[, score_cutoff]) | The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., a and A) are considered different. Returns a number between 0 and 1. For similarity < score_cutoff, 0 is returned instead. score_cutoff defaults to 0. |
levenshtein(s1, s2) | The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., a and A) are considered different. |
mismatches(s1, s2) | Alias for hamming. |
damerau_levenshtein(s1, s2)
Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., a and A) are considered different.
SELECT damerau_levenshtein('serene', 'esrnee') AS result; result-------- 2hamming(s1, s2)
The Hamming distance between two strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., a and A) are considered different. Alias: mismatches.
SELECT hamming('serene', 'serine') AS result; result-------- 1jaccard(s1, s2)
The Jaccard similarity between two strings. Characters of different cases (e.g., a and A) are considered different. Returns a number between 0 and 1.
SELECT jaccard('serene', 'serine') AS result; result-------- 0.8jaro_similarity(s1, s2[, score_cutoff])
The Jaro similarity between two strings. Characters of different cases (e.g., a and A) are considered different. Returns a number between 0 and 1. For similarity < score_cutoff, 0 is returned instead. score_cutoff defaults to 0.
SELECT jaro_similarity('serene', 'serenedb') AS result; result-------------------- 0.9166666666666666jaro_winkler_similarity(s1, s2[, score_cutoff])
The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., a and A) are considered different. Returns a number between 0 and 1. For similarity < score_cutoff, 0 is returned instead. score_cutoff defaults to 0.
SELECT jaro_winkler_similarity('serene', 'serenedb') AS result; result-------- 0.95levenshtein(s1, s2)
The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., a and A) are considered different. Alias: editdist3.
SELECT levenshtein('serene', 'serenedb') AS result; result-------- 2Formatters
format Syntax
The format(format, ...) function formats a string using PostgreSQL-style format specifiers — each specifier in format is replaced by the corresponding argument. (This is PostgreSQL's format, not the {}-placeholder syntax of the {fmt} library.)
A specifier has the form %[position$][-][width]type, where type is one of:
| Type | Description |
|---|---|
%s | The argument as a string. |
%I | The argument as a quoted SQL identifier (double-quoted when needed). |
%L | The argument as a quoted SQL literal (single-quoted, embedded quotes escaped; NULL becomes NULL). |
%% | A literal % (consumes no argument). |
Only s, I and L are valid types — any other (e.g. %d) raises unrecognized format() type specifier.
Basic substitution with %s:
SELECT format('Hello %s, you are %s', 'World', 42) AS greeting; greeting------------------------- Hello World, you are 42%I and %L safely quote identifiers and literals — useful for building SQL dynamically:
SELECT format('INSERT INTO %I VALUES (%L)', 'my table', 'a''b') AS stmt; stmt---------------------------------------- INSERT INTO "my table" VALUES ('a''b')Arguments are consumed in order; an explicit position can be given as %n$ (e.g. %1$s), which also lets you reuse and reorder them:
SELECT format('%1$s %1$s %2$s', 'a', 'b') AS repeated; repeated---------- a a bAn optional minimum field width right-justifies the value (left-justifies with a leading -), padding with spaces:
SELECT format('|%-10s|%10s|', 'left', 'right') AS padded; padded------------------------- |left | right|printf Syntax
The printf(format, parameters...) function formats strings using the printf syntax.
Format without additional parameters:
SELECT printf('Hello world'); printf------------- Hello worldFormat a string using arguments in a given order:
SELECT printf('The answer to %s is %d', 'life', 42); printf-------------------------- The answer to life is 42Format a string using positional arguments %position$formatter, e.g., the second parameter as a string is encoded as %2$s:
SELECT printf('I''d rather be %2$s than %1$s.', 'right', 'happy'); printf--------------------------------- I'd rather be happy than right.Format Specifiers
| Specifier | Description | Example |
|---|---|---|
%c | character code to character | a |
%d | integer | 654321 |
%E | scientific notation | 3.141593E+00 |
%f | float | 4.560000 |
%hd | integer truncated to a C short | -1039 |
%hhd | integer truncated to a C signed char | -15 |
%lld | integer | 654321 |
%o | octal | 2375761 |
%s | string | asd |
%x | hexadecimal | 9fbf1 |
Formatting Types
Integers:
SELECT printf('%d + %d = %d', 3, 5, 3 + 5); printf----------- 3 + 5 = 8Booleans are not accepted as %s arguments and raise invalid format specifier; cast them to text first:
SELECT printf('%s != %s', true, false);db error: ERROR: invalid format specifierFormat datetime values:
SELECT printf('%s', DATE '1992-01-01');
SELECT printf('%s', TIME '12:01:00');
SELECT printf('%s', TIMESTAMP '1992-01-01 12:01:00'); printf------------ 1992-01-01
printf---------- 12:01:00
printf--------------------- 1992-01-01 12:01:00Format a BLOB (rendered as a hex string):
SELECT printf('%s', 'hello'::BLOB) AS printf; printf--------------- \\x68656c6c6fPad integers with 0s:
SELECT printf('%04d', 33); printf-------- 0033Create timestamps from integers:
SELECT printf('%02d:%02d:%02d %s', 12, 3, 16, 'AM'); printf------------- 12:03:16 AMConvert to hexadecimal:
SELECT printf('%x', 123_456_789); printf--------- 75bcd15There is no binary (%b) specifier; it raises invalid format specifier. Use bin to obtain a binary representation:
SELECT printf('%b', 123_456_789);db error: ERROR: invalid format specifierThousand Separators
printf does not support thousand-separator specifiers (such as %,d, %_d or %''d); they raise invalid format specifier:
SELECT printf('%,d', 123_456_789);
SELECT printf('%.d', 123_456_789);
SELECT printf('%''d', 123_456_789);
SELECT printf('%_d', 123_456_789);db error: ERROR: invalid format specifier
printf----------- 123456789
db error: ERROR: invalid format specifier
db error: ERROR: invalid format specifierThis also applies to floats and doubles:
SELECT printf('%,f', 123456.789);
SELECT printf('%,.2f', 123456.789);db error: ERROR: invalid format specifier
db error: ERROR: invalid format specifier