Skip to main content

Pattern Matching

There are four separate approaches to pattern matching provided by SereneDB: the traditional SQL LIKE operator, the more recent SIMILAR TO operator (added in SQL:1999), a GLOB operator, and POSIX-style regular expressions.

LIKE

The LIKE expression returns true if the string matches the supplied pattern. (As expected, the NOT LIKE expression returns false if LIKE returns true, and vice versa. An equivalent expression is NOT (string LIKE pattern).)

If pattern does not contain percent signs or underscores, then the pattern only represents the string itself; in that case LIKE acts like the equals operator. An underscore (_) in pattern stands for (matches) any single character; a percent sign (%) matches any sequence of zero or more characters.

LIKE pattern matching always covers the entire string. Therefore, if it's desired to match a sequence anywhere within a string, the pattern must start and end with a percent sign.

Some examples:

Query
SELECT 'abc' LIKE 'abc';
SELECT 'abc' LIKE 'a%' ;
SELECT 'abc' LIKE '_b_';
SELECT 'abc' LIKE 'c';
SELECT 'abc' LIKE 'c%' ;
SELECT 'abc' LIKE '%c';
SELECT 'abc' NOT LIKE '%c';
Result
 ?column?---------- t
 ?column?---------- t
 ?column?---------- t
 ?column?---------- f
 ?column?---------- f
 ?column?---------- t
 ?column?---------- f

The keyword ILIKE can be used instead of LIKE to make the match case-insensitive according to the active locale:

Query
SELECT 'abc' ILIKE '%C';
Result
 ?column?---------- t
Query
SELECT 'abc' NOT ILIKE '%C';
Result
 ?column?---------- f

To search within a string for a character that is a wildcard (% or _), the pattern must use an ESCAPE clause and an escape character to indicate the wildcard should be treated as a literal character instead of a wildcard. See an example below.

Additionally, the function like_escape has the same functionality as a LIKE expression with an ESCAPE clause, but using function syntax. See the Text Functions page for details.

Search for strings with 'a' then a literal percent sign then 'c':

Query
SELECT 'a%c' LIKE 'a$%c' ESCAPE '$';
SELECT 'azc' LIKE 'a$%c' ESCAPE '$';
Result
 ?column?---------- t
 ?column?---------- f

Case-insensitive ILIKE with ESCAPE:

Query
SELECT 'A%c' ILIKE 'a$%c' ESCAPE '$';
Result
 ?column?---------- t

There are also alternative characters that can be used as keywords in place of LIKE expressions. These enhance PostgreSQL compatibility.

PostgreSQL-styleLIKE-style
~~LIKE
!~~NOT LIKE
~~*ILIKE
!~~*NOT ILIKE

SIMILAR TO

The SIMILAR TO operator returns true or false depending on whether its pattern matches the given string. It is similar to LIKE, except that it interprets the pattern using a regular expression. Like LIKE, the SIMILAR TO operator succeeds only if its pattern matches the entire string; this is unlike common regular expression behavior where the pattern can match any part of the string.

A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular set). A string is said to match a regular expression if it is a member of the regular set described by the regular expression. As with LIKE, pattern characters match string characters exactly unless they are special characters in the regular expression language — but regular expressions use different special characters than LIKE does.

Some examples:

Query
SELECT 'abc' SIMILAR TO 'abc';
SELECT 'abc' SIMILAR TO 'a';
SELECT 'abc' SIMILAR TO '.*(b|d).*';
SELECT 'abc' SIMILAR TO '(b|c).*';
SELECT 'abc' NOT SIMILAR TO 'abc';
Result
 ?column?---------- t
 ?column?---------- f
 ?column?---------- f
 ?column?---------- f
 ?column?---------- f

Globbing

SereneDB supports file name expansion, also known as globbing, for discovering files. SereneDB's glob syntax uses the question mark (?) wildcard to match any single character and the asterisk (*) to match zero or more characters. In addition, you can use the bracket syntax ([...]) to match any single character contained within the brackets, or within the character range specified by the brackets. An exclamation mark (!) may be used inside the first bracket to search for a character that is not contained within the brackets. To learn more, visit the “glob (programming)” Wikipedia page.

GLOB

The GLOB operator returns true or false if the string matches the GLOB pattern. The GLOB operator is most commonly used when searching for filenames that follow a specific pattern (for example a specific file extension).

Some examples:

Query
SELECT 'best.txt' GLOB '*.txt';
SELECT 'best.txt' GLOB '????.txt';
SELECT 'best.txt' GLOB '?.txt';
SELECT 'best.txt' GLOB '[abc]est.txt';
SELECT 'best.txt' GLOB '[a-z]est.txt';
Result
 ?column?---------- t
 ?column?---------- t
 ?column?---------- f
 ?column?---------- t
 ?column?---------- t

The bracket syntax is case-sensitive:

Query
SELECT 'Best.txt' GLOB '[a-z]est.txt';
SELECT 'Best.txt' GLOB '[a-zA-Z]est.txt';
Result
 ?column?---------- f
 ?column?---------- t

The ! applies to all characters within the brackets:

Query
SELECT 'Best.txt' GLOB '[!a-zA-Z]est.txt';
Result
 ?column?---------- f

To negate a GLOB operator, negate the entire expression:

Query
SELECT NOT 'best.txt' GLOB '*.txt';
Result
 ?column?---------- f

Three tildes (~~~) may also be used in place of the GLOB keyword.

GLOB-styleSymbolic-style
GLOB~~~

Glob Function to Find Filenames

The glob pattern matching syntax can also be used to search for filenames using the glob table function. It accepts one parameter: the path to search (which may include glob patterns).

Search the current directory for all files:

Query
SELECT * FROM glob('*');
Result
 file--------------- serened test.csv test.json test.parquet test2.csv test2.parquet todos.json

Globbing Semantics

SereneDB's globbing implementation follows the semantics of Python's glob and not the glob used in the shell. A notable difference is the behavior of the **/ construct: **/⟨filename⟩ will not return a file with ⟨filename⟩ in top-level directory. For example, with a README.md file present in the directory, the following query finds it:

Query
SELECT * FROM glob('README.md');
Result
 file----------- README.md

However, the following query returns an empty result:

Query
SELECT * FROM glob('**/README.md');
Result
file

Meanwhile, the globbing of Bash, Zsh, etc. finds the file using the same syntax:

ls **/README.md
README.md

Regular Expressions

SereneDB's regular expression support is documented on the Regular Expressions page. SereneDB supports some PostgreSQL-style operators for regular expression matching:

PostgreSQL-styleEquivalent expression
~regexp_full_match
!~NOT regexp_full_match
~*(not supported)
!~*(not supported)