Functions for working with strings
Functions for searching in strings and for replacing in strings are described separately.
The documentation below is generated from the system.functions system table.
CRC32
Introduced in: v20.1
Calculates the CRC32 checksum of a string using the CRC-32-IEEE 802.3 polynomial and initial value 0xffffffff (zlib implementation).
Syntax
Arguments
- s — String to calculate CRC32 for. String
Returned value
Returns the CRC32 checksum of the string. UInt32
Examples
Usage example
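The checksum described above can be reproduced outside the database for comparison. The following is a minimal Python sketch, not ClickHouse code: it relies on Python's `zlib.crc32`, which is the zlib CRC-32 routine, and the helper name `crc32_ieee` is hypothetical. Whether a given server build returns identical values should be checked against a real query.

```python
import zlib


def crc32_ieee(data: bytes) -> int:
    """Return the zlib CRC-32 of data as an unsigned 32-bit integer."""
    # The mask keeps the result in the UInt32 range on every Python version.
    return zlib.crc32(data) & 0xFFFFFFFF


# "123456789" is the conventional CRC check string; its CRC-32 is 0xCBF43926.
print(hex(crc32_ieee(b"123456789")))
```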
CRC32IEEE
Introduced in: v20.1
Calculates the CRC32 checksum of a string using the CRC-32-IEEE 802.3 polynomial.
Syntax
Arguments
- s — String to calculate CRC32 for. String
Returned value
Returns the CRC32 checksum of the string. UInt32
Examples
Usage example
CRC64
Introduced in: v20.1
Calculates the CRC64 checksum of a string using the CRC-64-ECMA polynomial.
Syntax
Arguments
- s — String to calculate CRC64 for. String
Returned value
Returns the CRC64 checksum of the string. UInt64
Examples
Usage example
appendTrailingCharIfAbsent
Introduced in: v1.1
Appends character c to string s if s is non-empty and does not end with character c.
Syntax
Arguments
Returned value
Returns string s with character c appended if s does not end with c. String
Examples
Usage example
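The rule stated above (append only when the string is non-empty and does not already end with the character) can be sketched in Python as follows. The function name is hypothetical and the snippet only models the described behavior.

```python
def append_trailing_char_if_absent(s: str, c: str) -> str:
    """Append c to s unless s is empty or already ends with c."""
    if s and not s.endswith(c):
        return s + c
    return s


print(append_trailing_char_if_absent("https://example.com", "/"))
```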
ascii
Introduced in: v22.11
Returns the ASCII code point of the first character of string s as an Int32.
Syntax
Arguments
- s — String input. String
Returned value
Returns the ASCII code point of the first character. If s is empty, the result is 0. If the first character is not an ASCII character or not part of the Latin-1 supplement range of UTF-16, the result is undefined. Int32
Examples
Usage example
base32Decode
Introduced in: v25.6
Decodes a Base32 (RFC 4648) string. If the string is not valid Base32-encoded, an exception is thrown.
Syntax
Arguments
- encoded — String column or constant. String
Returned value
Returns a string containing the decoded value of the argument. String
Examples
Usage example
base32Encode
Introduced in: v25.6
Encodes a string using Base32.
Syntax
Arguments
- plaintext — Plaintext to encode. String
Returned value
Returns a string containing the encoded value of the argument. String or FixedString
Examples
Usage example
base58Decode
Introduced in: v22.7
Decodes a Base58 string. If the string is not valid Base58-encoded, an exception is thrown.
Syntax
Arguments
- encoded — String column or constant to decode. String
Returned value
Returns a string containing the decoded value of the argument. String
Examples
Usage example
base58Encode
Introduced in: v22.7
Encodes a string using Base58 encoding.
Syntax
Arguments
- plaintext — Plaintext to encode. String
Returned value
Returns a string containing the encoded value of the argument. String
Examples
Usage example
base64Decode
Introduced in: v18.16
Decodes a string from Base64 representation, according to RFC 4648. Throws an exception in case of error.
Syntax
Arguments
- encoded — String column or constant to decode. If the string is not valid Base64-encoded, an exception is thrown. String
Returned value
Returns the decoded string. String
Examples
Usage example
base64Encode
Introduced in: v18.16
Encodes a string using Base64 representation, according to RFC 4648.
Syntax
Arguments
- plaintext — Plaintext column or constant to encode. String
Returned value
Returns a string containing the encoded value of the argument. String
Examples
Usage example
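To illustrate the RFC 4648 encoding referred to above, here is a small Python round trip using the standard `base64` module. It is an illustration of the encoding itself, not of the ClickHouse implementation; the helper names are hypothetical.

```python
import base64
import binascii


def b64_encode(plaintext: bytes) -> str:
    """Encode bytes with the standard RFC 4648 Base64 alphabet."""
    return base64.b64encode(plaintext).decode("ascii")


def b64_decode(encoded: str) -> bytes:
    """Decode Base64; validate=True rejects characters outside the alphabet."""
    return base64.b64decode(encoded, validate=True)


encoded = b64_encode(b"clickhouse")
print(encoded)
print(b64_decode(encoded))

try:
    b64_decode("not base64!")
except binascii.Error as exc:
    # Invalid input raises an error, comparable to the exception described for base64Decode.
    print("decode failed:", exc)
```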
base64URLDecode
Introduced in: v24.6
Decodes a string from Base64 representation using URL-safe alphabet, according to RFC 4648. Throws an exception in case of error.
Syntax
Arguments
- encoded — String column or constant to decode. If the string is not valid Base64-encoded, an exception is thrown. String
Returned value
Returns a string containing the decoded value of the argument. String
Examples
Usage example
base64URLEncode
Introduced in: v18.16
Encodes a string using Base64 (RFC 4648) representation using URL-safe alphabet.
Syntax
Arguments
- plaintext — Plaintext column or constant to encode. String
Returned value
Returns a string containing the encoded value of the argument. String
Examples
Usage example
basename
Introduced in: v20.1
Extracts the tail of a string following its last slash or backslash. This function is often used to extract the filename from a path.
Syntax
Arguments
- expr — A string expression. Backslashes must be escaped. String
Returned value
Returns the tail of the input string after its last slash or backslash. If the input string ends with a slash or backslash, the function returns an empty string. Returns the original string if there are no slashes or backslashes. String
Examples
Extract filename from Unix path
Extract filename from Windows path
String with no path separators
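The three cases described under "Returned value" (last slash or backslash, trailing separator, no separator) can be modeled with a short Python sketch. The function name `basename_like` is hypothetical and the code only mirrors the rules stated in this section.

```python
def basename_like(path: str) -> str:
    """Return the part of path after its last slash or backslash."""
    # rfind returns -1 when the separator is absent, so cut + 1 becomes 0
    # and the whole string is returned.
    cut = max(path.rfind("/"), path.rfind("\\"))
    return path[cut + 1:]


print(basename_like("some/long/path/to/file"))
print(basename_like("some\\long\\path\\to\\file"))
print(basename_like("some-file-name"))
```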
byteHammingDistance
Introduced in: v23.9
Calculates the Hamming distance between two byte strings.
Syntax
Arguments
Returned value
Returns the Hamming distance between the two strings. UInt64
Examples
Usage example
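As a rough illustration of the metric, the Python sketch below counts differing byte positions. The source does not say how inputs of unequal length are treated, so this hypothetical helper rejects them rather than guess.

```python
def byte_hamming_distance(a: bytes, b: bytes) -> int:
    """Count the positions at which two equal-length byte strings differ."""
    if len(a) != len(b):
        # Assumption: unequal lengths are out of scope for this sketch.
        raise ValueError("inputs must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(byte_hamming_distance(b"karolin", b"kathrin"))
```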
compareSubstrings
Introduced in: v25.2
Compares two strings lexicographically.
Syntax
Arguments
- s1 — The first string to compare. String
- s2 — The second string to compare. String
- s1_offset — The position (zero-based) in s1 from which the comparison starts. UInt*
- s2_offset — The position (zero-based) in s2 from which the comparison starts. UInt*
- num_bytes — The maximum number of bytes to compare in both strings. If s1_offset (or s2_offset) + num_bytes exceeds the end of an input string, num_bytes will be reduced accordingly. UInt*
Returned value
Returns:
- -1 if s1[s1_offset : s1_offset + num_bytes] < s2[s2_offset : s2_offset + num_bytes].
- 0 if s1[s1_offset : s1_offset + num_bytes] = s2[s2_offset : s2_offset + num_bytes].
- 1 if s1[s1_offset : s1_offset + num_bytes] > s2[s2_offset : s2_offset + num_bytes]. Int8
Examples
Usage example
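The comparison rule above maps naturally onto byte slices. The Python sketch below is one way to model it: slicing clamps at the end of the input, which stands in for the "num_bytes will be reduced" rule. The function name is hypothetical, and how the server treats slices that end up with different lengths is an assumption here.

```python
def compare_substrings(s1: bytes, s2: bytes,
                       s1_offset: int, s2_offset: int, num_bytes: int) -> int:
    """Compare two byte slices lexicographically and return -1, 0 or 1."""
    a = s1[s1_offset:s1_offset + num_bytes]  # slice is clamped to the end of s1
    b = s2[s2_offset:s2_offset + num_bytes]  # slice is clamped to the end of s2
    return (a > b) - (a < b)


# Compares "Saxon" (from "Saxony") with "Saxon" (from "Anglo-Saxon").
print(compare_substrings(b"Saxony", b"Anglo-Saxon", 0, 6, 5))
```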
concat
Introduced in: v1.1
Concatenates the given arguments.
Arguments which are not of types String or FixedString are converted to strings using their default serialization.
As this decreases performance, it is not recommended to use non-String/FixedString arguments.
Syntax
Arguments
- s1, s2, ... — Any number of values of arbitrary type. Any
Returned value
Returns the String created by concatenating the arguments. If any of arguments is NULL, the function returns NULL. If there are no arguments, it returns an empty string. Nullable(String)
Examples
String concatenation
Number concatenation
concatAssumeInjective
Introduced in: v1.1
Like concat but assumes that concat(s1, s2, ...) → sn is injective,
i.e., it returns different results for different arguments.
Can be used for optimization of GROUP BY.
Syntax
Arguments
- s1, s2, ... — Any number of values of arbitrary type. String or FixedString
Returned value
Returns the string created by concatenating the arguments. If any of argument values is NULL, the function returns NULL. If no arguments are passed, it returns an empty string. String
Examples
Group by optimization
concatWithSeparator
Introduced in: v22.12
Concatenates the provided strings, separating them by the specified separator.
Syntax
Arguments
- sep — The separator to use. const String or const FixedString
- exp1, exp2, ... — Expression to be concatenated. Arguments which are not of type String or FixedString are converted to strings using their default serialization. As this decreases performance, it is not recommended to use non-String/FixedString arguments. Any
Returned value
Returns the String created by concatenating the arguments. If any of the argument values is NULL, the function returns NULL. String
Examples
Usage example
concatWithSeparatorAssumeInjective
Introduced in: v22.12
Like concatWithSeparator but assumes that concatWithSeparator(sep[,exp1, exp2, ... ]) → result is injective.
A function is called injective if it returns different results for different arguments.
Can be used for optimization of GROUP BY.
Syntax
Arguments
- sep — The separator to use. const String or const FixedString
- exp1, exp2, ... — Expression to be concatenated. Arguments which are not of type String or FixedString are converted to strings using their default serialization. As this decreases performance, it is not recommended to use non-String/FixedString arguments. String or FixedString
Returned value
Returns the String created by concatenating the arguments. If any of the argument values is NULL, the function returns NULL. String
Examples
Usage example
conv
Introduced in: v1.1
Converts numbers between different number bases.
The function converts a number from one base to another. It supports bases from 2 to 36. For bases higher than 10, letters A-Z (case insensitive) are used to represent digits 10-35.
This function is compatible with MySQL's CONV() function.
Syntax
Arguments
- number — The number to convert. Can be a string or numeric type.
- from_base — The source base (2-36). Must be an integer.
- to_base — The target base (2-36). Must be an integer.
Returned value
String representation of the number in the target base.
Examples
Convert decimal to binary
Convert hexadecimal to decimal
Convert with negative number
Convert binary to octal
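The base conversion described above can be sketched in Python for non-negative inputs. This is an illustrative model with a hypothetical function name; negative numbers are deliberately left out because the source does not spell out how they are represented, and emitting uppercase letters for digits above 9 is an assumption.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def conv_like(number: str, from_base: int, to_base: int) -> str:
    """Convert a non-negative number given as text between bases 2..36."""
    if not (2 <= from_base <= 36 and 2 <= to_base <= 36):
        raise ValueError("bases must be in the range 2..36")
    value = int(number, from_base)  # int() accepts letters in either case
    if value < 0:
        raise ValueError("negative inputs are out of scope for this sketch")
    if value == 0:
        return "0"
    out = []
    while value:
        value, rem = divmod(value, to_base)
        out.append(DIGITS[rem])  # least significant digit first
    return "".join(reversed(out))


print(conv_like("10", 10, 2))    # decimal to binary
print(conv_like("FF", 16, 10))   # hexadecimal to decimal
print(conv_like("1010", 2, 8))   # binary to octal
```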
convertCharset
Introduced in: v1.1
Returns string s converted from the encoding from to encoding to.
Syntax
Arguments
- s — Input string. String
- from — Source character encoding. String
- to — Target character encoding. String
Returned value
Returns string s converted from encoding from to encoding to. String
Examples
Usage example
damerauLevenshteinDistance
Introduced in: v24.1
Calculates the Damerau-Levenshtein distance between two byte strings.
Syntax
Arguments
Returned value
Returns the Damerau-Levenshtein distance between the two strings. UInt64
Examples
Usage example
decodeHTMLComponent
Introduced in: v23.9
Decodes HTML entities in a string to their corresponding characters.
Syntax
Arguments
- s — String containing HTML entities to decode. String
Returned value
Returns the string with HTML entities decoded. String
Examples
Usage example
decodeXMLComponent
Introduced in: v21.2
Decodes XML entities in a string to their corresponding characters.
Syntax
Arguments
- s — String containing XML entities to decode. String
Returned value
Returns the provided string with XML entities decoded. String
Examples
Usage example
editDistance
Introduced in: v23.9
Calculates the edit distance between two byte strings.
Syntax
Arguments
Returned value
Returns the edit distance between the two strings. UInt64
Examples
Usage example
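For readers unfamiliar with the metric, the sketch below computes an edit distance over bytes in Python. It assumes the classic Levenshtein definition (insertions, deletions and substitutions, each with cost 1); the source only says "edit distance", so treat that choice and the function name as assumptions.

```python
def edit_distance(a: bytes, b: bytes) -> int:
    """Levenshtein distance between two byte strings, row by row."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


print(edit_distance(b"kitten", b"sitting"))
```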
editDistanceUTF8
Introduced in: v24.6
Calculates the edit distance between two UTF8 strings.
Syntax
Arguments
Returned value
Returns the edit distance between the two UTF8 strings. UInt64
Examples
Usage example
encodeXMLComponent
Introduced in: v21.1
Escapes characters to place string into XML text node or attribute.
Syntax
Arguments
- s — String to escape. String
Returned value
Returns the escaped string. String
Examples
Usage example
endsWith
Introduced in: v1.1
Checks whether a string ends with the provided suffix.
Syntax
Arguments
Returned value
Returns 1 if s ends with suffix, otherwise 0. UInt8
Examples
Usage example
endsWithCaseInsensitive
Introduced in: v25.9
Checks whether a string ends with the provided case-insensitive suffix.
Syntax
Arguments
Returned value
Returns 1 if s ends with case-insensitive suffix, otherwise 0. UInt8
Examples
Usage example
endsWithCaseInsensitiveUTF8
Introduced in: v25.9
Returns whether string s ends with case-insensitive suffix.
Assumes that the string contains valid UTF-8 encoded text.
If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Arguments
Returned value
Returns 1 if s ends with case-insensitive suffix, otherwise 0. UInt8
Examples
Usage example
endsWithUTF8
Introduced in: v23.8
Returns whether string s ends with suffix.
Assumes that the string contains valid UTF-8 encoded text.
If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Arguments
Returned value
Returns 1 if s ends with suffix, otherwise 0. UInt8
Examples
Usage example
extractTextFromHTML
Introduced in: v21.3
Extracts text content from HTML or XHTML.
This function removes HTML tags, comments, and script/style elements, leaving only the text content. It handles:
- Removal of all HTML/XML tags
- Removal of comments (<!-- -->)
- Removal of script and style elements with their content
- Processing of CDATA sections (copied verbatim)
- Proper whitespace handling and normalization
Note: HTML entities are not decoded and should be processed with a separate function if needed.
Syntax
Arguments
- html — String containing HTML content to extract text from. String
Returned value
Returns the extracted text content with normalized whitespace. String
Examples
Usage example
firstLine
Introduced in: v23.7
Returns the first line of a multi-line string.
Syntax
Arguments
- s — Input string. String
Returned value
Returns the first line of the input string or the whole string if there are no line separators. String
Examples
Usage example
idnaDecode
Introduced in: v24.1
Returns the Unicode (UTF-8) representation (ToUnicode algorithm) of a domain name according to the Internationalized Domain Names in Applications (IDNA) mechanism.
In case of an error (e.g. because the input is invalid), the input string is returned.
Note that repeated application of idnaEncode() and idnaDecode() does not necessarily return the original string due to case normalization.
Syntax
Arguments
- s — Input string. String
Returned value
Returns a Unicode (UTF-8) representation of the input string according to the IDNA mechanism of the input value. String
Examples
Usage example
idnaEncode
Introduced in: v24.1
Returns the ASCII representation (ToASCII algorithm) of a domain name according to the Internationalized Domain Names in Applications (IDNA) mechanism. The input string must be UTF-encoded and translatable to an ASCII string, otherwise an exception is thrown.
No percent decoding or trimming of tabs, spaces or control characters is performed.
Syntax
Arguments
- s — Input string. String
Returned value
Returns an ASCII representation of the input string according to the IDNA mechanism of the input value. String
Examples
Usage example
initcap
Introduced in: v23.7
Converts the first letter of each word to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.
Because initcap converts only the first letter of each word to upper case you may observe unexpected behaviour for words containing apostrophes or capital letters.
This is a known behaviour and there are no plans to fix it currently.
Syntax
Arguments
- s — Input string. String
Returned value
Returns s with the first letter of each word converted to upper case. String
Examples
Usage example
Example of known behavior for words containing apostrophes or capital letters
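The word rule above (a word is a run of alphanumeric characters) explains the apostrophe behavior. The following Python sketch models it for ASCII input only; the function name is hypothetical and the regular expression is an approximation of the rule as stated, not the server's code.

```python
import re


def initcap_like(s: str) -> str:
    """Upper-case the first character of each alphanumeric run, lower-case the rest."""
    def fix(match: re.Match) -> str:
        word = match.group(0)
        return word[0].upper() + word[1:].lower()

    # An apostrophe is not alphanumeric, so it ends one word and starts another.
    return re.sub(r"[A-Za-z0-9]+", fix, s)


print(initcap_like("building for fast"))
print(initcap_like("John's cat won't eat."))
```

In the second call, the letters after each apostrophe are treated as new words, which is why they come out capitalized.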
initcapUTF8
Introduced in: v23.7
Like initcap, initcapUTF8 converts the first letter of each word to upper case and the rest to lower case.
Assumes that the string contains valid UTF-8 encoded text.
If this assumption is violated, no exception is thrown and the result is undefined.
This function does not detect the language, e.g. for Turkish the result might not be exactly correct (i/İ vs. i/I). If the length of the UTF-8 byte sequence is different for upper and lower case of a code point, the result may be incorrect for this code point.
Syntax
Arguments
- s — Input string. String
Returned value
Returns s with the first letter of each word converted to upper case. String
Examples
Usage example
isValidASCII
Introduced in: v25.9
Returns 1 if the input String or FixedString contains only ASCII bytes (0x00–0x7F), otherwise 0.
Syntax
Arguments
- None.
Returned value
Examples
isValidASCII
isValidUTF8
Introduced in: v20.1
Checks if the set of bytes constitutes valid UTF-8-encoded text.
Syntax
Arguments
- s — The string to check for UTF-8 encoded validity. String
Returned value
Returns 1 if the set of bytes constitutes valid UTF-8-encoded text, otherwise 0. UInt8
Examples
Usage example
jaroSimilarity
Introduced in: v24.1
Calculates the Jaro similarity between two byte strings.
Syntax
Arguments
Returned value
Returns the Jaro similarity between the two strings. Float64
Examples
Usage example
jaroWinklerSimilarity
Introduced in: v24.1
Calculates the Jaro-Winkler similarity between two byte strings.
Syntax
Arguments
Returned value
Returns the Jaro-Winkler similarity between the two strings. Float64
Examples
Usage example
left
Introduced in: v22.1
Returns a substring of string s with a specified offset starting from the left.
Syntax
Arguments
- s — The string to calculate a substring from. String or FixedString
- offset — The number of bytes of the offset. (U)Int*
Returned value
Returns:
- For positive offset, a substring of s with offset many bytes, starting from the left of the string.
- For negative offset, a substring of s with length(s) - |offset| bytes, starting from the left of the string.
- An empty string if length is 0. String
Examples
Positive offset
Negative offset
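The positive and negative offset rules listed under "Returned value" happen to match Python's prefix slicing on bytes, which makes for a compact model. The function name is hypothetical and this is only a sketch of the described behavior.

```python
def left_like(s: bytes, offset: int) -> bytes:
    """Model of left(): a prefix of s chosen by a signed byte offset."""
    # offset > 0 keeps the first `offset` bytes;
    # offset < 0 keeps the first len(s) - |offset| bytes;
    # offset == 0 gives an empty result.
    return s[:offset]


print(left_like(b"Hello", 3))
print(left_like(b"Hello", -3))
```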
leftPad
Introduced in: v21.8
Pads a string from the left with spaces or with a specified string (multiple times, if needed) until the resulting string reaches the specified length.
Syntax
Arguments
- string — Input string that should be padded. String
- length — The length of the resulting string. If the value is smaller than the input string length, then the input string is shortened to length characters. (U)Int*
- pad_string — Optional. The string to pad the input string with. If not specified, then the input string is padded with spaces. String
Returned value
Returns a left-padded string of the given length. String
Examples
Usage example
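The padding and truncation rules above can be sketched in Python. The example sticks to ASCII so that bytes and characters coincide. The function name is hypothetical, and cutting the last repetition of a multi-character pad string to fit is an assumption that the source does not confirm.

```python
def left_pad(s: str, length: int, pad_string: str = " ") -> str:
    """Pad s on the left to `length`, or shorten it if it is already longer."""
    if length <= len(s):
        return s[:length]  # the input is shortened to `length`
    needed = length - len(s)
    # Repeat the pad string enough times, then cut to the exact size needed
    # (assumption: a partial repetition is allowed).
    padding = (pad_string * needed)[:needed]
    return padding + s


print(repr(left_pad("abc", 7, "*")))
print(repr(left_pad("abc", 7)))
print(repr(left_pad("abcdef", 3)))
```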
leftPadUTF8
Introduced in: v21.8
Pads a UTF8 string from the left with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length.
Unlike leftPad which measures the string length in bytes, the string length is measured in code points.
Syntax
Arguments
- string — Input string that should be padded. String
- length — The length of the resulting string. If the value is smaller than the input string length, then the input string is shortened to length characters. (U)Int*
- pad_string — Optional. The string to pad the input string with. If not specified, then the input string is padded with spaces. String
Returned value
Returns a left-padded string of the given length. String
Examples
Usage example
leftUTF8
Introduced in: v22.1
Returns a substring of a UTF-8-encoded string s with a specified offset starting from the left.
Syntax
Arguments
- s — The UTF-8 encoded string to calculate a substring from. String or FixedString
- offset — The number of code points of the offset. (U)Int*
Returned value
Returns:
- For positive offset, a substring of s with offset many code points, starting from the left of the string.
- For negative offset, a substring of s with lengthUTF8(s) - |offset| code points, starting from the left of the string.
- An empty string if length is 0. String
Examples
Positive offset
Negative offset
lengthUTF8
Introduced in: v1.1
Returns the length of a string in Unicode code points rather than in bytes or characters. It assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Arguments
- s — String containing valid UTF-8 encoded text. String
Returned value
Length of the string s in Unicode code points. UInt64
Examples
Usage example
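The difference between counting code points and counting bytes is easy to see in Python, where `len` on a `str` counts code points and `len` on the UTF-8 encoded bytes counts bytes. The helper names below are hypothetical; the snippet illustrates the distinction only.

```python
def length_in_code_points(s: str) -> int:
    """Number of Unicode code points, the quantity lengthUTF8 is described as returning."""
    return len(s)


def length_in_bytes(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


text = "Здравствуй, мир!"
# 13 Cyrillic letters take 2 bytes each; the comma, space and "!" take 1 byte each.
print(length_in_code_points(text), length_in_bytes(text))
```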
lower
Introduced in: v1.1
Converts an ASCII string to lowercase.
Syntax
Arguments
- s — A string to convert to lowercase. String
Returned value
Returns a lowercase string from s. String
Examples
Usage example
lowerUTF8
Introduced in: v1.1
Converts a string to lowercase, assuming that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Arguments
- input — Input string to convert to lowercase. String
Returned value
Returns a lowercase string. String
Examples
first
normalizeUTF8NFC
Introduced in: v21.11
Normalizes a UTF-8 string according to the NFC normalization form.
Syntax
Arguments
- str — UTF-8 encoded input string. String
Returned value
Returns the NFC normalized form of the UTF-8 string. String
Examples
Usage example
normalizeUTF8NFD
Introduced in: v21.11
Normalizes a UTF-8 string according to the NFD normalization form.
Syntax
Arguments
- str — UTF-8 encoded input string. String
Returned value
Returns the NFD normalized form of the UTF-8 string. String
Examples
Usage example
normalizeUTF8NFKC
Introduced in: v21.11
Normalizes a UTF-8 string according to the NFKC normalization form.
Syntax
Arguments
- str — UTF-8 encoded input string. String
Returned value
Returns the NFKC normalized form of the UTF-8 string. String
Examples
Usage example
normalizeUTF8NFKD
Introduced in: v21.11
Normalizes a UTF-8 string according to the NFKD normalization form.
Syntax
Arguments
- str — UTF-8 encoded input string. String
Returned value
Returns the NFKD normalized form of the UTF-8 string. String
Examples
Usage example
punycodeDecode
Introduced in: v24.1
Returns the UTF8-encoded plaintext of a Punycode-encoded string. If no valid Punycode-encoded string is given, an exception is thrown.
Syntax
Arguments
- s — Punycode-encoded string. String
Returned value
Returns the plaintext of the input value. String
Examples
Usage example
punycodeEncode
Introduced in: v24.1
Returns the Punycode representation of a string. The string must be UTF8-encoded, otherwise the behavior is undefined.
Syntax
Arguments
- s — Input value. String
Returned value
Returns a Punycode representation of the input value. String
Examples
Usage example
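Punycode itself is defined in RFC 3492, and Python ships a codec for it, so the encoding can be tried outside the database. The sketch below uses that codec with hypothetical helper names; it illustrates the encoding and makes no claim about matching server output in every edge case.

```python
def punycode_encode(s: str) -> str:
    """Encode a Unicode string with Python's built-in punycode codec."""
    return s.encode("punycode").decode("ascii")


def punycode_decode(s: str) -> str:
    """Decode a Punycode string back to Unicode text."""
    return s.encode("ascii").decode("punycode")


encoded = punycode_encode("München")
print(encoded)
print(punycode_decode(encoded))
```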
repeat
Introduced in: v20.1
Concatenates a string as many times with itself as specified.
Syntax
Arguments
Returned value
A string containing string s repeated n times. If n is negative, the function returns the empty string. String
Examples
Usage example
reverseUTF8
Introduced in: v1.1
Reverses a sequence of Unicode code points in a string. Assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Arguments
- s — String containing valid UTF-8 encoded text. String
Returned value
Returns a string with the sequence of Unicode code points reversed. String
Examples
Usage example
right
Introduced in: v22.1
Returns a substring of string s with a specified offset starting from the right.
Syntax
Arguments
- s — The string to calculate a substring from. String or FixedString
- offset — The number of bytes of the offset. (U)Int*
Returned value
Returns:
- For positive offset, a substring of s with offset many bytes, starting from the right of the string.
- For negative offset, a substring of s with length(s) - |offset| bytes, starting from the right of the string.
- An empty string if length is 0. String
Examples
Positive offset
Negative offset
rightPad
Introduced in: v21.8
Pads a string from the right with spaces or with a specified string (multiple times, if needed) until the resulting string reaches the specified length.
Syntax
Arguments
- string — Input string that should be padded. String
- length — The length of the resulting string. If the value is smaller than the input string length, then the input string is shortened to length characters. (U)Int*
- pad_string — Optional. The string to pad the input string with. If not specified, then the input string is padded with spaces. String
Returned value
Returns a right-padded string of the given length. String
Examples
Usage example
rightPadUTF8
Introduced in: v21.8
Pads the string from the right with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length.
Unlike rightPad which measures the string length in bytes, the string length is measured in code points.
Syntax
Arguments
- string — Input string that should be padded. String
- length — The length of the resulting string. If the value is smaller than the input string length, then the input string is shortened to length characters. (U)Int*
- pad_string — Optional. The string to pad the input string with. If not specified, then the input string is padded with spaces. String
Returned value
Returns a right-padded string of the given length. String
Examples
Usage example
rightUTF8
Introduced in: v22.1
Returns a substring of UTF-8 encoded string s with a specified offset starting from the right.
Syntax
Arguments
- s — The UTF-8 encoded string to calculate a substring from. String or FixedString
- offset — The number of code points of the offset. (U)Int*
Returned value
Returns:
- For positive offset, a substring of s with offset many code points, starting from the right of the string.
- For negative offset, a substring of s with lengthUTF8(s) - |offset| code points, starting from the right of the string.
- An empty string if length is 0. String
Examples
Positive offset
Negative offset
soundex
Introduced in: v23.4
Returns the Soundex code of a string.
Syntax
Arguments
- s — Input string. String
Returned value
Returns the Soundex code of the input string. String
Examples
Usage example
space
Introduced in: v23.5
Concatenates a space ( ) as many times with itself as specified.
Syntax
Arguments
- n — The number of times to repeat the space. (U)Int*
Returned value
Returns a string containing a space repeated n times. If n <= 0, the function returns the empty string. String
Examples
Usage example
sparseGrams
Introduced in: v25.5
Finds all substrings of a given string that have a length of at least n,
where the hashes of the (n-1)-grams at the borders of the substring
are strictly greater than those of any (n-1)-gram inside the substring.
Uses CRC32 as a hash function.
Syntax
Arguments
- s — An input string. String
- min_ngram_length — Optional. The minimum length of extracted ngram. The default and minimal value is 3. UInt*
- max_ngram_length — Optional. The maximum length of extracted ngram. The default value is 100. Must not be less than min_ngram_length. UInt*
Returned value
Returns an array of selected substrings. Array(String)
Examples
Usage example
sparseGramsHashes
Introduced in: v25.5
Finds hashes of all substrings of a given string that have a length of at least n,
where the hashes of the (n-1)-grams at the borders of the substring
are strictly greater than those of any (n-1)-gram inside the substring.
Uses CRC32 as a hash function.
Syntax
Arguments
- s — An input string. String
- min_ngram_length — Optional. The minimum length of extracted ngram. The default and minimal value is 3. UInt*
- max_ngram_length — Optional. The maximum length of extracted ngram. The default value is 100. Must not be less than min_ngram_length. UInt*
Returned value
Returns an array of CRC32 hashes of the selected substrings. Array(UInt32)
Examples
Usage example
sparseGramsHashesUTF8
Introduced in: v25.5
Finds hashes of all substrings of a given UTF-8 string that have a length of at least n, where the hashes of the (n-1)-grams at the borders of the substring are strictly greater than those of any (n-1)-gram inside the substring.
Expects a UTF-8 string, throws an exception in case of an invalid UTF-8 sequence.
Uses CRC32 as a hash function.
Syntax
Arguments
- s — An input string. String
- min_ngram_length — Optional. The minimum length of extracted ngram. The default and minimal value is 3. UInt*
- max_ngram_length — Optional. The maximum length of extracted ngram. The default value is 100. Must not be less than min_ngram_length. UInt*
Returned value
Returns an array of CRC32 hashes of the selected UTF-8 substrings. Array(UInt32)
Examples
Usage example
sparseGramsUTF8
Introduced in: v25.5
Finds all substrings of a given UTF-8 string that have a length of at least n, where the hashes of the (n-1)-grams at the borders of the substring are strictly greater than those of any (n-1)-gram inside the substring.
Expects a UTF-8 string, throws an exception in case of an invalid UTF-8 sequence.
Uses CRC32 as a hash function.
Syntax
Arguments
- s — An input string. String
- min_ngram_length — Optional. The minimum length of extracted ngram. The default and minimal value is 3. UInt*
- max_ngram_length — Optional. The maximum length of extracted ngram. The default value is 100. Must not be less than min_ngram_length. UInt*
Returned value
Returns an array of selected UTF-8 substrings. Array(String)
Examples
Usage example
startsWith
Introduced in: v1.1
Checks whether a string begins with the provided string.
Syntax
Arguments
Returned value
Returns 1 if s starts with prefix, otherwise 0. UInt8
Examples
Usage example
startsWithCaseInsensitive
Introduced in: v25.9
Checks whether a string begins with the provided case-insensitive string.
Syntax
Arguments
Returned value
Returns 1 if s starts with case-insensitive prefix, otherwise 0. UInt8
Examples
Usage example
startsWithCaseInsensitiveUTF8
Introduced in: v25.9
Checks if a string starts with the provided case-insensitive prefix. Assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Arguments
Returned value
Returns 1 if s starts with case-insensitive prefix, otherwise 0. UInt8
Examples
Usage example
startsWithUTF8
Introduced in: v23.8
Checks if a string starts with the provided prefix. Assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Arguments
Returned value
Returns 1 if s starts with prefix, otherwise 0. UInt8
Examples
Usage example
stringBytesEntropy
Introduced in: v25.6
Calculates Shannon's entropy of byte distribution in a string.
Syntax
Arguments
- s — The string to analyze. String
Returned value
Returns Shannon's entropy of byte distribution in the string. Float64
Examples
Usage example
stringBytesUniq
Introduced in: v25.6
Counts the number of distinct bytes in a string.
Syntax
Arguments
- s — The string to analyze. String
Returned value
Returns the number of distinct bytes in the string. UInt16
Examples
Usage example
stringJaccardIndex
Introduced in: v23.11
Calculates the Jaccard similarity index between two byte strings.
Syntax
Arguments
Returned value
Returns the Jaccard similarity index between the two strings. Float64
Examples
Usage example
stringJaccardIndexUTF8
Introduced in: v23.11
Like stringJaccardIndex but for UTF8-encoded strings.
Syntax
Arguments
Returned value
Returns the Jaccard similarity index between the two UTF8 strings. Float64
Examples
Usage example
substring
Introduced in: v1.1
Returns the substring of a string s which starts at the specified byte index offset.
Byte counting starts from 1 with the following logic:
- If offset is 0, an empty string is returned.
- If offset is negative, the substring starts |offset| bytes from the end of the string, rather than from the beginning.
An optional argument length specifies the maximum number of bytes the returned substring may have.
Syntax
Arguments
- s — The string to calculate a substring from. String or FixedString or Enum
- offset — The starting position of the substring in s. (U)Int*
- length — Optional. The maximum length of the substring. (U)Int*
Returned value
Returns a substring of s with length many bytes, starting at index offset. String
Examples
Basic usage
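The 1-based indexing and the special cases for zero and negative offsets can be modeled in Python on byte strings. The function name is hypothetical. Clamping a negative offset that reaches past the start of the string, and treating a negative length as zero, are assumptions not taken from the source.

```python
from typing import Optional


def substring_like(s: bytes, offset: int, length: Optional[int] = None) -> bytes:
    """Model of substring(): 1-based byte offset, optional maximum length."""
    if offset == 0:
        return b""  # stated rule: offset 0 yields an empty string
    if offset > 0:
        start = offset - 1  # convert the 1-based index to 0-based
    else:
        start = max(len(s) + offset, 0)  # count from the end (clamped: assumption)
    if length is None:
        return s[start:]
    return s[start:start + max(length, 0)]


print(substring_like(b"Hello, world!", 1, 5))
print(substring_like(b"Hello, world!", -6))
```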
substringIndex
Introduced in: v23.7
Returns the substring of s before count occurrences of the delimiter delim, as in Spark or MySQL.
Syntax
Arguments
- s — The string to extract substring from. String
- delim — The character to split. String
- count — The number of occurrences of the delimiter to count before extracting the substring. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. UInt or Int
Returned value
Returns a substring of s before count occurrences of delim. String
Examples
Usage example
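The meaning of positive and negative count values is easiest to see in a small model. The Python sketch below follows the description above for a single-character delimiter; the helper name `substring_index` is hypothetical, and returning an empty string for a count of 0 is an assumption borrowed from MySQL's SUBSTRING_INDEX.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Model of 'everything before/after the count-th delimiter'."""
    if len(delim) != 1:
        raise ValueError("this sketch expects a single-character delimiter")
    if count == 0:
        # Assumption (MySQL behavior): a count of 0 yields an empty string.
        return ""
    parts = s.split(delim)
    if count > 0:
        # Keep everything to the left of the count-th delimiter from the left.
        return delim.join(parts[:count])
    # Keep everything to the right of the |count|-th delimiter from the right.
    return delim.join(parts[count:])


print(substring_index("www.clickhouse.com", ".", 2))
print(substring_index("www.clickhouse.com", ".", -2))
```

If the string contains fewer delimiters than |count|, this model returns the whole string.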
substringIndexUTF8
Introduced in: v23.7
Returns the substring of s before count occurrences of the delimiter delim, specifically for Unicode code points.
Assumes that the string contains valid UTF-8 encoded text.
If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Arguments
- s— The string to extract substring from.- String
- delim— The delimiter character to split on.- String
- count— The number of occurrences of the delimiter to count before extracting the substring. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned.- UInt or- Int
Returned value
Returns a substring of s before count occurrences of delim. String
Examples
UTF8 example
substringUTF8
Introduced in: v1.1
Returns the substring of a string s which starts at the specified code point index offset, counting Unicode code points rather than bytes.
Code point counting starts from 1 with the following logic:
- If offset is 0, an empty string is returned.
- If offset is negative, the substring starts |offset| code points from the end of the string, rather than from the beginning.
An optional argument length specifies the maximum number of code points the returned substring may have.
This function assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Arguments
- s— The string to calculate a substring from.- String or- FixedString or- Enum
- offset— The starting position of the substring in- s.- Int or- UInt
- length— Optional. The maximum length of the substring.- Int or- UInt
Returned value
Returns a substring of s of at most length code points, starting at index offset. String
Examples
Usage example
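To show why a UTF-8 aware variant matters, the following Python sketch slices the same text once by code points and once by raw bytes. Python's `str` is indexed by code point, so it serves as a stand-in here; the helper name `substring_utf8` and the clamping of out-of-range negative offsets are assumptions of this sketch, and length is assumed to be non-negative.

```python
def substring_utf8(s: str, offset: int, length=None) -> str:
    """Model of 1-based slicing that counts Unicode code points."""
    if offset == 0:
        return ""
    # Assumption: clamp to the start when |offset| exceeds the string length.
    start = offset - 1 if offset > 0 else max(len(s) + offset, 0)
    end = None if length is None else start + length
    return s[start:end]


text = "Привет, мир"
print(substring_utf8(text, 1, 6))             # six code points
print(text.encode("utf-8")[:6].decode())      # six bytes cover only three Cyrillic letters
```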
toValidUTF8
Introduced in: v20.1
Converts a string to valid UTF-8 encoding by replacing any invalid UTF-8 characters with the replacement character � (U+FFFD).
When multiple consecutive invalid characters are found, they are collapsed into a single replacement character.
Syntax
Arguments
- s— Any set of bytes represented as the String data type object.- String
Returned value
Returns a valid UTF-8 string. String
Examples
Usage example
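The replace-and-collapse behavior described above can be approximated in Python. This sketch marks every undecodable byte with the `surrogateescape` error handler and then replaces each run of marked bytes with a single U+FFFD. The helper name `to_valid_utf8` is hypothetical, and the exact boundaries of what counts as one invalid character in the server may differ from this model.

```python
import re

# With errors="surrogateescape", each undecodable byte becomes a lone
# surrogate in the range U+DC80..U+DCFF, which never occurs in valid text.
_INVALID_RUN = re.compile("[\udc80-\udcff]+")


def to_valid_utf8(raw: bytes) -> str:
    """Replace each run of invalid UTF-8 bytes with one U+FFFD."""
    marked = raw.decode("utf-8", errors="surrogateescape")
    return _INVALID_RUN.sub("\ufffd", marked)


# Four consecutive invalid bytes collapse into a single replacement character.
print(to_valid_utf8(b"a\xf0\x80\x80\x80b"))
```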
trimBoth
Introduced in: v20.1
Removes the specified characters from the start and end of a string. By default, removes common whitespace (ASCII) characters.
Syntax
Arguments
- s— String to trim.- String
- trim_characters— Optional. Characters to trim. If not specified, common whitespace characters are removed.- String
Returned value
Returns the string with specified characters trimmed from both ends. String
Examples
Usage example
trimLeft
Introduced in: v20.1
Removes the specified characters from the start of a string. By default, removes common whitespace (ASCII) characters.
Syntax
Arguments
- input— String to trim.- String
- trim_characters— Optional. Characters to trim. If not specified, common whitespace characters are removed.- String
Returned value
Returns the string with specified characters trimmed from the left. String
Examples
Usage example
trimRight
Introduced in: v20.1
Removes the specified characters from the end of a string. By default, removes common whitespace (ASCII) characters.
Syntax
Arguments
- s— String to trim.- String
- trim_characters— Optional. Characters to trim. If not specified, common whitespace characters are removed.- String
Returned value
Returns the string with specified characters trimmed from the right. String
Examples
Usage example
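The difference between trimBoth, trimLeft, and trimRight can be modeled with Python's `strip` family. This sketch assumes trim_characters is treated as a set of individual characters rather than as a literal prefix or suffix, and it does not model the default whitespace set; the helper names are hypothetical.

```python
def trim_both(s: str, trim_characters: str) -> str:
    # Assumption: any character in trim_characters is removed from both ends.
    return s.strip(trim_characters)


def trim_left(s: str, trim_characters: str) -> str:
    return s.lstrip(trim_characters)


def trim_right(s: str, trim_characters: str) -> str:
    return s.rstrip(trim_characters)


print(trim_both("-=ClickHouse=-", "-="))
print(trim_left("xxClickHousexx", "x"))
print(trim_right("xxClickHousexx", "x"))
```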
tryBase32Decode
Introduced in: v25.6
Like base32Decode, but returns an empty string in case of error.
Syntax
Arguments
- encoded— String column or constant to decode. If the string is not valid Base32, an empty string is returned.- String
Returned value
Returns a string containing the decoded value of the argument. String
Examples
Usage example
tryBase58Decode
Introduced in: v22.10
Like base58Decode, but returns an empty string in case of error.
Syntax
Arguments
- encoded— String column or constant to decode. If the string is not valid Base58, an empty string is returned.- String
Returned value
Returns a string containing the decoded value of the argument. String
Examples
Usage example
tryBase64Decode
Introduced in: v18.16
Like base64Decode, but returns an empty string in case of error.
Syntax
Arguments
- encoded— String column or constant to decode. If the string is not valid Base64, an empty string is returned.- String
Returned value
Returns a string containing the decoded value of the argument. String
Examples
Usage example
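The "empty string instead of an exception" contract shared by tryBase32Decode and tryBase64Decode can be sketched in Python with the standard `base64` module. The helper names are hypothetical, and Python's strictness about padding and alphabet may not match the server's in every edge case.

```python
import base64


def try_base64_decode(encoded: str) -> bytes:
    """Decode Base64, returning an empty value on invalid input."""
    try:
        return base64.b64decode(encoded, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return b""


def try_base32_decode(encoded: str) -> bytes:
    """Decode Base32 (RFC 4648), returning an empty value on invalid input."""
    try:
        return base64.b32decode(encoded)
    except ValueError:
        return b""


print(try_base64_decode("Y2xpY2tob3VzZQ=="))   # valid input decodes normally
print(try_base64_decode("not valid base64!"))  # invalid input yields an empty value
```

One consequence of this contract is that an empty result is ambiguous: it can mean either an empty input or an invalid one.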
tryBase64URLDecode
Introduced in: v18.16
Like base64URLDecode, but returns an empty string in case of error.
Syntax
Arguments
- encoded— String column or constant to decode. If the string is not valid URL-safe Base64, an empty string is returned.- String
Returned value
Returns a string containing the decoded value of the argument. String
Examples
Usage example
tryIdnaEncode
Introduced in: v24.1
Returns the ASCII representation (ToASCII algorithm) of a domain name according to the Internationalized Domain Names in Applications (IDNA) mechanism. In case of an error, it returns an empty string instead of throwing an exception.
Syntax
Arguments
- s— Input string.- String
Returned value
Returns the ASCII representation of the input string according to the IDNA mechanism, or an empty string if the input is invalid. String
Examples
Usage example
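For intuition, the following Python sketch converts a domain name to its ASCII form and returns an empty string on failure. It relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules, so results for some inputs (for example names containing ß) may differ from the server's. The helper name `try_idna_encode` is hypothetical.

```python
def try_idna_encode(domain: str) -> str:
    """Return the ASCII (ToASCII) form of a domain, or "" if it is invalid."""
    try:
        return domain.encode("idna").decode("ascii")
    except UnicodeError:
        # Raised for, among others, empty labels or labels over 63 characters.
        return ""


print(try_idna_encode("münchen.de"))
print(repr(try_idna_encode("a" * 64 + ".com")))  # label too long
```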
tryPunycodeDecode
Introduced in: v24.1
Like punycodeDecode but returns an empty string if no valid Punycode-encoded string is given.
Syntax
Arguments
- s— Punycode-encoded string.- String
Returned value
Returns the plaintext of the input value, or an empty string if the input is invalid. String
Examples
Usage example
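The same try-style contract can be illustrated for Punycode with Python's built-in `punycode` codec. This is a sketch under the assumption that both implementations follow RFC 3492; the helper name `try_punycode_decode` is hypothetical.

```python
def try_punycode_decode(encoded: str) -> str:
    """Decode a Punycode string, or return "" if it is not valid Punycode."""
    try:
        return encoded.encode("ascii").decode("punycode")
    except UnicodeError:
        # Covers non-ASCII input as well as malformed Punycode.
        return ""


print(try_punycode_decode("Mnchen-3ya"))
print(repr(try_punycode_decode("abc-!")))  # "!" is not a valid Punycode digit
```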
upper
Introduced in: v1.1
Converts the ASCII Latin symbols in a string to uppercase.
Syntax
Arguments
- s— The string to convert to uppercase.- String
Returned value
Returns an uppercase string from s. String
Examples
Usage example
upperUTF8
Introduced in: v1.1
Converts a string to uppercase, assuming that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
This function doesn't detect the language, e.g. for Turkish the result might not be exactly correct (i/İ vs. i/I).
If the length of the UTF-8 byte sequence is different for upper and lower case of a code point (such as ẞ and ß), the result may be incorrect for that code point.
Syntax
Arguments
- s— The string to convert to uppercase.- String
Returned value
Returns s converted to uppercase. String
Examples
Usage example
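The caveats above are easier to follow with concrete values. The Python sketch below contrasts ASCII-only uppercasing with Unicode-aware uppercasing and prints the UTF-8 byte lengths of ß and ẞ. Python's `str.upper` stands in for a Unicode-aware conversion and is not guaranteed to match the server's output for every code point; the helper name `upper_ascii` is hypothetical.

```python
def upper_ascii(s: str) -> str:
    """Uppercase only ASCII letters; bytes.upper() leaves bytes >= 0x80 alone."""
    return s.encode("utf-8").upper().decode("utf-8")


print(upper_ascii("münchen"))  # ü is left unchanged
print("münchen".upper())       # Unicode-aware: ü becomes Ü

# ß and ẞ differ in UTF-8 length, which is the case the note above warns about.
print(len("ß".encode("utf-8")), len("ẞ".encode("utf-8")))

# No language detection: lowercase i maps to I, not the Turkish İ.
print("i".upper())
```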
