JEP | 14 |
Author | Maxime Labelle, Chris Armstrong (GorillaStack), Richard Gibson |
Created | 13-October-2022 |
SemVer | MINOR |
Status | accepted |
This JEP introduces a core set of useful string manipulation functions. Those functions are modeled from functions found in popular programming languages such as JavaScript and Python.
Some string manipulation functions bring the new concept of optional arguments to JMESPath functions. The specification paragraph on function evaluation must thus be changed accordingly – highlighted in bold in the text below:
Functions can either have a specific arity, a range of valid – minimum and maximum – number of arguments or be variadic with a minimum number of arguments. If a function-expression is encountered where the arity does not match or the minimum number of arguments for a variadic function is not provided, then implementations must indicate to the caller that an invalid-arity error occurred. How and when this error is raised is implementation specific.
Some functions accept number arguments which are further constrained to integers or even non-negative integers. This JEP specifies a new error
type invalid-value
by updating the paragraph on type constraints from the specification like so:
Each function signature declares the types of its input parameters. If any type constraints are not met, implementations must indicate that an invalid-type
error occurred. If a function parameter accepts values constrained to a specific subset of a type and those constraints are not met, implementations must report that an invalid-value
error occurred. How and when those errors are raised is implementation specific.
int find_first(string $subject, string $sub[, int $start[, int $end]])
Given the $subject
string, find_first()
returns the zero-based index of the first occurrence where the $sub
substring appears in $subject
or null
if it does not appear. If either the $subject
or the $sub
argument is an empty string, find_first()
returns null
.
The $start
and $end
parameters are optional and allow restricting to the slice [$start:$end]
the range within $subject
in which $sub
must be found.
- If
$start
is omitted, it defaults to0
(which is the start of the$subject
string). - If
$end
is omitted, it defaults tolength(subject)
(which is past the end of the$subject
string).
If not omitted, the $start
or $end
arguments are expected to be integers. Otherwise, an error MUST be raised.
Contrary to similar functions found in most popular programming languages, the find_first()
function does not return -1
if no occurrence of the substring can be found. Instead, it returns null
for consistency reasons with how JMESPath behaves.
Given | Expression | Result |
---|---|---|
"subject string" |
find_first(@, 'string') |
8 |
"subject string" |
find_first(@, 'string', `0`) |
8 |
"subject string" |
find_first(@, 'string', `0`, `14`) |
8 |
"subject string" |
find_first(@, 'string', `-99`, `100`) |
8 |
"subject string" |
find_first(@, 'string', `-6`) |
8 |
"subject string" |
find_first(@, 'string', `0`, `13`) |
null |
"subject string" |
find_first(@, 'string', `8`) |
8 |
"subject string" |
find_first(@, 'string', `8`, `11`) |
null |
"subject string" |
find_first(@, 'string', `9`) |
null |
"subject string" |
find_first(@, 's') |
0 |
"subject string" |
find_first(@, 's', `1`) |
8 |
"subject string" |
find_first(@, '') |
null |
int find_last(string $subject, string $sub[, int $start[, int $end]])
Given the $subject
string, find_last()
returns the zero-based index of the last occurrence where the $sub
substring appears in $subject
or null
if it does not appear. If either the $subject
or the $sub
argument is an empty string, find_last()
returns null
.
The $start
and $end
parameters are optional and allow restricting to the slice [$start:$end]
the range within $subject
in which $sub
must be found.
- If
$start
is omitted, it defaults to0
(which is the start of the$subject
string). - If
$end
is omitted, it defaults tolength(subject)
(which is past the end of the$subject
string).
If not omitted, the $start
or $end
arguments are expected to be integers. Otherwise, an error MUST be raised.
Contrary to similar functions found in most popular programming languages, the find_last()
function does not return -1
if no occurrence of the substring can be found. Instead, it returns null
for consistency reasons with how JMESPath behaves.
Given | Expression | Result |
---|---|---|
"subject string" |
find_last(@, 'string') |
8 |
"subject string" |
find_last(@, 'string', `8`) |
8 |
"subject string" |
find_last(@, 'string', `8`, `9`) |
null |
"subject string" |
find_last(@, 'string', `9`) |
null |
"subject string" |
find_last(@, 's') |
8 |
"subject string" |
find_last(@, 's', `1`) |
8 |
"subject string" |
find_last(@, 's', `0`, `7`) |
0 |
"subject string" |
find_last(@, '') |
null |
string lower(string $subject)
Returns the lowercase $subject
string using Unicode default casing conversion specification.
Given | Expression | Result |
---|---|---|
"STRING" |
lower(@) |
"string" |
string pad_left(string $subject, number $width[, string $pad])
Given the $subject
string, pad_left()
adds characters to the beginning and returns a string of length at least $width
.
The $pad
optional string parameter specifies the padding character.
If omitted, it defaults to an ASCII space (U+0020).
If present, it MUST have length 1, otherwise an error MUST be raised.
If the $subject
string has length greater than or equal to $width
, it is returned unmodified.
If $width
is not an integer or is negative, an error MUST be raised.
Given | Expression | Result |
---|---|---|
"string" |
pad_left(@, `0`) |
"string" |
"string" |
pad_left(@, `5`) |
"string" |
"string" |
pad_left(@, `10`) |
" string" |
"string" |
pad_left(@, `10`, '-') |
"----string" |
string pad_right(string $subject, number $width[, string $pad])
Given the $subject
string, pad_right()
adds characters to the end and returns a string of length at least $width
.
The $pad
optional string parameter specifies the padding character.
If omitted, it defaults to an ASCII space (U+0020).
If present, it MUST have length 1, otherwise an error MUST be raised.
If the $subject
string has length greater than or equal to $width
, it is returned unmodified.
If $width
is not an integer or is negative, an error MUST be raised.
Given | Expression | Result |
---|---|---|
"string" |
pad_right(@, `0`) |
"string" |
"string" |
pad_right(@, `5`) |
"string" |
"string" |
pad_right(@, `10`) |
"string " |
"string" |
pad_right(@, `10`, '-') |
"string----" |
string replace(string $subject, string $old, string $new[, number $count])
Given the $subject
string, replace()
replaces occurrences of the $old
substring with the $new
substring.
The $count
optional integer specifies how many occurrences of the $old
substring in $subject
are replaced. If this parameter is omitted, all occurrences are replaced. If $count
is not an integer or is negative, an error MUST be raised.
The replace()
function has no effect if $count
is 0
.
Given | Expression | Result |
---|---|---|
"aabaaabaaaab" |
replace(@, 'aa', '-', `0`) |
"aabaaabaaaab" |
"aabaaabaaaab" |
replace(@, 'aa', '-', `1`) |
"-baaabaaaab" |
"aabaaabaaaab" |
replace(@, 'aa', '-', `2`) |
"-b-abaaaab" |
"aabaaabaaaab" |
replace(@, 'aa', '-', `3`) |
"-b-ab-aab" |
"aabaaabaaaab" |
replace(@, 'aa', '-') |
"-b-ab--b" |
array[string] split(string $subject, string $search[, number $count])
Given the $subject
string, split()
breaks on occurrences of the string $search
and returns an array.
The split()
function returns an array containing each partial string between occurrences of $search
. If $subject
contains no occurrences of the $search
string, an array containing just the original $subject
string will be returned.
If the $search
argument is an empty string, split()
breaks on every character and returns an array containing each character from the $subject
string. Thus, if $subject
is also an empty string, split()
returns an empty array.
The $count
optional integer specifies the maximum number of split points within the $search
string.
If this parameter is omitted, all occurrences are split. If $count
is not an integer or is negative, an error MUST be raised.
If $count
is equal to 0
, split()
returns an array containing a single element, the $subject
string.
Otherwise, the split()
function breaks on occurrences of the $search
string up to $count
times. The last string in the resulting array containing the remaining contents of $subject
unmodified.
Note: The split()
function was originally designed by Chris Armstrong. However, its behavior has been slightly altered for consistency reasons.
Expression | Result |
---|---|
split('', '') |
[] |
split('all chars', '') |
[ "a", "l", "l", " ", "c", "h", "a", "r", "s" ] |
split('/', '/') |
[ "", "" ] |
split('average|-|min|-|max|-|mean|-|median', '|-|') |
[ "average", "min", "max", "mean", "median" ] |
split('average|-|min|-|max|-|mean|-|median', '|-|', `3`) |
[ "average", "min", "max", "mean|-|median" ] |
split('average|-|min|-|max|-|mean|-|median', '|-|', `2`) |
[ "average", "min", "max|-|mean|-|median" ] |
split('average|-|min|-|max|-|mean|-|median', '|-|', `1`) |
[ "average", "min|-|max|-|mean|-|median" ] |
split('average|-|min|-|max|-|mean|-|median', '|-|', `0`) |
[ "average|-|min|-|max|-|mean|-|median" ] |
split('average|-|min|-|max|-|mean|-|median', '-') |
[ "average|", "|min|", "|max|", "|mean|", "|median" ] |
string trim(string $subject[, string $chars])
Given the $subject
string, trim()
removes the leading and trailing characters found in $chars
.
The $chars
optional string parameter represents a set of characters to be removed. If this parameter is not specified, or is an empty string, whitespace characters are removed from the $subject
string. Whitespaces are defined by the Unicode standard as codepoints having the White_Space
property set to Yes
.
Given | Expression | Result |
---|---|---|
" subject string " |
trim(@) |
"subject string" |
" subject string " |
trim(@, '') |
"subject string" |
" subject string " |
trim(@, ' ') |
"subject string" |
" subject string " |
trim(@, 's') |
" subject string " |
" subject string " |
trim(@, 'su') |
" subject string " |
" subject string " |
trim(@, 'su ') |
"bject string" |
" subject string " |
trim(@, 'gsu ') |
"bject strin" |
string trim_left(string $subject[, string $chars])
Given the $subject
string, trim_left()
removes the leading characters found in $chars
.
Like for the trim()
function, the $chars
optional string parameter represents a set of characters to be removed. trim_left()
defaults to removing whitespace characters if $chars
is not specified or is an empty string.
Given | Expression | Result |
---|---|---|
" subject string " |
trim_left(@) |
"subject string " |
" subject string " |
trim_left(@, 's') |
" subject string " |
" subject string " |
trim_left(@, 'su') |
" subject string " |
" subject string " |
trim_left(@, 'su ') |
"bject string " |
" subject string " |
trim_left(@, 'gsu ') |
"bject string " |
string trim_right(string $subject[, string $chars])
Given the $subject
string, trim_right()
removes the trailing characters found in $chars
.
Like for the trim()
and trim_left()
functions, the $chars
optional string parameter represents a set of characters to be removed. trim_right()
defaults to removing whitespace characters if $chars
is not specified or is an empty string.
Given | Expression | Result |
---|---|---|
" subject string " |
trim_right(@) |
" subject string" |
" subject string " |
trim_right(@, 's') |
" subject string " |
" subject string " |
trim_right(@, 'su') |
" subject string " |
" subject string " |
trim_right(@, 'su ') |
" subject string" |
" subject string " |
trim_right(@, 'gsu ') |
" subject strin" |
string upper(string $subject)
Returns the uppercase $subject
string using Unicode default casing conversion specification.
Given | Expression | Result |
---|---|---|
"string" |
upper(@) |
"STRING" |
A new string_functions.json
file will be added to the compliance tests.
The test suite will introduce the following new error type:
- invalid-value
This error type would be raised by split()
for instance, if its $count
parameter is negative or not an integer.