
String Functions

JEP: 14
Author: Maxime Labelle, Chris Armstrong (GorillaStack), Richard Gibson
Created: 13-October-2022
SemVer: MINOR
Status: accepted

Abstract

This JEP introduces a core set of useful string manipulation functions. These functions are modeled on functions found in popular programming languages such as JavaScript and Python.

Specification

Some string manipulation functions introduce the concept of optional arguments to JMESPath functions. The specification paragraph on function evaluation must therefore be amended so that a function may also accept a range of valid argument counts, with a minimum and a maximum. The amended paragraph reads:

Functions can either have a specific arity, a range of valid – minimum and maximum – number of arguments or be variadic with a minimum number of arguments. If a function-expression is encountered where the arity does not match or the minimum number of arguments for a variadic function is not provided, then implementations must indicate to the caller that an invalid-arity error occurred. How and when this error is raised is implementation specific.

Some functions accept number arguments that are further constrained to integers, or even to non-negative integers. This JEP introduces a new error type, invalid-value, by updating the paragraph on type constraints in the specification as follows:

Each function signature declares the types of its input parameters. If any type constraints are not met, implementations must indicate that an invalid-type error occurred. If a function parameter accepts values constrained to a specific subset of a type and those constraints are not met, implementations must report that an invalid-value error occurred. How and when those errors are raised is implementation specific.
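To make the two amended rules concrete, here is a minimal Python sketch of how an implementation might check an argument count against a minimum and an optional maximum, and enforce an integer constraint. It is not taken from any JMESPath library: the exception classes `InvalidArityError` and `InvalidValueError` and the helpers `check_arity()` and `as_integer()` are hypothetical names, and treating an integral float such as `2.0` as an integer is an assumption of this sketch.

```python
class InvalidArityError(Exception):
    """Hypothetical stand-in for the invalid-arity error."""


class InvalidValueError(Exception):
    """Hypothetical stand-in for the new invalid-value error."""


def check_arity(name, args, min_args, max_args=None):
    """Raise InvalidArityError unless min_args <= len(args) <= max_args.

    A fixed-arity function passes the same value for both bounds; a
    variadic function passes max_args=None (no upper bound).
    """
    count = len(args)
    if count < min_args or (max_args is not None and count > max_args):
        raise InvalidArityError(f"{name}() does not accept {count} argument(s)")


def as_integer(name, value, non_negative=False):
    """Return value as an int, or raise InvalidValueError.

    Assumes the caller has already checked that value is a JSON number
    (otherwise invalid-type applies). Assumption: an integral float such
    as 2.0 satisfies the integer constraint.
    """
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    if isinstance(value, bool) or not isinstance(value, int):
        raise InvalidValueError(f"{name}(): expected an integer, got {value!r}")
    if non_negative and value < 0:
        raise InvalidValueError(f"{name}(): expected a non-negative integer, got {value}")
    return value
```

With these helpers, find_first() would validate its argument count with `check_arity("find_first", args, 2, 4)`, and split() would validate its optional count with `as_integer("split", count, non_negative=True)`.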

find_first

int find_first(string $subject, string $sub[, int $start[, int $end]])

Given the $subject string, find_first() returns the zero-based index of the first occurrence of the $sub substring in $subject, or null if $sub does not appear. If either the $subject or the $sub argument is an empty string, find_first() returns null.

The optional $start and $end parameters restrict the search for $sub to the slice [$start:$end] of $subject. The returned index is always relative to the beginning of $subject, not to the beginning of the slice.

  • If $start is omitted, it defaults to 0 (the start of the $subject string).
  • If $end is omitted, it defaults to length($subject) (just past the end of the $subject string).

If provided, the $start and $end arguments are expected to be integers; otherwise, an error MUST be raised.

Unlike similar functions in most popular programming languages, find_first() does not return -1 when no occurrence of the substring can be found. Instead, it returns null, for consistency with how JMESPath behaves elsewhere.

Examples

Given             Expression                             Result
"subject string"  find_first(@, 'string')                8
"subject string"  find_first(@, 'string', `0`)           8
"subject string"  find_first(@, 'string', `0`, `14`)     8
"subject string"  find_first(@, 'string', `-99`, `100`)  8
"subject string"  find_first(@, 'string', `-6`)          8
"subject string"  find_first(@, 'string', `0`, `13`)     null
"subject string"  find_first(@, 'string', `8`)           8
"subject string"  find_first(@, 'string', `8`, `11`)     null
"subject string"  find_first(@, 'string', `9`)           null
"subject string"  find_first(@, 's')                     0
"subject string"  find_first(@, 's', `1`)                8
"subject string"  find_first(@, '')                      null
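A minimal Python sketch of find_first(), assuming the arguments have already passed arity, type and integer validation. It leans on Python's `str.find()`, which interprets its optional start and end arguments with slice semantics and returns an index relative to the whole string; mapping its -1 result to `None` (standing in for JMESPath null) is the only adaptation needed. This is an illustration, not a reference implementation.

```python
from typing import Optional


def find_first(subject: str, sub: str,
               start: Optional[int] = None,
               end: Optional[int] = None) -> Optional[int]:
    """Sketch of find_first(); returns None where JMESPath returns null."""
    # An empty $subject or an empty $sub always yields null.
    if subject == "" or sub == "":
        return None
    # str.find() treats start/end like slice bounds: negative values count
    # from the end, out-of-range values are clamped, and None means "omitted".
    index = subject.find(sub, start, end)
    # The index is relative to the start of subject, not of the slice.
    # str.find() signals "not found" with -1; JMESPath uses null instead.
    return None if index == -1 else index
```

For instance, `find_first("subject string", "string", -6)` returns 8, matching the table above.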

find_last

int find_last(string $subject, string $sub[, int $start[, int $end]])

Given the $subject string, find_last() returns the zero-based index of the last occurrence of the $sub substring in $subject, or null if $sub does not appear. If either the $subject or the $sub argument is an empty string, find_last() returns null.

The optional $start and $end parameters restrict the search for $sub to the slice [$start:$end] of $subject. The returned index is always relative to the beginning of $subject, not to the beginning of the slice.

  • If $start is omitted, it defaults to 0 (the start of the $subject string).
  • If $end is omitted, it defaults to length($subject) (just past the end of the $subject string).

If provided, the $start and $end arguments are expected to be integers; otherwise, an error MUST be raised.

Unlike similar functions in most popular programming languages, find_last() does not return -1 when no occurrence of the substring can be found. Instead, it returns null, for consistency with how JMESPath behaves elsewhere.

Examples

Given             Expression                        Result
"subject string"  find_last(@, 'string')            8
"subject string"  find_last(@, 'string', `8`)       8
"subject string"  find_last(@, 'string', `8`, `9`)  null
"subject string"  find_last(@, 'string', `9`)       null
"subject string"  find_last(@, 's')                 8
"subject string"  find_last(@, 's', `1`)            8
"subject string"  find_last(@, 's', `0`, `7`)       0
"subject string"  find_last(@, '')                  null
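A minimal Python sketch of find_last() can delegate to `str.rfind()`, which returns the highest matching index, interprets its optional start and end arguments with slice semantics, and reports indexes relative to the whole string. The sketch assumes the arguments have already been validated and uses `None` for JMESPath null; it is an illustration, not a reference implementation.

```python
from typing import Optional


def find_last(subject: str, sub: str,
              start: Optional[int] = None,
              end: Optional[int] = None) -> Optional[int]:
    """Sketch of find_last(); returns None where JMESPath returns null."""
    # An empty $subject or an empty $sub always yields null.
    if subject == "" or sub == "":
        return None
    # str.rfind() searches the slice [start:end] (slice semantics, None
    # meaning "omitted") and returns the highest index where sub begins,
    # measured from the start of the whole subject.
    index = subject.rfind(sub, start, end)
    # Map Python's -1 "not found" marker to JMESPath null.
    return None if index == -1 else index
```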

lower

string lower(string $subject)

Returns the $subject string converted to lowercase, using the Unicode default case conversion rules.

Examples

Given     Expression  Result
"STRING"  lower(@)    "string"
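In Python, `str.lower()` applies Unicode's full lowercase mappings, so a minimal sketch of lower() can simply delegate to it. Treating `str.lower()` as equivalent to the Unicode default case conversion is an assumption of this sketch. One consequence worth noting is that full mappings can change a string's length.

```python
def lower(subject: str) -> str:
    """Sketch of lower(), delegating to Python's Unicode-aware str.lower()."""
    # Full case mappings may change the length of the string:
    # U+0130 (capital I with dot above) lowercases to "i" + U+0307.
    return subject.lower()


print(lower("STRING"))       # string
print(len(lower("\u0130")))  # 2
```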

pad_left

string pad_left(string $subject, number $width[, string $pad])

Given the $subject string, pad_left() adds padding characters to the beginning of $subject and returns a string of length at least $width.

The optional $pad string parameter specifies the padding character. If omitted, it defaults to an ASCII space (U+0020). If present, it MUST have length 1; otherwise, an error MUST be raised.

If the $subject string has length greater than or equal to $width, it is returned unmodified.

If $width is not an integer or is negative, an error MUST be raised.

Examples

Given     Expression              Result
"string"  pad_left(@, `0`)        "string"
"string"  pad_left(@, `5`)        "string"
"string"  pad_left(@, `10`)       "    string"
"string"  pad_left(@, `10`, '-')  "----string"

pad_right

string pad_right(string $subject, number $width[, string $pad])

Given the $subject string, pad_right() adds padding characters to the end of $subject and returns a string of length at least $width.

The optional $pad string parameter specifies the padding character. If omitted, it defaults to an ASCII space (U+0020). If present, it MUST have length 1; otherwise, an error MUST be raised.

If the $subject string has length greater than or equal to $width, it is returned unmodified.

If $width is not an integer or is negative, an error MUST be raised.

Examples

Given     Expression               Result
"string"  pad_right(@, `0`)        "string"
"string"  pad_right(@, `5`)        "string"
"string"  pad_right(@, `10`)       "string    "
"string"  pad_right(@, `10`, '-')  "string----"
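pad_left() and pad_right() map closely onto Python's `str.rjust()` and `str.ljust()`, which already return the string unchanged when it is at least as long as the requested width. The minimal sketch below adds the checks Python does not perform on its own, such as rejecting a negative width. Raising `ValueError` as a stand-in for a JMESPath error, the hypothetical helper `_check_pad_args()`, and accepting only Python `int` values as integers are all assumptions of this sketch.

```python
def _check_pad_args(width, pad):
    """Hypothetical helper; ValueError stands in for a JMESPath error."""
    # Assumption: only Python ints (but not bools) count as integers.
    if isinstance(width, bool) or not isinstance(width, int) or width < 0:
        raise ValueError(f"width must be a non-negative integer, got {width!r}")
    if not isinstance(pad, str) or len(pad) != 1:
        raise ValueError(f"pad must be a string of length 1, got {pad!r}")


def pad_left(subject: str, width: int, pad: str = " ") -> str:
    """Sketch of pad_left(): pad at the beginning up to width characters."""
    _check_pad_args(width, pad)
    # rjust() returns subject unchanged when len(subject) >= width.
    return subject.rjust(width, pad)


def pad_right(subject: str, width: int, pad: str = " ") -> str:
    """Sketch of pad_right(): pad at the end up to width characters."""
    _check_pad_args(width, pad)
    # ljust() returns subject unchanged when len(subject) >= width.
    return subject.ljust(width, pad)
```

Delegating to `rjust()` and `ljust()` keeps the sketch short; an implementation in another language would compute `max(0, width - length(subject))` padding characters explicitly.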

replace

string replace(string $subject, string $old, string $new[, number $count])

Given the $subject string, replace() replaces occurrences of the $old substring with the $new substring.

The optional $count integer specifies how many occurrences of the $old substring in $subject are replaced, scanning from left to right. If this parameter is omitted, all occurrences are replaced. If $count is not an integer or is negative, an error MUST be raised.

If $count is 0, replace() returns $subject unchanged.

Examples

Given           Expression                  Result
"aabaaabaaaab"  replace(@, 'aa', '-', `0`)  "aabaaabaaaab"
"aabaaabaaaab"  replace(@, 'aa', '-', `1`)  "-baaabaaaab"
"aabaaabaaaab"  replace(@, 'aa', '-', `2`)  "-b-abaaaab"
"aabaaabaaaab"  replace(@, 'aa', '-', `3`)  "-b-ab-aab"
"aabaaabaaaab"  replace(@, 'aa', '-')       "-b-ab--b"
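A minimal Python sketch of replace(). Python's `str.replace()` already replaces non-overlapping occurrences from left to right, up to an optional count, so the sketch only adds the non-negative integer check. Using `ValueError` for the invalid-value error and treating only Python `int` values as integers are assumptions.

```python
def replace(subject: str, old: str, new: str, count=None) -> str:
    """Sketch of replace(); count=None replaces every occurrence."""
    if count is None:
        return subject.replace(old, new)
    # ValueError stands in for invalid-value; only Python ints
    # (not bools) are treated as integers in this sketch.
    if isinstance(count, bool) or not isinstance(count, int) or count < 0:
        raise ValueError(f"count must be a non-negative integer, got {count!r}")
    # str.replace() scans left to right without overlapping matches and
    # replaces at most `count` occurrences; a count of 0 changes nothing.
    # Behaviour for an empty `old` is not specified by this JEP; this
    # sketch simply inherits Python's.
    return subject.replace(old, new, count)
```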

split

array[string] split(string $subject, string $search[, number $count])

Given the $subject string, split() breaks on occurrences of the string $search and returns an array.

The split() function returns an array containing each partial string between occurrences of $search. If $subject contains no occurrences of the $search string, an array containing just the original $subject string will be returned.

If the $search argument is an empty string, split() breaks on every character and returns an array containing each character from the $subject string. Thus, if $subject is also an empty string, split() returns an empty array.

The optional $count integer specifies the maximum number of split points within the $subject string. If this parameter is omitted, $subject is split at every occurrence of $search. If $count is not an integer or is negative, an error MUST be raised.

If $count is equal to 0, split() returns an array containing a single element, the $subject string.

Otherwise, split() breaks $subject on at most $count occurrences of the $search string, from left to right. The last string in the resulting array contains the remaining contents of $subject, unmodified.

Note: The split() function was originally designed by Chris Armstrong. However, its behavior has been slightly altered for consistency reasons.

Examples

Expression                                                Result
split('', '')                                             []
split('all chars', '')                                    [ "a", "l", "l", " ", "c", "h", "a", "r", "s" ]
split('/', '/')                                           [ "", "" ]
split('average|-|min|-|max|-|mean|-|median', '|-|')       [ "average", "min", "max", "mean", "median" ]
split('average|-|min|-|max|-|mean|-|median', '|-|', `3`)  [ "average", "min", "max", "mean|-|median" ]
split('average|-|min|-|max|-|mean|-|median', '|-|', `2`)  [ "average", "min", "max|-|mean|-|median" ]
split('average|-|min|-|max|-|mean|-|median', '|-|', `1`)  [ "average", "min|-|max|-|mean|-|median" ]
split('average|-|min|-|max|-|mean|-|median', '|-|', `0`)  [ "average|-|min|-|max|-|mean|-|median" ]
split('average|-|min|-|max|-|mean|-|median', '-')         [ "average|", "|min|", "|max|", "|mean|", "|median" ]
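A minimal Python sketch of split(). Python's `str.split(sep, maxsplit)` already matches most of the rules above (it returns `[subject]` when the separator does not occur, and leaves the remainder unsplit in the last element), but it rejects an empty separator, so that case is handled separately. Two behaviours here are assumptions rather than rules stated by this JEP, and are marked in the comments: that a $count of 0 takes precedence over the empty-$search rule, and how $count applies when $search is empty. `ValueError` stands in for the invalid-value error.

```python
def split(subject: str, search: str, count=None) -> list:
    """Sketch of split(); count=None splits at every occurrence."""
    if count is not None:
        # ValueError stands in for invalid-value; only Python ints
        # (not bools) are treated as integers in this sketch.
        if isinstance(count, bool) or not isinstance(count, int) or count < 0:
            raise ValueError(f"count must be a non-negative integer, got {count!r}")
        if count == 0:
            # Assumption: this rule takes precedence over the empty-$search
            # rule, so split('', '', `0`) yields [''] in this sketch.
            return [subject]
    if search == "":
        # Break between every character; an empty subject gives [].
        chars = list(subject)
        if count is None or count >= len(chars):
            return chars
        # Assumption (not spelled out by the JEP): with an empty $search,
        # `count` splits off that many leading characters and keeps the
        # rest of the subject as the last element.
        return chars[:count] + [subject[count:]]
    # str.split() returns [subject] when search does not occur and, given
    # maxsplit, leaves the remainder of the subject unsplit in the last item.
    if count is None:
        return subject.split(search)
    return subject.split(search, count)
```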


trim

string trim(string $subject[, string $chars])

Given the $subject string, trim() removes the leading and trailing characters found in $chars.

The optional $chars string parameter represents a set of characters to be removed. If this parameter is omitted or is an empty string, whitespace characters are removed from both ends of the $subject string instead. Whitespace characters are the code points whose Unicode White_Space property is Yes.

Examples

Given               Expression       Result
" subject string "  trim(@)          "subject string"
" subject string "  trim(@, '')      "subject string"
" subject string "  trim(@, ' ')     "subject string"
" subject string "  trim(@, 's')     " subject string "
" subject string "  trim(@, 'su')    " subject string "
" subject string "  trim(@, 'su ')   "bject string"
" subject string "  trim(@, 'gsu ')  "bject strin"

trim_left

string trim_left(string $subject[, string $chars])

Given the $subject string, trim_left() removes the leading characters found in $chars.

As with trim(), the optional $chars string parameter represents a set of characters to be removed. If $chars is omitted or is an empty string, trim_left() removes whitespace characters.

Examples

Given               Expression            Result
" subject string "  trim_left(@)          "subject string "
" subject string "  trim_left(@, 's')     " subject string "
" subject string "  trim_left(@, 'su')    " subject string "
" subject string "  trim_left(@, 'su ')   "bject string "
" subject string "  trim_left(@, 'gsu ')  "bject string "

trim_right

string trim_right(string $subject[, string $chars])

Given the $subject string, trim_right() removes the trailing characters found in $chars.

As with trim() and trim_left(), the optional $chars string parameter represents a set of characters to be removed. If $chars is omitted or is an empty string, trim_right() removes whitespace characters.

Examples

Given               Expression             Result
" subject string "  trim_right(@)          " subject string"
" subject string "  trim_right(@, 's')     " subject string "
" subject string "  trim_right(@, 'su')    " subject string "
" subject string "  trim_right(@, 'su ')   " subject string"
" subject string "  trim_right(@, 'gsu ')  " subject strin"
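The trimming functions trim(), trim_left() and trim_right() can be sketched together in Python with `str.strip()`, `str.lstrip()` and `str.rstrip()`, which all treat their argument as a set of characters. The sketch does not rely on their no-argument form, because Python's notion of whitespace there (`str.isspace()`) also covers the information separators U+001C..U+001F, which lack the Unicode White_Space property. Instead it spells out the White_Space code points explicitly; the constant `WHITE_SPACE` and the helper `_char_set()` are hypothetical names.

```python
# Code points with the Unicode White_Space property (from PropList.txt).
WHITE_SPACE = "".join(
    chr(cp)
    for cp in (
        *range(0x0009, 0x000E),  # CHARACTER TABULATION .. CARRIAGE RETURN
        0x0020, 0x0085, 0x00A0, 0x1680,
        *range(0x2000, 0x200B),  # EN QUAD .. HAIR SPACE
        0x2028, 0x2029, 0x202F, 0x205F, 0x3000,
    )
)


def _char_set(chars):
    # An omitted or empty $chars means "remove White_Space characters".
    return chars if chars else WHITE_SPACE


def trim(subject, chars=None):
    """Sketch of trim(): remove leading and trailing characters in chars."""
    return subject.strip(_char_set(chars))


def trim_left(subject, chars=None):
    """Sketch of trim_left(): remove leading characters in chars."""
    return subject.lstrip(_char_set(chars))


def trim_right(subject, chars=None):
    """Sketch of trim_right(): remove trailing characters in chars."""
    return subject.rstrip(_char_set(chars))
```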

upper

string upper(string $subject)

Returns the $subject string converted to uppercase, using the Unicode default case conversion rules.

Examples

Given     Expression  Result
"string"  upper(@)    "STRING"
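As with lowercasing, a minimal Python sketch of upper() can delegate to `str.upper()`, which applies Unicode's full uppercase mappings. Treating it as equivalent to the Unicode default case conversion is an assumption of this sketch; the example shows that the result can be longer than the input.

```python
def upper(subject: str) -> str:
    """Sketch of upper(), delegating to Python's Unicode-aware str.upper()."""
    # Full case mappings may lengthen the string: "ß" uppercases to "SS".
    return subject.upper()


print(upper("string"))  # STRING
print(upper("straße"))  # STRASSE
```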

Compliance

A new string_functions.json file will be added to the compliance tests. The test suite will introduce the following new error type:

  • invalid-value

For instance, split() would raise this error if its $count parameter is negative or not an integer.