diff --git a/proposals/0000-lexical-scope.md b/proposals/0000-lexical-scope.md
new file mode 100644
index 0000000..037b7a2
--- /dev/null
+++ b/proposals/0000-lexical-scope.md
@@ -0,0 +1,286 @@
+# Lexical Scoping
+
+- JEP: (leave blank)
+- Author: @jamesls
+- Created: 2023-03-21
+
+## Abstract
+[abstract]: #abstract
+
+This JEP proposes the introduction of lexical scoping through a new
+`let` expression. You can now bind variables that are evaluated in the
+context of a given lexical scope. This enables queries that can refer to
+elements defined outside of their current scope, which is not currently
+possible. This JEP supersedes JEP 11, which proposed similar functionality
+through a `let()` function.
+
+## Motivation
+[motivation]: #motivation
+
+A JMESPath expression is always evaluated in the context of a current
+element, which can be explicitly referred to via the `@` token. The
+current element changes as expressions are evaluated. For example,
+suppose we have the expression `foo.bar[0]` that we want to evaluate against
+an input document of:
+
+```json
+{"foo": {"bar": ["hello", "world"]}, "baz": "baz"}
+```
+
+The expression and its associated current element are evaluated as follows:
+
+```
+# Start
+expression = foo.bar[0]
+@ = {"foo": {"bar": ["hello", "world"]}, "baz": "baz"}
+
+# Step 1
+expression = foo
+@ = {"foo": {"bar": ["hello", "world"]}, "baz": "baz"}
+result = {"bar": ["hello", "world"]}
+
+# Step 2
+expression = bar
+@ = {"bar": ["hello", "world"]}
+result = ["hello", "world"]
+
+# Step 3
+expression = [0]
+@ = ["hello", "world"]
+result = "hello"
+```
+
+The end result of evaluating this expression is `"hello"`. Note that each
+step changes the values that are accessible to the current expression being
+evaluated. In "Step 2", it is not possible for the expression to reference
+the value of `"baz"` in the current element of the previous step, "Step 1".
+
+The inability to reference variables in a parent scope is a serious limitation
+of JMESPath, and anecdotally, support for this is one of the commonly requested
+features of the language. Below are examples of input documents and the desired
+output documents that aren't possible to create with the current version of
+JMESPath:
+
+```
+Input:
+
+[
+  {"home_state": "WA",
+   "states": [
+     {"name": "WA", "cities": ["Seattle", "Bellevue", "Olympia"]},
+     {"name": "CA", "cities": ["Los Angeles", "San Francisco"]},
+     {"name": "NY", "cities": ["New York City", "Albany"]}
+   ]
+  },
+  {"home_state": "NY",
+   "states": [
+     {"name": "WA", "cities": ["Seattle", "Bellevue", "Olympia"]},
+     {"name": "CA", "cities": ["Los Angeles", "San Francisco"]},
+     {"name": "NY", "cities": ["New York City", "Albany"]}
+   ]
+  }
+]
+
+
+(for each list in "states", select the list of cities associated
+ with the state defined in the "home_state" key)
+
+Output:
+
+[
+  ["Seattle", "Bellevue", "Olympia"],
+  ["New York City", "Albany"]
+]
+```
+
+```
+Input:
+{"imageDetails": [
+  {
+    "repositoryName": "org/first-repo",
+    "imageTags": ["latest", "v1.0", "v1.2"],
+    "imageDigest": "sha256:abcd"
+  },
+  {
+    "repositoryName": "org/second-repo",
+    "imageTags": ["v2.0", "v2.2"],
+    "imageDigest": "sha256:efgh"
+  }
+]}
+
+
+(create a list of pairs containing an image tag and its associated repo name)
+
+Output:
+
+[
+  ["latest", "org/first-repo"],
+  ["v1.0", "org/first-repo"],
+  ["v1.2", "org/first-repo"],
+  ["v2.0", "org/second-repo"],
+  ["v2.2", "org/second-repo"]
+]
+```
+
+In order to support these queries we need some way for an expression to
+reference values that exist outside of its implicit current element.
+
+
+## Specification
+[specification]: #specification
+
+A new "let expression" is added to the language. The expression has the
+format: `let <bindings> in <expr>`.
+The updated grammar rules in ABNF are:
+
+```
+let-expression = "let" bindings "in" expression
+bindings = variable-binding *( "," variable-binding )
+variable-binding = variable-ref "=" expression
+variable-ref = "$" unquoted-string
+```
+
+The `let-expression` and `variable-ref` rules are also added as new expression
+types:
+
+```
+expression =/ let-expression / variable-ref
+```
+
+Examples of this new syntax:
+
+* `let $foo = bar in {a: myvar, b: $foo}`
+* `let $foo = baz[0] in bar[? baz == $foo ] | [0]`
+* `let $a = b, $c = d in bar[*].[$a, $c, foo, bar]`
+
+### New evaluation rules
+
+Let expressions are evaluated as follows.
+
+Given the rule `"let" bindings "in" expression`, the `bindings` rule is
+processed first. Each `variable-binding` within the `bindings` rule defines
+the name of a variable and an expression. Each expression is evaluated, and the
+result of this evaluation is then bound to the associated variable name.
+
+Once all the `variable-binding` rules have been processed, the associated
+`expression` clause of the let expression is then evaluated. During the
+evaluation of the expression, any reference to a variable name, via the
+`variable-ref` rule, will evaluate to the value bound to the variable. Once the
+associated expression has been evaluated, the let expression itself evaluates
+to the result of this expression. After the let expression has been evaluated,
+the variable bindings associated with the let expression are no longer valid.
+This is also referred to as the visibility of a binding; the bindings of a
+let expression are only visible during the evaluation of the `expression`
+clause of the let expression.
+
+When evaluating the `bindings` rule, a `variable-binding` for a variable name
+that is already visible in the current scope will replace the existing binding
+when evaluating the `expression` clause of the let expression.
+This means that, in the context of nested let expressions (and consequently
+nested scopes), a variable in an inner scope can shadow a variable defined in
+an outer scope.
+
+If a `variable-ref` references a variable that has not been defined, the
+evaluation of that `variable-ref` will trigger an `undefined-variable` error.
+This error MUST occur when the expression is evaluated and not at compile
+time. This is to enable implementations to define an implementation-specific
+mechanism for defining an initial or "global" scope. Implementations are free
+to offer a "strict" compilation mode that a user can opt into, but MUST support
+triggering an `undefined-variable` error only when the `variable-ref` is
+evaluated.
+
+### Examples
+
+Basic examples demonstrating core functionality.
+
+```
+search(let $foo = foo in $foo, {"foo": "bar"}) -> "bar"
+search(let $foo = foo.bar in $foo, {"foo": {"bar": "baz"}}) -> "baz"
+search(let $foo = foo in [$foo, $foo], {"foo": "bar"}) -> ["bar", "bar"]
+```
+
+Nested bindings.
+
+```
+search(
+  let $a = a
+  in
+    b[*].[a, $a, let $a = 'shadow' in $a],
+  {"a": "topval", "b": [{"a": "inner1"}, {"a": "inner2"}]}
+) -> [["inner1", "topval", "shadow"], ["inner2", "topval", "shadow"]]
+```
+
+Error cases.
+
+```
+search($foo, {}) -> <error: undefined-variable>
+search([let $foo = 'bar' in $foo, $foo], {}) -> <error: undefined-variable>
+```
+
+
+## Rationale
+[rationale]: #rationale
+
+The let expression proposed in this JEP is based on similar constructs
+in existing programming languages:
+
+
+* Haskell: http://learnyouahaskell.com/syntax-in-functions#let-it-be
+* Clojure: https://clojuredocs.org/clojure.core/let
+* OCaml: https://v2.ocaml.org/manual/expr.html#sss:expr-localdef
+
+It's important to use syntax and semantics that are already familiar to
+developers. We are introducing lexical scoping, which is not a novel
+concept, into the language, so care was taken to be consistent with
+the mental model that developers already have.
+
+
+## Testcases
+[testcases]: #testcases
+
+Basic expressions
+
+```yaml
+# Basic expressions
+- given:
+    foo:
+      bar: baz
+  cases:
+    - expression: "let $foo = foo in $foo"
+      result:
+        bar: baz
+    - expression: "let $foo = foo.bar in $foo"
+      result: "baz"
+    - expression: "let $foo = foo.bar in [$foo, $foo]"
+      result: ["baz", "baz"]
+    - comment: "Multiple assignments"
+      expression: "let $foo = 'foo', $bar = 'bar' in [$foo, $bar]"
+      result: ["foo", "bar"]
+# Nested expressions
+- given:
+    a: topval
+    b:
+      - a: inner1
+      - a: inner2
+  cases:
+    - expression: "let $a = a in b[*].[a, $a, let $a = 'shadow' in $a]"
+      result:
+        - ["inner1", "topval", "shadow"]
+        - ["inner2", "topval", "shadow"]
+    - comment: Bindings only visible within expression clause
+      expression: "let $a = 'top-a' in let $a = 'in-a', $b = $a in $b"
+      result: "top-a"
+# Examples from Motivation section
+- given:
+    - home_state: WA
+      states:
+        - name: WA
+          cities: ["Seattle", "Bellevue", "Olympia"]
+        - name: CA
+          cities: ["Los Angeles", "San Francisco"]
+        - name: NY
+          cities: ["New York City", "Albany"]
+    - home_state: NY
+      states:
+        - name: WA
+          cities: ["Seattle", "Bellevue", "Olympia"]
+        - name: CA
+          cities: ["Los Angeles", "San Francisco"]
+        - name: NY
+          cities: ["New York City", "Albany"]
+  cases:
+    - expression: "[*].[let $home_state = home_state in states[? name == $home_state].cities[]][]"
+      result:
+        - ["Seattle", "Bellevue", "Olympia"]
+        - ["New York City", "Albany"]
+```
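
The evaluation rules in the Specification section can be sketched in a few lines of Python. This is only an illustration and is not part of the proposal or of any JMESPath implementation: the tuple-based AST, the `evaluate` function, and the `UndefinedVariableError` class are hypothetical names chosen for this example. The sketch models scopes with `collections.ChainMap`, evaluates every binding against the enclosing scope, and raises the error only when a `variable-ref` is actually evaluated.

```python
from collections import ChainMap


class UndefinedVariableError(Exception):
    """Hypothetical stand-in for the JEP's `undefined-variable` error."""


def evaluate(node, current, scope=None):
    """Evaluate a tiny tuple-based AST (an assumption of this sketch)."""
    scope = ChainMap() if scope is None else scope
    kind = node[0]
    if kind == "literal":      # ("literal", value)
        return node[1]
    if kind == "field":        # ("field", name): look up a key on `@`
        return current.get(node[1]) if isinstance(current, dict) else None
    if kind == "list":         # ("list", [expr, ...]): multi-select list
        return [evaluate(item, current, scope) for item in node[1]]
    if kind == "var":          # ("var", name): a variable-ref such as $foo
        if node[1] not in scope:
            # Raised at evaluation time, never while building the AST.
            raise UndefinedVariableError(node[1])
        return scope[node[1]]
    if kind == "let":          # ("let", [(name, expr), ...], body)
        _, bindings, body = node
        # Each binding is evaluated in the enclosing scope, so bindings in
        # the same let expression cannot see one another.
        frame = {name: evaluate(expr, current, scope) for name, expr in bindings}
        # The new frame shadows outer names only while `body` is evaluated.
        return evaluate(body, current, scope.new_child(frame))
    raise ValueError(f"unknown node kind: {kind!r}")


# let $a = 'top-a' in let $a = 'in-a', $b = $a in $b
nested = ("let", [("a", ("literal", "top-a"))],
          ("let", [("a", ("literal", "in-a")), ("b", ("var", "a"))],
           ("var", "b")))
print(evaluate(nested, {}))  # top-a

# [let $foo = 'bar' in $foo, $foo]: the second $foo is out of scope
leaked = ("list", [("let", [("foo", ("literal", "bar"))], ("var", "foo")),
                   ("var", "foo")])
try:
    evaluate(leaked, {})
except UndefinedVariableError as exc:
    print("undefined-variable:", exc)  # undefined-variable: foo
```

Because each let expression pushes a fresh frame and nothing ever mutates an outer frame, bindings disappear as soon as the body returns. An implementation-specific "global" scope could be supplied in this sketch by passing a pre-populated `ChainMap` as the initial `scope`.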