What next for JEP-11 and beyond? #161

springcomp · 2023-03-21T21:09:25Z

springcomp
Mar 21, 2023
Maintainer

Lexical Scoping (revisited)

After a few months with several implementations currently running JEP-11, some concerns are being raised.

The main concern is around the notion of scopes that act as a fallback when evaluating identifiers to null.

In particular, the following expression is judged problematic:

search( let({qux: 'qux'}, &foo.qux ), {"foo": {"bar": "baz"}} ) -> "qux"

The intuitive behaviour would be for foo.qux to return null. However, because the qux identifier is not found in the execution context, it is looked up, as a fallback, in the scope set by the first argument to the let() function.
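To make the fallback concrete, here is a minimal Python sketch of the lookup order as it is described in this post. It is not taken from any JEP-11 implementation: the helper names `resolve_identifier` and `evaluate_path` are hypothetical, and the sketch assumes that an identifier evaluating to null in the context triggers the scope lookup.

```python
def resolve_identifier(name, context, scopes):
    """Context first, then the scope stack (innermost scope first)."""
    value = context.get(name) if isinstance(context, dict) else None
    if value is not None:
        return value
    # Assumption: a null result in the context falls back to the scopes.
    for scope in reversed(scopes):
        if name in scope:
            return scope[name]
    return None


def evaluate_path(path, document, scopes):
    """Evaluate a dotted path such as foo.qux, one identifier at a time."""
    current = document
    for name in path:
        current = resolve_identifier(name, current, scopes)
    return current


document = {"foo": {"bar": "baz"}}
with_let = evaluate_path(["foo", "qux"], document, [{"qux": "qux"}])
without_let = evaluate_path(["foo", "qux"], document, [])
print(with_let)     # qux
print(without_let)  # None
```

The second call shows the result most readers would expect from foo.qux; the first shows how an enclosing let() changes it.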

This concern was actually raised early on in the design of JEP-11 by James and that may be one of the main reasons why JEP-11 was never officially accepted.

This concern is what prompted a new discussion and a potential new proposal to replace and improve on JEP-11.

This post is an attempt to summarize the concerns that we have with JEP-11 and try to list various alternatives with their pros and cons.

Glossary

First, let’s agree on common terms so that everyone discussing alternatives can be on the same page.

JEP-11 set the stage for some new terms:

execution context or context for short. This is the current structure being evaluated at each stage of the expression. When starting evaluation, the context is the original input JSON document.
scope. This is a JSON object that is constructed by evaluating the first argument to the let() function. JEP-11 even includes the notion of a stack of scopes that all participate when evaluating an identifier. If an identifier cannot be resolved from the current context, it is looked up in the scope and each scope’s parent scope until it is found. Otherwise, the evaluation returns null.

This post is using those terms.

Main themes

At this stage, I think the consensus is that there should not exist an implicit lookup when resolving identifiers in a scope. Instead, evaluation should be explicitly specified using some sort of reference mechanism.

Other main themes are listed here:

The scope in JEP-11 is an object. Some proposals argue that it could be any valid JSON token.
The scope objects in JEP-11 are organized in a stack, as nested expressions using the let() function are created. This allows identifier evaluation to walk up the stack of scope objects. There are some arguments to be made whether chaining should occur, or whether scopes should be isolated.
The scope stack in JEP-11 is available alongside the current execution context. A strict precedence is specified so that an identifier is first looked up in the current context, and then in the scope stack. There is an argument to be made whether the scope should replace the current execution context entirely. For instance, this proposal mandates that the current execution context is swapped with another context using a dedicated function.
Finally, the way to surface this feature in JMESPath expressions must be discussed. @jamesls proposes a new let <context> in <expression> construct, introducing keywords into the language, arguing that the semantics of the let() function is distinctly unique in JMESPath and should be replaced with a more integrated mechanism. Other alternatives using distinct tokens could be devised if we do not want to introduce keywords.

Let’s break down those main themes. Please feel free to include any that I may have missed in the comments.

Reference to identifiers

I think we all agree that reference to a scope identifier should be explicit rather than implicit as is currently the case with JEP-11.

So a new mechanism must be implemented. Here are some alternatives:

$<identifier>: using the $ sigil to reference identifiers from the scope.

This is a common alternative to JEP-11 implicit lookup and is included in James’ proposal. However, this only works if the scope (or each level in the scope stack) is an object. The scope itself is not accessible and cannot be acted upon¹.

Using a dedicated function to reference an identifier.

As far as I can tell, there is currently no proposal that promotes this behaviour. However, for the sake of completeness, this must be mentioned.

Using a function would require some level of indirection to access properties from a scope object. As functions do not accept identifier arguments, a raw-string must be used instead.

get_from_scope('foo')

However, using a function would pave the way to extended scenarios, such as accessing the scope object itself or using the scope as a temporary input document for downstream expressions.

with_scope().bar

Scope object vs Scope value

The scope can be an object, or any valid JSON value.

This item is linked to the previous theme as it boils down to an alternative between accessing properties from an implicit scope but not the scope itself, vs accessing the whole scope value which may be any valid JSON value.
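The two access styles can be contrasted with a small Python sketch. Both functions below are hypothetical stand-ins for the get_from_scope() and with_scope() calls mentioned above, and the scope is passed explicitly only to keep the example self-contained.

```python
def get_from_scope(scope, name):
    """Property access: only meaningful when the scope is an object."""
    if isinstance(scope, dict):
        return scope.get(name)
    return None


def with_scope(scope):
    """Whole-value access: hands the scope itself to downstream expressions."""
    return scope


object_scope = {"foo": 1, "bar": [1, 2]}
array_scope = [10, 20, 30]

print(get_from_scope(object_scope, "foo"))  # 1
print(get_from_scope(array_scope, "foo"))   # None
print(with_scope(object_scope)["bar"])      # [1, 2]
print(with_scope(array_scope)[0])           # 10
```

Property access has nothing to offer for a non-object scope, whereas whole-value access works for any JSON value but leaves it to the downstream expression to dig into it.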

Should scopes be "chained"

In JEP-11, scope objects are organized as a stack that maps to how expressions using the let() function are nested. Crucially, looking up an identifier is specified as walking along the stack of scope objects.

As JMESPath expressions can be nested, it seems natural to allow scope objects to be nested. This allows identifiers from nested expressions to shadow identifiers from upper levels, as happens with local variables in many programming languages.
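As an illustration of chaining versus isolation, the following Python sketch models a scope stack with the standard library's `collections.ChainMap`. The variable names are made up for the example, and this is only an analogy for the lookup rule, not how any JMESPath implementation stores scopes.

```python
from collections import ChainMap

outer_scope = {"i": 1, "root": "outer-root"}
inner_scope = {"i": 2}

# Chained: the inner scope is consulted first, then its parent.
chained = ChainMap(outer_scope).new_child(inner_scope)
print(chained["i"])     # 2 (inner shadows outer)
print(chained["root"])  # outer-root (found in the parent scope)

# Isolated: only the innermost scope is visible.
isolated = inner_scope
print(isolated.get("i"))     # 2
print(isolated.get("root"))  # None
```

With isolated scopes, anything needed from an outer level has to be rebound explicitly at each nesting level.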

However, some proposals sidestep this by mandating explicit usage of a scope using dedicated constructs inside which a regular JMESPath expression applies.

use_scope(&foo)

This leads to the question of what @ stands for. And what about $, which is commonly used to refer to the "root", i.e. the original input JSON document?

Does the scope replace / shadow the context

In JEP-11, the scope is available in addition to the execution context. It does not completely replace it. It does not even shadow the context. Should the scope completely replace and become the context?

This proposal, for instance, specifies a way to swap out the current context for a scope which has been set up previously. [I renamed the function to make use of proper terms]

The following expression:

with_scope({foo: 'bar'}, &…)

Sets the scope much like the first argument to the let() function in JEP-11 does, except that it can be any valid JSON value rather than just an object.

The expression-type in the second argument can take advantage of the following expression:

…, &use_scope(&foo)

At this point, the proposal mandates that inside the use_scope() function, JMESPath expressions operate on what has been set up as the scope as its input JSON document.

This proposal mandates that at any point in time, only a single execution context exists, although as an expression author, you can make it change for another at any point.
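The single-context rule can be sketched in Python as follows. The `Evaluator` class and its methods are hypothetical and only mirror the with_scope() and use_scope() functions described above; expression-type arguments are modelled as Python callables.

```python
class Evaluator:
    """Holds exactly one execution context at any point in time."""

    def __init__(self, document):
        self.context = document
        self.scope = None

    def with_scope(self, scope_value, body):
        # Store any JSON value as the scope, then evaluate the body.
        previous = self.scope
        self.scope = scope_value
        try:
            return body(self)
        finally:
            self.scope = previous

    def use_scope(self, body):
        # Swap the context for the stored scope while evaluating the body.
        previous = self.context
        self.context = self.scope
        try:
            return body(self)
        finally:
            self.context = previous

    def identifier(self, name):
        # Identifiers are only ever resolved against the current context.
        if isinstance(self.context, dict):
            return self.context.get(name)
        return None


ev = Evaluator({"foo": "from-document"})
result = ev.with_scope(
    {"foo": "bar"},
    lambda e: [e.identifier("foo"), e.use_scope(lambda s: s.identifier("foo"))],
)
print(result)  # ['from-document', 'bar']
```

Outside use_scope() the identifier resolves against the input document; inside, it resolves against the scope, and the original context is restored afterwards.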

Exposing the feature to JMESPath using keywords vs tokens

James’s proposal is to introduce a new construct that I will refer to as the let-expression and goes like this:

let <scope> in <expression>

The scope is currently being proposed only as bindings to a variable but it explicitly uses the $ sigil to refer to this variable in downstream expressions.

Irrespective of the actual nature of the scope, this let-expression is extremely similar in shape to the let() function.

For the sake of the discussion, let’s imagine there exists a proposal very similar to JEP-11. Let’s call it JEP-11a. In fact, it is JEP-11 with an added twist that references to scoped variables are explicit using the $<identifier> varref expression proposed by James.
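A minimal Python sketch of what explicit references would change, under the assumption that a `$`-prefixed name is resolved only in the scope stack and a bare identifier only in the current context. The function names `resolve_explicit` and `evaluate_explicit` are hypothetical.

```python
def resolve_explicit(token, context, scopes):
    """Bare identifiers use the context; $names use the scope stack only."""
    if token.startswith("$"):
        name = token[1:]
        for scope in reversed(scopes):  # innermost scope first
            if name in scope:
                return scope[name]
        return None
    return context.get(token) if isinstance(context, dict) else None


def evaluate_explicit(path, document, scopes):
    current = document
    for token in path:
        current = resolve_explicit(token, current, scopes)
    return current


document = {"foo": {"bar": "baz"}}
scopes = [{"qux": "qux"}]
print(evaluate_explicit(["foo", "qux"], document, scopes))  # None
print(evaluate_explicit(["$qux"], document, scopes))        # qux
```

Under this rule, foo.qux from the opening example evaluates to null again, and the scoped value is only reachable through $qux.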

I would argue that James’ let-expression is virtually identical to JEP-11a in terms of feature set. In fact, this would be my favored design at this point. James has concerns over the very use of a function to introduce this design, however.

If we were to not use functions at all, I would personally favor pseudo-lambda syntax such as having the following grammar:

expression =/ let-expression / reference

let-expression = multi-select-hash "=>" expression
reference = "$" identifier

Using this design is, again, virtually identical to using JEP-11a or James’ updated proposal in terms of feature set.

So while not making the actual syntax secondary, I think this theme is worth deciding upon last, once we have figured out exactly what we want to support with respect to the four themes introduced previously.

Footnotes

¹ The $ sigil is (commonly) used to refer to the original input document.

springcomp · 2023-03-22T19:05:23Z

springcomp
Mar 22, 2023
Maintainer Author

This is a sample use case to demonstrate JMESPath requirements.

Let's start with a scenario using a database and SQL statements.

SQL

https://learnsql.com/blog/count-join-sql/

employee_id	first_name	last_name	manager_id
4529	Nancy	Young	4125
4238	John	Simon	4329
4329	Martina	Candreva	4125
4009	Klaus	Koch	4329
4125	Mafalda	Ranieri	NULL
4500	Jakub	Hrabal	4529
4118	Moira	Areas	4952
4012	Jon	Nilssen	4952
4952	Sandra	Rajkovic	4529
4444	Seamus	Quinn	4329

Count all employees under each manager

    SELECT 
         sup.employee_id,
         sup.first_name,
         sup.last_name,
         COUNT (sub.employee_id) AS number_of_employees
    FROM employee sub
    JOIN employee sup
      ON sub.manager_id = sup.employee_id
GROUP BY sup.employee_id, sup.first_name, sup.last_name;

Result:

employee_id	first_name	last_name	number_of_employees
4125	Mafalda	Ranieri	2
4329	Martina	Candreva	3
4529	Nancy	Young	2
4952	Sandra	Rajkovic	2

JMESPath

I would like to reproduce the preceding scenario using JMESPath expressions and evaluate where there could be missing features and where features could be improved.

Here is a given JSON document:

[
  {"employee_id": 4529, "first_name": "Nancy", "last_name": "Young", "manager_id": 4125},
  {"employee_id": 4238, "first_name": "John", "last_name": "Simon", "manager_id": 4329},
  {"employee_id": 4329, "first_name": "Martina", "last_name": "Candreva", "manager_id": 4125},
  {"employee_id": 4009, "first_name": "Klaus", "last_name": "Koch", "manager_id": 4329},
  {"employee_id": 4125, "first_name": "Mafalda", "last_name": "Ranieri", "manager_id": null},
  {"employee_id": 4500, "first_name": "Jakub", "last_name": "Hrabal", "manager_id": 4529},
  {"employee_id": 4118, "first_name": "Moira", "last_name": "Areas", "manager_id": 4952},
  {"employee_id": 4012, "first_name": "Jon", "last_name": "Nilssen", "manager_id": 4952},
  {"employee_id": 4952, "first_name": "Sandra", "last_name": "Rajkovic", "manager_id": 4529},
  {"employee_id": 4444, "first_name": "Seamus", "last_name": "Quinn", "manager_id": 4329}
]

Count all employees under each manager

Let's pretend we want the following output:

[
  {"employee_id": 4125, "first_name": "Mafalda", "last_name": "Ranieri", "number_of_employees": 2 },
  {"employee_id": 4329, "first_name": "Martina", "last_name": "Candreva", "number_of_employees": 3 },
  {"employee_id": 4529, "first_name": "Nancy", "last_name": "Young", "number_of_employees": 2 },
  {"employee_id": 4952, "first_name": "Sandra", "last_name": "Rajkovic", "number_of_employees": 2 }
]

The following expression using the JEP-11 let() function can be used:

[*]
   .let(
     {i: employee_id}, &{
       employee_id: i,
       first_name: first_name,
       last_name: last_name,
       number_of_employees: $[?manager_id == i]|length(@)
     })
     |[?number_of_employees!=`0`]

To be fair to the original JMESPath specification, we are using the $ root-node reference here.
To be on equal footing, we need to introduce yet another layer with let(), thereby demonstrating that we, indeed, need scope objects to be organized as a stack.

let({dollar: @}, &
  [*]
     .let(
       {i: employee_id}, &{
         employee_id: i,
         first_name: first_name,
         last_name: last_name,
         number_of_employees: dollar[?manager_id == i]|length(@)
       })
       |[?number_of_employees!=`0`]
)

Using my favored "JEP-11a" proposal, where looking up an identifier is made explicit using the $ sigil, this expression would be:

let({root: @}, &
  [*]
     .let(
       {i: employee_id}, &{
         employee_id: $i,
         first_name: first_name,
         last_name: last_name,
         number_of_employees: $root[?manager_id == $i]|length(@)
       })
       |[?number_of_employees!=`0`]
)

Which maps nicely with James’ proposal:

let $root = @ in.
  [*]
     .let $i = employee_id in. {
         employee_id: $i,
         first_name: first_name,
         last_name: last_name,
         number_of_employees: $root[?manager_id == $i]|length(@)
       }
       |[?number_of_employees!=`0`]
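For reference, the aggregation that all of the expressions above aim for can be reproduced in plain Python. This is only a cross-check of the expected output, not a JMESPath evaluation; the function name `count_direct_reports` is made up, and sorting by employee_id is an assumption made to match the order of the listing above.

```python
FIELDS = ("employee_id", "first_name", "last_name", "manager_id")
ROWS = [
    (4529, "Nancy", "Young", 4125),
    (4238, "John", "Simon", 4329),
    (4329, "Martina", "Candreva", 4125),
    (4009, "Klaus", "Koch", 4329),
    (4125, "Mafalda", "Ranieri", None),
    (4500, "Jakub", "Hrabal", 4529),
    (4118, "Moira", "Areas", 4952),
    (4012, "Jon", "Nilssen", 4952),
    (4952, "Sandra", "Rajkovic", 4529),
    (4444, "Seamus", "Quinn", 4329),
]
employees = [dict(zip(FIELDS, row)) for row in ROWS]


def count_direct_reports(employees):
    """Count employees per manager_id, keeping only actual managers."""
    counts = {}
    for employee in employees:
        manager_id = employee["manager_id"]
        if manager_id is not None:
            counts[manager_id] = counts.get(manager_id, 0) + 1
    return [
        {
            "employee_id": manager["employee_id"],
            "first_name": manager["first_name"],
            "last_name": manager["last_name"],
            "number_of_employees": counts[manager["employee_id"]],
        }
        # Assumption: sorted by employee_id to match the expected listing.
        for manager in sorted(employees, key=lambda e: e["employee_id"])
        if manager["employee_id"] in counts
    ]


managers = count_direct_reports(employees)
for manager in managers:
    print(manager)
```

The inner lookup of the manager's id while iterating over all employees is exactly the step that requires a scope (or the root reference) in the JMESPath versions.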
