What next for JEP-11 and beyond? #161
This is a sample use case to demonstrate JMESPath requirements. Let's start with a scenario using a database and SQL statements.
SQL
https://learnsql.com/blog/count-join-sql/
Count all employees under each manager
SELECT
sup.employee_id,
sup.first_name,
sup.last_name,
COUNT (sub.employee_id) AS number_of_employees
FROM employee sub
JOIN employee sup
ON sub.manager_id = sup.employee_id
GROUP BY sup.employee_id, sup.first_name, sup.last_name;
Result:
JMESPath
I would like to reproduce the preceding scenario using JMESPath expressions and evaluate where features are missing and where existing features could be improved. Here is a given JSON document:
[
{"employee_id": 4529, "first_name": "Nancy", "last_name": "Young", "manager_id": 4125},
{"employee_id": 4238, "first_name": "John", "last_name": "Simon", "manager_id": 4329},
{"employee_id": 4329, "first_name": "Martina", "last_name": "Candreva", "manager_id": 4125},
{"employee_id": 4009, "first_name": "Klaus", "last_name": "Koch", "manager_id": 4329},
{"employee_id": 4125, "first_name": "Mafalda", "last_name": "Ranieri", "manager_id": null},
{"employee_id": 4500, "first_name": "Jakub", "last_name": "Hrabal", "manager_id": 4529},
{"employee_id": 4118, "first_name": "Moira", "last_name": "Areas", "manager_id": 4952},
{"employee_id": 4012, "first_name": "Jon", "last_name": "Nilssen", "manager_id": 4952},
{"employee_id": 4952, "first_name": "Sandra", "last_name": "Rajkovic", "manager_id": 4529},
{"employee_id": 4444, "first_name": "Seamus", "last_name": "Quinn", "manager_id": 4329}
]
Count all employees under each manager
Let's pretend we want the following output:
[
{"employee_id": 4125, "first_name": "Mafalda", "last_name": "Ranieri", "number_of_employees": 2 },
{"employee_id": 4329, "first_name": "Martina", "last_name": "Candreva", "number_of_employees": 3 },
{"employee_id": 4529, "first_name": "Nancy", "last_name": "Young", "number_of_employees": 2 },
{"employee_id": 4952, "first_name": "Sandra", "last_name": "Rajkovic", "number_of_employees": 2 }
]
The following expression using the JEP-11
To be fair to the original JMESPath specification, we are using the
Using my favored "JEP-11a" proposal, where looking up an identifier is made explicit using the
Which maps nicely with James’ proposal:
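Separately from the JMESPath expressions discussed in this comment, the target output can be cross-checked in plain Python. This is only a minimal sketch of the self-join and grouping that the SQL query performs; it is not one of the proposals, and the `count_reports` helper is a hypothetical name introduced for illustration.

```python
from collections import Counter

employees = [
    {"employee_id": 4529, "first_name": "Nancy", "last_name": "Young", "manager_id": 4125},
    {"employee_id": 4238, "first_name": "John", "last_name": "Simon", "manager_id": 4329},
    {"employee_id": 4329, "first_name": "Martina", "last_name": "Candreva", "manager_id": 4125},
    {"employee_id": 4009, "first_name": "Klaus", "last_name": "Koch", "manager_id": 4329},
    {"employee_id": 4125, "first_name": "Mafalda", "last_name": "Ranieri", "manager_id": None},
    {"employee_id": 4500, "first_name": "Jakub", "last_name": "Hrabal", "manager_id": 4529},
    {"employee_id": 4118, "first_name": "Moira", "last_name": "Areas", "manager_id": 4952},
    {"employee_id": 4012, "first_name": "Jon", "last_name": "Nilssen", "manager_id": 4952},
    {"employee_id": 4952, "first_name": "Sandra", "last_name": "Rajkovic", "manager_id": 4529},
    {"employee_id": 4444, "first_name": "Seamus", "last_name": "Quinn", "manager_id": 4329},
]


def count_reports(rows):
    """Join rows on manager_id = employee_id and count direct reports per manager."""
    # Count how many rows point at each manager id (the GROUP BY / COUNT part).
    reports = Counter(r["manager_id"] for r in rows if r["manager_id"] is not None)
    # Index rows by employee id so each manager's name can be looked up (the JOIN part).
    by_id = {r["employee_id"]: r for r in rows}
    return [
        {
            "employee_id": manager_id,
            "first_name": by_id[manager_id]["first_name"],
            "last_name": by_id[manager_id]["last_name"],
            "number_of_employees": count,
        }
        for manager_id, count in sorted(reports.items())
        if manager_id in by_id  # inner join: skip manager ids without a matching row
    ]


result = count_reports(employees)
for row in result:
    print(row)
```

Sorting by `employee_id` is a choice made here to get a stable order; the SQL query itself does not specify one.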
Lexical Scoping (revisited)
After a few months with several implementations currently running JEP-11, some concerns are being raised.
The main concern is around the notion of scopes that act as a fallback when an identifier would otherwise evaluate to null.
In particular, the following expression is judged problematic:
search(let({qux: 'qux'}, &foo.qux), {"foo": {"bar": "baz"}}) -> "qux"
The intuitive behaviour would be for foo.qux to return null. However, because the qux identifier is not found in the execution context, it is looked up, as a fallback, in the scope set by the first argument to the let() function.
This concern was actually raised early on in the design of JEP-11 by James, and that may be one of the main reasons why JEP-11 was never officially accepted.
This concern is what prompted a new discussion and a potential new proposal to replace and improve on JEP-11.
This post is an attempt to summarize the concerns that we have with JEP-11 and try to list various alternatives with their pros and cons.
Glossary
First, let’s agree on common terms so that everyone discussing alternatives can be on the same page.
JEP-11 set the stage for some new terms:
execution context, or context for short. This is the current structure being evaluated at each stage of the expression. When starting evaluation, the context is the original input JSON document.
scope. This is a JSON object that is constructed by evaluating the first argument to the let() function. JEP-11 even includes the notion of a stack of scopes that all participate when evaluating an identifier. If an identifier cannot be resolved from the current context, it is looked up in the scope and each scope’s parent scope until it is found. Otherwise, the evaluation returns null.
This post is using those terms.
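To make these terms concrete, here is a minimal Python sketch of the lookup order described above: the current context first, then each scope from innermost to outermost. It is an illustration only, not the code of any JMESPath implementation. The `lookup` and `evaluate_path` helpers are hypothetical, only dotted identifier paths are handled, and "cannot be resolved" is assumed to mean "evaluates to null".

```python
def lookup(identifier, context, scopes):
    """Resolve one identifier: current context first, then the scope stack."""
    # Assumption: an identifier that evaluates to null in the context counts as unresolved.
    value = context.get(identifier) if isinstance(context, dict) else None
    if value is not None:
        return value
    # The innermost scope is last on the stack, so walk the stack backwards.
    for scope in reversed(scopes):
        value = scope.get(identifier)
        if value is not None:
            return value
    return None


def evaluate_path(path, document, scopes):
    """Evaluate a dotted path such as 'foo.qux', one identifier at a time."""
    current = document
    for identifier in path.split("."):
        current = lookup(identifier, current, scopes)
    return current


document = {"foo": {"bar": "baz"}}
scopes = [{"qux": "qux"}]  # the scope pushed by let({qux: 'qux'}, ...)

print(evaluate_path("foo.qux", document, scopes))  # falls back to the scope
print(evaluate_path("foo.qux", document, []))      # no scope: null (None)
```

With the scope in place, `foo.qux` resolves to the scoped value even though `foo` has no `qux` property, which is the behaviour judged problematic in the introduction.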
Main themes
At this stage, I think the consensus is that there should be no implicit lookup in a scope when resolving identifiers. Instead, such lookups should be explicitly specified using some sort of reference mechanism.
Other main themes are listed here:
The scope in JEP-11 is an object. Some proposals argue that it could be any valid JSON value.
The scope objects in JEP-11 are organized in a stack, as nested expressions using the let() function are created. This allows identifier evaluation to look up the stack of scope objects. There are arguments to be made about whether chaining should occur, or whether scopes should be isolated.
The scope stack in JEP-11 is available alongside the current execution context. A strict precedence is specified so that an identifier is first looked up in the current context, and then in the scope stack. There is an argument to be made about whether the scope should replace the current execution context entirely. For instance, this proposal mandates that the current execution context is swapped with another context using a dedicated function.
Finally, the way to surface this feature in JMESPath expressions must be discussed. @jamesls proposes a new let <context> in <expression> construct, introducing keywords into the language, arguing that the semantics of the let() function are unique in JMESPath and should be replaced with a more integrated mechanism. Other alternatives using distinct tokens could be devised if we do not want to introduce keywords.
Let’s break down those main themes. Please feel free to include any that I may have missed in the comments.
Reference to identifiers
So a new mechanism must be implemented. Here are some alternatives:
$<identifier>: using the $ sigil to reference identifiers from the scope.
This is a common alternative to JEP-11 implicit lookup and is included in James’ proposal. However, this only works if the scope (or each level in the scope stack) is an object. The scope itself is not accessible and cannot be acted upon¹.
As far as I can tell, there is currently no proposal that promotes this behaviour. However, for the sake of completeness, this must be mentioned.
Using a function would require some level of indirection to access properties from a scope object. As functions do not accept identifier arguments, a raw-string must be used instead.
get_from_scope('foo')
However, using a function would pave the way to extended scenarios, such as accessing the scope object itself, for instance to use the scope as a temporary input document for downstream expressions.
with_scope().bar
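The difference between these alternatives can be sketched in Python. This is a hedged illustration of the semantics discussed above, not an implementation of any proposal: `field`, `var_ref`, and `with_scope` are hypothetical helpers that stand in for a plain identifier, a `$<identifier>` reference, and a function returning the whole scope.

```python
def field(context, name):
    """Plain identifier: looks only at the current context, never at the scope."""
    return context.get(name) if isinstance(context, dict) else None


def var_ref(scope, name):
    """Models $name: an explicit lookup that requires the scope to be an object."""
    if not isinstance(scope, dict):
        raise TypeError("$<identifier> requires the scope to be an object")
    return scope.get(name)


def with_scope(scope):
    """Models with_scope(): returns the whole scope value, whatever its type."""
    return scope


document = {"foo": {"bar": "baz"}}
scope = {"qux": "qux", "bar": "from scope"}

# foo.qux no longer falls back to the scope: it is simply null.
print(field(field(document, "foo"), "qux"))
# $qux reaches into the scope explicitly.
print(var_ref(scope, "qux"))
# with_scope().bar uses the scope as a temporary input document.
print(field(with_scope(scope), "bar"))
# A function can expose a non-object scope, which $<identifier> cannot address.
print(with_scope([1, 2, 3]))
```

The sigil form is shorter but only addresses properties of an object scope, while the function form can hand the whole scope value to downstream expressions.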
Scope object vs Scope value
This item is linked to the previous theme, as it boils down to a choice between accessing properties from an implicit scope but not the scope itself, and accessing the whole scope value, which may be any valid JSON value.
Should scopes be "chained"
As JMESPath expressions can be nested, it seems natural to allow scope objects to be nested. This allows identifiers from nested expressions to shadow identifiers from upper levels, as happens with local variables in many programming languages.
However, some proposals sidestep this by mandating explicit usage of a scope using dedicated constructs inside which a regular JMESPath expression applies.
use_scope(&foo)
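The trade-off between chained and isolated scopes can be illustrated with Python's standard `collections.ChainMap`. This is only an analogy for the two behaviours, under the assumption that each scope is an object; the variable names are made up for the example.

```python
from collections import ChainMap

outer = {"x": "outer x", "y": "outer y"}  # scope set by an outer expression
inner = {"x": "inner x"}                  # scope set by a nested expression

# Chained: the inner scope is searched first, then the outer one.
chained = ChainMap(inner, outer)
# Isolated: only the innermost scope is visible.
isolated = inner

print(chained.get("x"))   # inner binding shadows the outer one
print(chained.get("y"))   # still reachable through the chain
print(isolated.get("y"))  # not visible when scopes are isolated
```

Chaining keeps outer bindings available at the cost of shadowing rules to learn; isolation is simpler to reason about but forces authors to re-expose anything they still need.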
Such constructs raise the question of what @ stands for inside them. And what about $, which is commonly used to refer to the "root", i.e. the original input JSON document?
Does the scope replace / shadow the context
This proposal, for instance, specifies a way to swap out the current context for a scope which has been set up previously. [I renamed the function to make use of proper terms]
The following expression:
with_scope({foo: 'bar'}, &…)
sets the scope much like the first argument to the let() function in JEP-11 does, except that it can be any valid JSON value rather than just an object.
The expression-type in the second argument can then take advantage of the following expression:
…, &use_scope(&foo)
Inside the use_scope() function, the proposal mandates that JMESPath expressions operate on what has been set up as the scope as their input JSON document.
This proposal mandates that at any point in time, only a single execution context exists, although as an expression author, you can swap it for another at any point.
Exposing the feature to JMESPath using keywords vs tokens
James’ proposal is to introduce a new construct that I will refer to as the let-expression, which goes like this:
let <scope> in <expression>
The scope is currently being proposed only as bindings to a variable, but it explicitly uses the $ sigil to refer to this variable in downstream expressions.
Irrespective of the actual nature of the scope, this let-expression is extremely similar in shape to the let() function.
For the sake of the discussion, let’s imagine there exists a proposal very similar to JEP-11. Let’s call it JEP-11a. In fact, it is JEP-11 with an added twist: references to scoped variables are explicit, using the $<identifier> varref expression proposed by James.
I would argue that James’ let-expression is virtually identical to JEP-11a in terms of feature set. In fact, this would be my favored design at this point. James has concerns over the very use of a function to introduce this design, however.
If we were to not use functions at all, I would personally favor a pseudo-lambda syntax such as having the following grammar:
Using this design is, again, virtually identical to using JEP-11a or James’ updated proposal in terms of feature set.
So, without making the actual syntax secondary, I think this theme is worth deciding upon last, once we have figured out exactly what we want to support with respect to the four themes introduced previously.
Footnotes
¹ The $ sigil is (commonly) used to refer to the original input document.