Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XPath support for external secondary instances #245

Merged
merged 95 commits into from
Nov 19, 2024

Conversation

eyelidlessness
Copy link
Member

@eyelidlessness eyelidlessness commented Oct 31, 2024

Closes #203.

I'd normally include a more comprehensive writeup, especially for such a large change. But #203 effectively covers the design/approach, and I believe the extensive commit notes better represent this set of changes than anything I would write here. As we discussed this morning, we'll let those commit notes speak for the change.

This will simplify refactoring the implementation of axis evaluation itself. Anicipated: eliminating use of tree walkers
Again anticipating elimination of TreeWalker usage entirely
The `Length` (arity) type parameter is not really doing much of value. It was part of a notion of a much better function implementation interface that didn’t really pan out in the original pass on the `xpath` package. We can revisit this and that broader notion in some future work if it becomes pertinent. For now, this simplifies some downstream work to genericize DOM access
…onstants

Several incoming commits are going to expect this consistency of at least some of these exports. No time like the present to take care of them all.
…ing tests

Addressed as “nuissance failing tests” because:

1. The passing behavior would (currently) satisfy a thrown error in an async context (i.e. a Promise rejection), but
2. The failing behavior produces a successful value (engine implementation of `RootNode`)
3. Vitest then hangs indefinitely, attempting to produce a diff from comparing something it has no reason to compare!

These tests **will change** when we address incoming error production changes (Result type). But we do currently check for throw (Promise rejection), and that will still be the case through some other incoming work ahead of those error production changes. And ultimately, the tests fail anyway, and will continue to do even after incorporating the work that prompts this change.
Copy link

changeset-bot bot commented Oct 31, 2024

🦋 Changeset detected

Latest commit: e636a9c

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 5 packages
Name Type
@getodk/xforms-engine Minor
@getodk/xpath Minor
@getodk/scenario Patch
@getodk/common Patch
@getodk/ui-solid Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@eyelidlessness eyelidlessness marked this pull request as ready for review November 4, 2024 23:58
@brontolosone brontolosone self-requested a review November 11, 2024 14:12
Copy link
Member

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've decided to merge this now even though it has only received cursory review so far. It's a conceptual change that @eyelidlessness has been planning for a long time and there's enough testing coverage that it's unlikely to have broken anything.

If we don't merge now, we'll instead stack commits on top. Even if there are changes to be made, we think these commits will be valuable to have access to in the future.

Next steps:

  • @eyelidlessness will remove no-op review cutpoint commits and merge
  • @eyelidlessness will add support for XML, CSV and GeoJSON external secondary instances
  • @brontolosone will interact with the implementation as he's working on the host application integration and will file issues or PRs as appropriate

This begins preparation for refactoring the `@getodk/xpath` package to support evaluation against opaque/arbitrary DOM implementations.

As implied by the introduction of the `packages/xpath/src/adapter` directory, support for this will be implemented by an adapter API. Subsequent commits in the refactor will introduce interfaces for that adapter API, transitioning existing DOM API usage to that API shape.

This commit lays groundwork for those incoming changes by isolating existing DOM API access…

1. At a type configuration level: eliminating use of the DOM and DOM.Iterable TypeScript type libs
2. At a project type level: introducing the `DOM.compat-min.d.ts` and related type declarations
3. Further reducing runtime usage of DOM APIs: eliminating or reworking remaining API access which still exceed those compatibility types determined to be the minimum necessary for the adapter implementation
4. Related package structure consolidation: reworking some earlier awkward attempts to prepare for such compatibility efforts, reducing type level surface area of key internals
Begins introducing the concepts which will become an adapter-based approach to DOM access:

- XPathNode
- Adapter*
- XPathDOMProvider
- Default DOM provider and associated types
…ce options

**IMPORTANT**

This is a conceptually whole change, intended to be taken as a single commit for historical reasoning. But it is functionally incomplete. All of the following will be temporarily broken as of this commit:

- `@getodk/xpath`:
  - tests
  - type checks
- anything else downstream, beginning with `@getodk/xforms-engine`

This will be resolved immediately in a subsequent commit.

**WHY** (in user story form)

As an ODK Web Forms user, I want my forms to be able to reference data from external sources (“external secondary instances”).

**WHY** (in terms of design and implementation)

In order to support this user story, we are:

1. Developing support for evaluation of data from those external sources.
2. Correcting flaws in the existing `@getodk/xpath` mental and data model, ODK XForms data structures outside of primary instance state are to be treated as distinct-but-related subtrees, rather than as adjacent subtrees within the same XML DOM.

**ADDITIONAL DETAIL: DOM adapter abstraction precursor**

This also introduces several method implementations to the (still temporary) DOM provider abstraction. These methods are a rough preview of the overall DOM adapter/provider APIs to come. They are already in active use by the commit’s other changes.

**ADDITIONAL DETAIL: related internals refactoring**

Since this is necessarily a breaking change, and since the change is central to the underlying point of these refactors, it made sense to take on some good stewardship along the way. This tightens up the internal design for accessing and structuring both itext translations as well as secondary instances. It is expected that, apart from a few changes to module boundaries, these internals will be relatively stable for the remainder of the DOM adapter refactor.
**IMPORANT!** This commit fixes all _functional breakage_ introduced in the prior commit.

**DETAILS: DefaultXFormsXPathEvaluator**

This change **TEMPORARILY** replaces the `@getodk/xpath` entrypoint export of `XFormsXPathEvaluator` with a new `DefaultXFormsXPathEvaluator`. This is a _subclass of `XFormsXPathEvaluator`_ which performs all of the construction logic which:

- was handled by that class prior to the previous commit
- is **currently** expected to be handled downstream (`@getodk/xforms-engine` and beyond)
- is **currently and indefinitely** expected to be handled as part of test setup across a large portion of the `@getodk/xpath` test suite

**DETAILS: TestXFormsXpathEvaluator**

This is **TEMPORARILY** a thin shim over `DefaultXFormsXPathEvaluator`. Once downstream (`@getodk/xforms-engine` and beyond) have fully migrated to the breaking changes introduced in the previous commit, as well as those incoming breaking changes to introduce DOM adapter APIs, the logic of the “default” evaluator can be moved here and isolated to testing only.

**DETAILS: entrypoints**

The main `src/index.ts` entrypoint **TEMPORARILY** re-exports `DefaultXFormsXPathEvaluator` **as `XFormsXPathEvaluator`**. In turn, this **TEMPORARILY** reverses the package-level breaking changes introduced in the previous commit.

The intent of sequencing the commits this way is to have:

1. A single commit introducing just the substantive breaking prerequisite changes of that previous commit.
2. Continuity through the remaining refactor, until…
3. A single commit finalizing all of the package-level breaking changes expected for the full refactor.

Further, the intent of that continuity is so that we can choose to break up this LARGE set of changes into smaller chunks if it makes more sense from a review perspective.
The XPathNode module’s types are the core of the XPathDOMAdapter design:

- `XPathNodeKind` is an enumeration of the XPath data model’s semantic **kinds of nodes** (as specified by XPath 1.0, and applied consistently throughout the ODK XForms specification)
- For each enumerated node kind, a corresponding interface is defined. Each interface is **mostly opaque**, excepting a single key: the symbol `XPathNodeKindKey`, which is assigned the corresponding `XPathNodeKind` member.

As detailed by the `XPathNode` JSDoc, this key is never actually accessed directly by `@getodk/xpath`. It is a type-level fiction used to designate these opaque interfaces for each of the seven XPath semantic node kinds.
…point

As described in its JSDoc, defining this explicit type for nodes which **exceed** the XPath data model’s seven node kinds is intended to support adapters which provide certain, limited access to a Document Type Declaration. (This functionality will be implemented by the standard web/browser/WHAT Working Group DOM adapter, to preserve existing compatibility with the same standard’s XPathEvaluator.)
The intent is that this module will ultimately serve as the public DOM adapter type/API interface for downstream consumers of `@getodk/xpath`
…null

This is kept intentionally opaque at the engine/client boundary, i.e. the present `@getodk/xforms-engine` package entrypoint. Clients **do** need to know that this value will not necessarily be null, but we don’t expose any other detail about what `RootNode.parent` actually is.

In subsequent commits, the value will be an internal representation of the primary instance. This representation will have _XPath document semantics_, which is currently a wholly engine-internal concern.

In the future, we may decide there is some value to clients in exposing the primary instance document node representation. If so, it will most likely be of value for introspection (debugging unexpected form behavior, “dev tools”-like use cases). For current client use cases, it would only be an inconvenient indirection.
…ate init

This is a precursor to moving the same logic up to PrimaryInstance. It’s a distinct commmit to make it easier to follow how the logic is mostly just being moved (and a bit simpler in the process).

The original TODO comment has been moved and updated to reflect the silliness of the particular xpath/engine separation of responsibilities. We definitely should address it! But that feels out of scope, and this refactor is already quite large enough.
…ionContext

This, too, is a precursor to moving translation state up to PrimaryInstance. Here is the reasoning for doing this refactor now:

- Every expression is potentially translatable; so…
- The context of evaluating it must provide the ability to subscribe to translation state; and…
- Since we’re moving that state up in the hierarchy of the engine’s primary instance representation, now is as good a time as any to recognize that it doesn’t matter where the state is _instantiated_.

For an internal interface, an “evaluation context” should be complete without being coupled to unrelated implementation details like document structure.
This, TOO, is a precursor to moving translation state up to PrimaryInstance.

While this change isn’t strictly necessary for that purpose, the class **will** need access to a reactive scope. I’m just taking this opportunity to address something that should have already happened but hasn’t really fit in any other change.

To clarify the intent of this change, Solid’s reactivity implementation depends on a few nuances which aren’t always immediately obvious. In this case, Solid’s own reactive scope (in Solid’s terms: “owner” and “root”) are only effective either:

- synchronously
- by a subsequent call to `runWithOwner` (over which our `ReactiveScope.runTask` is an abstraction, for … reasons …)

Since PrimaryInstance construction (and Root before it) follows an `await` expression, we have never properly identified the reactive scope (“owner”) of a client. This may have been a source of many weird edge cases with Solid (engine) <-> Solid (client) reactivity. Might as well address it now that PrimaryInstance is also going to need a `scope`!
Making a change to the hierarchic structure has been much fussier than it should be. The `DescendantNodeParent` conditional type is too complex anyway, and it was exacerbating this issue.

While I’m not thrilled about expanding an already long series of generic type params on these abstract base classes, it’s more explicit and easier to reason about (at least with editor support).
…ocument

`PrimaryInstanceDocument` is intended roughly as a conceptual equivalent to the client-facing node APIs.

A subsequent commit will update `Root` to extend `DescendantNode`
This largely restores a consistent data model hierarchy throughout the primary instance state tree, and the engine’s implementation of the client API (despite current lack of exposure of PrimaryInstanceDocument)
These types expand slightly on the `@getodk/xpath` XPathNode opaque DOM types, specifying the explicit minimum interface to be satisfied by the engine’s XPath DOM representation.

Following commits will implement these interfaces for existing InstanceNode implementations, as the initial basis for an engine XPath DOM adapter.
Some of the interface’s properties have already been attached (and in some cases they are regrouped to make that clear). The remaining changes include:

- Explicit designation of the node’s XPath semantic kind, specifying which XFormsXPathNode interface is implemented
- Assignment of `XPathNodeKindKey` consistent with that XPath semantic node kind
- A generic base implementation of `XFormsXpathNode.getXPathChildNodes` suitable for all node types. This includes some logic specific to repeats, which would ideally not be defined on the base class. But the logic is specific to **parents of repeat ranges** (i.e. the union type `GeneralParentNode`), and would be repeated verbatim across all of those parent node types for little benefit. It is expected that this specific code branch will be stable long term, so it seems reasonable to consolidate here now.
- A generic base implementation of `getXPathValue`, suitable for all **parent nodes**
- Leaf/value node overrides of `getXPathValue` suitable for their runtime value representation. Nodes whose runtime value is already represented by a JavaScript string have their values passed through directly. Nodes with any other runtime value represntation reuse their existing ValueContext.encodeValue internal API.

Also picks a few code style nits along the way.
…valuations

The nature of this change is… well, it’s a little bit frustrating! There are probably some other foundational changes we could make that might make some or even all of it unnecessary.

When this refactor is complete, XPath expressions will be evaluated against the reactive state of InstanceNode and its concrete subclasses. There are a handful of chicken/egg conditions that we must account for in construction of those nodes. The intent of this change is to effectively block evaluation against an incomplete or inconsistent PrimaryInstance DOM tree state (or against any subtree portion thereof), and then to reactively recompute any affected expressions once any blocking conditions are resolved.

This involves two coupled concepts:

1. Consistency of the PrimaryInstance DOM tree is determined by an “attached” state, i.e. when a given parent node and a given child node have a consistent view of their relationship (where the parent reference is presently non-reactive, and child references are always reactive).

2. Computations, when blocked, produce a default value (appropriate for the expression’s XPath result type, as provided by an option to `createComputedExpression` with a default fallback for each result type). A node’s attached state being reactive as well, establishing an attached state will cause such blocked computations to be rerun, returning the expression’s result rather than the computation’s default.

- - -

From the perspective of form **initialization**, in a perfect world we would simply defer evaluation of any expressions until their requisite state is in place. There are some implementation details which make this challenging now, without hugely expanding the already huge scope of this refactor.

There is also the matter of post-init state. One nice side effect of this change _as a concession_ is we get another feature FOR FREE: the same pre-init behavior will be applied for any removed node (i.e. repeat instances), as well as any descendants of a removed node’s subtree. The effect is that removing repeat instances automatically blocks all computations for that repeat instance, its children, and so on. So in that sense, while this change is partly working around non-ideal aspects of implementation _for this refactor_, it is also implementing desirable functionality which we will want to preserve!

- - -

**Notes on scenario changes**

Some, not all, of the scenario changes here are temporary.

1. Newly passing cycle detection tests in form-definition-validity.test.ts will continue to pass when this refactor is complete.
2. Newly failing select tests will be addressed with a select behavior change which will become necessary once the (pending) engine DOM adapter implementation is used for evaluation
3. The newly failing relevant test will pass when this refactor is complete. Honestly, I’m kind of baffled by the change, but I’m almost 100% certain that a potential fix for it would involve code that is going to be deleted in a coming commit.
3. I am not sure what the current state of the child vaccination test suite is. I do know that the current known failure case is no longer blocking. I also know that the final state of this refactor will result in the full smoke test complete (as in remaining details filled in from the previous incomplete port stopping point). And it will pass. I tried running these tests with the remaining JR port updates, but I got impatient after a couple minutes waiting to see if they complete or time out. (This bodes well for the kind of performance improvements coming right up!)
This introduces a minimal, generalized DOM abstraction suitable for **in-memory** representation of static subtrees. It will be used to represent both itext translations and secondary instances.

For this refactor, internal secondary instances will be migrated to a StaticNode representation. But the design is intentionally agnostic to parse logic, in anticipation of support for external secondary instances of arbitrary data types. (This has been tested locally to validate the design.)
…sentation

This parsing functionality is intentionally decoupled from the StaticNode representation itself. The intent is that other data types will be parsed in a similar manner, likely in sibling modules.

The functionality is also intentionally generic over the specific StaticDocument and (root element) StaticElement constructors used for parsing. The purpose of this being generic is so that we can have more specific static types associated with itext translation subtrees, and secondary instance subtrees. This will also satisfy certain type level constraints at the `@getodk/xpath` XFormsXPathEvaluator boundary.

The next commit will introduce parsing of those subtrees. And with it, begin another temporarily broken project state.
…y instances

We must enter a brief broken state at this commit, because EngineXPathEvaluator is still using the WHAT DOM adapter.

This commit also begins to introduce the types which will form the basis for the coming engine adapter!
Note: this is a somewhat embarrassing oversight, but in my enormous mess of stashes, I haven’t had great luck finding the solution I had for `compareDocumentOrder`. For now it’s stubbed.

This is **technically** a regression, but I’m not aware of any previously passing tests which catch it. Of the top of my head, it’ll affect:

- preceding and preceding-sibling axes (which I believe are beyond JR scope)
- union expressions (which I’m kind of surprised not to see in any newly failing tests!)
- the `id` function (also beyond JR scope, I believe)
Note that `getProcessingInstructionName` is not implemented. We do not presently have an engine representation of processing instructions _at all_, and I highly doubt we will anytime soon.

The next commit will integrate this adapter, restoring project functionality, and bringing us very near the end of this refactor!
(Note

This commit involves two sets of changes which are inherently tightly coupled:

1. Value state is no longer persisted to a WHAT Working Group DOM backing store
2. Concrete implementations of InstanceNode now act as their own node representations for the purposes of evaluating XPath evaluations, XForms spec computations

**Value state:**

This constitutes the most substantial change in the commit, but it is pretty much a drastic simplification of value state logic. This pretty much comes “for free” by eliminating the need for a WHAT DOM backing store.

I took the opportunity to break up some other logic which with more incidental coupling than I’d consider ideal, and updated JSDoc to reflect both the changes in storage mechanism as well as general clarification where it made sense.

**InstanceNode as EvaluationContext.contextNode:**

For most concrete InstanceNode subclasses, there is very little change beyond updating `contextNode` to reference `this` (with some type refinements where makes sense or necessary to satisfy the internal interfaces).

The more substantial changes are all concerned with eliminating WHAT DOM usage where any such logic still remains _in PrimaryInstance_.

To be clear: as of this commit, the engine no longer uses the WHAT/browser DOM for any aspect of instance state. Remaining WHAT DOM usage in the engine is now restricted to:

- parsing XForm XML into the engine’s internal `*Definition` data model, which is then used to initialize `PrimaryInstance`
- tests

- - -

We are VERY near the end of this refactor! What’s left:

- An outstanding regression in select behavior. Addressing this will involve a judgment call around this open question: #57. The `@getodk/scenario` select.test.ts suite will have updates to reflect these changes.
- Eliminate a few odd repeat-specific workarounds which are now redundant or moot.
- Eliminate most current usage of parse-stage dependency analysis. The underlying dependency analysis **logic** will remain in the codebase, as it will still have value for other work in the future.
- Removal of subscription interfaces and logic which depended on current usage of dependency analysis. As of this commit, both the affected dependency analysis and subscription logic are obsolete. Their responsibilities are now handled **by evaluation of XPath expressions** with the now-integrated engineDOMAdapter!
- Complete porting child-vaccination.test.ts, which will pass once complete!
- Odd assorted cleanup and loose ends, if any remain
…mset change

This addresses a temporary regression in an earlier commit laying groundwork for engineDOMAdapter integration. The nature of the change is best described by #57

Some previously failing scenario tests now pass. And some new tests are added to clarify the change in behavior.

**NOTE:** this change is not intended to press for a decision on #57. It is primarily meant to ensure that, to the best of my knowledge, this refactor does not have any known regressions (excepting some edge cases discussed in previous commit notes). Secondarily, it’s meant to leave some better breadcrumbs for us to make an informed decision on that issue when we do prioritize it.
…contextNode

These workarounds were introduced to address edge cases around evaluating XPath expressions in the context of a repeat range as a distinct thing/concept. While there are still very real complexities in the impedance mismatch between the XPath DOM data model and the engine-internal/client-facing node hierarchy, integration of engineDOMAdapter eliminates the need for these particular workarounds.
…ation

This interface was introduced to address a disconnect between:

- XPath evaluation against a WHAT Working Group DOM data store
- Engine computation updates implemented with by reactive subscriptions

In a nutshell, this was the previous design:

1. When parsing an XForm XML definition, for each of the form’s computation expressions (as DependentExpression), identify XPath sub-expressions with LocationPath references.
2. Where possible, associate those with form model nodeset references or other contextual references, where appropriate (as DependencyContext).
3. When initializing the state of a form instance, establish reactive subscriptions to each computation’s dependencies by walking the instance state tree to find InstanceNodes with that reference (as in the form definition, ignoring dynamic details like repeat instance position predicates).
4. For each node dependency found, establish a subscription to the node by its implementation of a `SubscribableDependency.subscribe` method. The implementation details of this method varies by node type and semantics, but it was generally intended to have the semantics of “when the state of this node is updated in a way which should cause recomputation of expressions referencing it”.

**ALL OF THAT IS NOW OBSOLETE!**

These responsibilities are now handled entirely by **evaluating the XPath expression**. This works because integration of `EngineXPathEvaluator` with `engineDOMAdapter` now reads directly from the reactive state of those nodes (their values, their hierarchical references, their attached state, their relevance, all of it). This represents both a significant reduction in **synchronization complexity**. It also has a substantial positive performance impact.

What’s more… this is only the first half of the complexity reduction. This eliminates steps 3-4 as described above. The next commit will eliminate (the vast majority of) steps 1-2.
As described in the previous commit, for the most part this usage of dependency analysis during form parsing is completely redundant to logic performed in the course of evaluating XPath expressions with `engineDOMAdapter`.

In theory, this commit could also delete a bunch of the dependency analysis code along with it. But that code will have value for other purposes. Top of mind: a proper implemenation of cycle detection. I anticipate other use cases as well.

There is still some remaining dependency analysis in active use. This largely focuses on:

- Setting up reactive subscriptions to translation state, updating expressions with `jr:itext` calls when the form’s active language state is changed
- Identifying “notes”, as we’ve modeled them to be value nodes with a `readonly` bind expression which will always evaluate to true

There is certainly still some further simplification left on the table at this commit point. It’s not clear how much the `DependencyContext`/`DependentExpression` abstraction still makes sense. We certainly still use `DependentExpression` to set up reactive computations, but there are other aspects of the conceptual pair which are likely redundant now as well. I’ve left these unaddressed to try to wind up this refactor as quickly as possible.
The remaining behavior under test was at least functionally unblocked somewhere midway in the `engineDOMAdapter` integration process. It isn’t exactly clear which commit that occurred in, because it was prohibitively slow to validate.

I’d like to pay some extra scrutiny to this commit in review. I believe I’ve ported the test logic as faithfully as I could reasonably do, but that logic has quite a lot of indirection and far more abstraction than I’d usually see in tests. So it’s quite possible I’ve made mistakes!

- - -

Fin!
@eyelidlessness eyelidlessness force-pushed the refactor/xpath/opaque-dom-adapter branch from 28658a7 to bb821f3 Compare November 19, 2024 22:33
@eyelidlessness eyelidlessness merged commit 5a26434 into main Nov 19, 2024
86 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

External secondary instances: Engine representation, XPath support
2 participants