-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable glue syntax in label
to access current column/segment (and possibly others)
#495
Conversation
Oops I spoke too soon - there's one sizeable challenge for the yaml round trip - the original glue expression is lost after the This gives us a behavior where only the first materialized label is picked to represent the label for the entire step: Lines 1037 to 1038 in dc1b917
agent2 <- create_agent(~ small_table) |>
col_vals_lt(
c, 8,
segments = vars(f),
label = "The `{.step}()` step for group '{.segment}'"
) |>
interrogate()
yaml_agent_string(agent2, expanded = FALSE)
#> type: agent
#> tbl: ~small_table
#> tbl_name: ~small_table
#> label: '[2023-10-25|08:33:26]'
#> lang: en
#> locale: en
#> steps:
#> - col_vals_lt:
#> columns: vars(c)
#> value: 8.0
#> segments: list(vars(f))
#> label: The `col_vals_lt()` step for group 'high' There's not a great way around this, since we'd prefer the label to be materialized (especially given that currently this PR allows more complicated stuff like Essentially, it'd be nice for #> label:
#> - The `col_vals_lt()` step for group 'high'
#> - The `col_vals_lt()` step for group 'low'
#> - The `col_vals_lt()` step for group 'mid' And only collapse if label is identical across the expanded steps |
I toyed around with this idea a bit further. In addition to supporting glue syntax in agent_pre <- create_agent(~ small_table) |>
col_vals_lt(
c, 8,
segments = vars(f),
label = "The `col_vals_lt()` step for group '{.segment}'"
)
# Yaml representation (multi-length label)
yaml_agent_string(agent_pre, expanded = FALSE)
#> type: agent
#> tbl: ~small_table
#> tbl_name: ~small_table
#> label: '[2023-10-26|10:39:24]'
#> lang: en
#> locale: en
#> steps:
#> - col_vals_lt:
#> columns: vars(c)
#> value: 8.0
#> segments: list(vars(f))
#> label:
#> - The `col_vals_lt()` step for group 'high'
#> - The `col_vals_lt()` step for group 'low'
#> - The `col_vals_lt()` step for group 'mid'
agent_yaml <- tempfile()
yaml_write(agent_pre, expanded = FALSE, filename = agent_yaml)
# Multi-length label makes the round-trip
agent_post <- yaml_read_agent(agent_yaml)
yaml_agent_string(agent_post, expanded = FALSE)
#> type: agent
#> tbl: ~small_table
#> tbl_name: ~small_table
#> label: '[2023-10-26|10:39:24]'
#> lang: en
#> locale: en
#> steps:
#> - col_vals_lt:
#> columns: vars(c)
#> value: 8.0
#> segments: list(vars(f))
#> label:
#> - The `col_vals_lt()` step for group 'high'
#> - The `col_vals_lt()` step for group 'low'
#> - The `col_vals_lt()` step for group 'mid'
identical(
as_agent_yaml_list(agent_pre, expanded = FALSE),
as_agent_yaml_list(agent_post, expanded = FALSE)
)
#> [1] TRUE The old yaml-writing behavior is preserved by collapsing agent_one_label <- create_agent(~ small_table) |>
col_vals_lt(
c, 8,
segments = vars(f),
label = "I'm the same label for all segments!"
)
yaml_agent_string(agent_one_label)
#> type: agent
#> tbl: ~small_table
#> tbl_name: ~small_table
#> label: '[2023-10-26|10:48:33]'
#> lang: en
#> locale: en
#> steps:
#> - col_vals_lt:
#> columns: vars(c)
#> value: 8.0
#> segments: list(vars(f))
#> label: I'm the same label for all segments! Implementational detailsI introduce a new label <- resolve_label(label, columns, segments_list)
for (i in seq_along(columns)) {
for (j in seq_along(segments_list)) { All create_validation_step(
<...>
label = label[[i,j]],
<...>
) This handles Separately, writing out n>1 character vector to yaml is handled inside label:
- first
- second
- third as opposed to: label:
- - first
- second
- third TODOProof of concept is complete and the PR passes existing tests, but the new features (dynamic glue syntax & accepting multi-length Let me know if you think this is a direction worth pursuing! |
This is great! And we could definitely merge this before the other big PR you worked on. So long as the YAML round trips, this is good stuff! |
Sounds good - I'll add a couple more tests here and ping you again for a final review! |
I've covered some expected behaviors for the new features in Currently, the following string variables from inside data = list(
.step = assertion_type,
.col = column,
.seg_col = seg_col,
.seg_val = seg_val
), ... where
Let me know how this looks! I'll then edit NEWS accordingly |
Looks really good! We can follow up later with some documentation (in each of the validation functions) that describes the new glue-powered naming (and which variables you can use). This is a super-great addition! |
I'll re-run the tests once CRAN is back up (that's causing the failure with the |
Everything looks good wrt tests. Would you add that NEWS entry? Then we’re good to merge! |
Done! While I'm at it, would you like me to add that blip to the function docs as well? I think there's just one place where Lines 134 to 140 in dc1b917
I can add another short paragraph below this explaining the new glue syntax. And we could also subtly document multi-length label support here by changing the argument signature from (Oops! I'm just seeing your earlier comment that we could do this later - I'm happy to merge this as-is and work on documentation separately!) |
I was thinking about deferring only because it would be a whole lot more work for you (and you already did quite a bit). But, the changes you proposed (both great!) would be great to have in here. It's a lot of copy/paste + |
Ah! Ok I'm just now realizing the complexity of the documentation setup beyond just inheriting params 😅. Yes, let's merge this now first! I'll then merge this into the tidyselect PR. And maybe we can hold off documentation updates after both are merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm!
@yjunechoe its now in! I will get to the other PR in short order. |
Summary
Many validation functions internally iterate over a user-specified set of columns and segments, where each combination is materialized as individual steps.
pointblank/R/col_vals_lt.R
Lines 377 to 378 in dc1b917
Currently, these steps simply inherit the same value for
label
, but given that the columns/segments can be selected dynamically (especially with #493), it'd be nice if users can access the context for the current step.The PR implements this by exposing
{.col}
(usingdplyr::across()
terminology) and{.segment}
from the internalcreate_validation_step()
function:This is next point is minor, but I personally think that exposing more information from the internals is better (as long as it's a bare string, which is risk-free). So just for fun this PR also exposes
{.step}
as the name of the validation step (I don't actually have a strong feeling about this one, so{.step}
can be removed for simplicity).This allows the following reprex for the suggestion raised in #451:
The difficult part of this implementation was getting the scoping right: the
{}
context should only search for vars/fns in the caller environment of the validation function (ex: the environment wherecol_vals_lt()
was called by the user) and above. Otherwise, glue will "leak" internal variables. In this PR, I use the heuristic ofcaller_env(n=2L)
, given the following invariant in how the internalcreate_validation_step()
function gets called:So
caller_env(n=2L)
climbs twice fromcreate_validation_step()
where it's called, ensuring that users can't access any arbitrary local variables via the glue context. For extensibility,create_validation_step()
gains a.call = caller_env(n=2L)
argument in case we need to break the above assumption about where in the stack tracecreate_validation_step()
is called, relative to the "user environment".Misc. considerations
The current step's column and segment are the most important information because those may be dynamic, while others are mostly known and hardcoded. But if we were to expose some other information from the "inside", some other candidates are:
i
value
(though value doesn't always resolve to a string, which is tricky)seg_col
Happy to adopt any suggestions to the set of glue variables we expose in
label
and their names (ex:{.col}
could be{.column}
).Lastly, while this complements the enhancement in #493, the two PRs touch different files/behaviors, so they can be merged in either order.
Related GitHub Issues and PRs
segments
#451Checklist
testthat
unit tests totests/testthat
for any new functionality.