-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: add test for impure function correlation behavior #9014
base: main
Are you sure you want to change the base?
Conversation
0992a3f
to
48d4a51
Compare
oooh, these CI failures are revealing differences in the backend behaviors... wheeee into the 🐰 🕳️ we go! |
48d4a51
to
1fea07f
Compare
I can think of two reasons a user might care about the semantics here, and I hope we can support both of their needs:
So I think this means that we as ibis authors can't assume what the goals of the user is, and how many times they want an expression executed. Therefore, we shouldn't do any clever rewrites or mergings of selects. I think we need to keep a more 1:1 correspondence between what the user writes and the SQL we produce: Every time the user does a .select(), .mutate(), etc, (except for simple column renamings, and maybe a few other cases) that leads to exactly one more IDK, what do you think of this train of reasoning? I think I'm fairly convinced that those two use cases are the requirements for success, but perhaps there is a different/better way of accomplishing that goal |
5d9a1ec
to
21b180c
Compare
@NickCrews please rebase to test with #9023 change included |
@NickCrews My only objection would be that
Is not something we can enforce even if we never merge any select statements. This kind of guarantee is at the level of the query engine. |
@cpcloud yup you are right with guarantees, "suggesting" to the backend is the best we can do. What do you think in general of my proposal of "one CTE per .select"? I'm not sure if you're skeptical of the whole thing or just the performance claims... Thanks! |
I'm trying to decide what is higher priority: An implementation that is faster 90% of the time, but does clever things and therefore isn't able to be tuned by the user that 10% of the time they need it Vs An implementation that is a bit slower in the majority of cases, but is always fine tunable to get the perf you need in the edge cases. |
7700941
to
d14384c
Compare
slowly going through the backends and adding the correct marks for each kind of failure... |
ee0ae0c
to
37ae5dd
Compare
Anything I can do here to help move this forward? |
@NickCrews Can you write a docstring for each of these tests explaining what's being tested in each? I know that's atypical, but this is a very hairy problem with lots of specifics that are important to understand, and words like "impure" and "correlation" need to be precisely defined so that everyone is on the same page about exactly what is being tested. |
I think that is a good idea. Will do. |
ee68083
to
3834829
Compare
oops, I just wrote the notes as comments, not docstrings, I assume that is still OK? I think this is ready to review now, thanks! |
Related to ibis-project#8921, trying to write down exactly what the expected behavior is.
0d2406b
to
552c2cf
Compare
It seems like this behavior is intentional across our backends, and that writing a query as a CTE in no way guarantees how many times it will be evaluated. It could be one, or it could be as many as there are references to the CTE. |
Darn it. Here are specifically which backends. Looks like DuckDB is aware of this, and they want to fix it, but not currently feasible for them to fix. |
ughh, what do we want to do? In order to get the self-join behavior that the user wants, I think the backend would have to materialize a temporary table. But I really don't think we want to do that automatically for them. So the best we could do is to detect this happening, and then error? That sounds quite hairy. |
I think the best we can do is to recommend calling |
93a42fa
to
e09d66d
Compare
e09d66d
to
8b51625
Compare
Do we do any sort of check to ensure people don't footgun themselves, or do we only just let them get weird results and then have to start searching the issues/docs? I can't think of where this should go in the docs, maybe we want a "common pitfalls" or "troubleshooting" page? I'm not sure if there are other footguns like this that we should call out that would also fit in there |
We don't have a check for this and I don't think it's worth adding because of the complexity of such a check.
There are almost certainly are, but let's try to stay focused on this issue before moving on to other things. |
For when someone finds these tests in 6 months, they will have some idea of what they can do about them, and where to go looking next.
OK, sounds good to me to just get flink passing and then get this PR merged and call it a day, at least the behavior is written down somewhere. How does this sound as a plan? I added a few comments to the tests so someone sees what our conclusions were. |
Related to #8921,
trying to write down exactly what the expected behavior is.
I figure we can use this PR to hash out exaclty what we want the semantics to be, and then the other discussions might be easier because our goal is written down precisely somewhere. Please let me know if you agree or disagree with this behavior, or if there are other tests we should add.
Need to fix the UDF test case in a followup. Also wasn't sure where to put these tests, I put them in their own little file but if you point me elsewhere Iwill move them.