feat: basic support for SQL SELECT -> ExtendedExpression #55

Draft: wants to merge 18 commits into main

amol- (Contributor) commented Apr 17, 2024

This allows compiling basic SQL SELECT statements to ExtendedExpressions.
The goal is to cover pyarrow Dataset's support for ExtendedExpressions
in projections and filtering of datasets.

This can act as a starting point for future support of more advanced features,
such as subqueries and entire Substrait Plans.
For the moment, the goal is just to support what pyarrow.dataset.Expression
can represent.

ACTION NEEDED

Substrait follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
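
For readers unfamiliar with the format, here is a minimal sketch of a title check. It assumes only the header shape described by the Conventional Commits specification (`<type>[optional scope][!]: <description>`). The names `HEADER` and `parse_title` are hypothetical, and the project's actual check may enforce additional rules (for example, a fixed list of types) that are not modeled here.

```python
import re
from typing import Dict, Optional

# Header shape assumed from the Conventional Commits spec:
#   <type>[optional scope][!]: <description>
# This is an illustrative pattern, not the project's real validator.
HEADER = re.compile(
    r"^(?P<type>[A-Za-z]+)"          # commit type, e.g. "feat" or "fix"
    r"(\((?P<scope>[^()\s]+)\))?"    # optional scope in parentheses
    r"(?P<breaking>!)?"              # optional breaking-change marker
    r": (?P<description>\S.*)$"      # colon, space, non-empty description
)


def parse_title(title: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the parsed header parts, or None if the title does not match."""
    match = HEADER.match(title)
    return match.groupdict() if match else None


# The original title of this PR has no "<type>: " prefix, so it does not match.
rejected = parse_title("Basic support for SQL SELECT -> ExtendedExpression")

# With a "feat: " prefix the title matches the assumed header shape.
accepted = parse_title("feat: basic support for SQL SELECT -> ExtendedExpression")
```

Here `rejected` is `None`, while `accepted` is a dictionary whose `type` entry is `"feat"`.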

@amol- amol- changed the title Basic support for SQL SELECT -> ExtendedExpression feat: Basic support for SQL SELECT -> ExtendedExpression Apr 17, 2024
@amol- amol- force-pushed the sqlglot-extended-exprs branch from 80738d1 to b70ef4e Compare April 17, 2024 14:21
amol- (Contributor Author) commented Apr 17, 2024

I just imported this from an experiment, so it's far from done. I still need to refactor and improve it.

Here is a reference for the kind of operations I'm aiming to support: https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Expression.html#pyarrow.dataset.Expression

  • Compare fields and literals with <, <=, ==, >=, >
  • Combine expressions using logical and, logical or and logical not
  • Support is_nan
  • Support is_valid (IS NULL, IS NOT NULL)
  • Support is_in (list of literals)
  • Handle custom types in ExtendedExpressions schema (arrow declares its own types)
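
To make the intended semantics of the list above concrete, here is a pure-Python sketch of the first five kinds of operation, evaluated against rows stored as dictionaries. It is neither the code in this PR nor the pyarrow API: every helper name (`compare`, `and_`, `or_`, `not_`, `is_nan`, `is_valid`, `is_in`) is hypothetical, NULL is modeled as `None`, and SQL's three-valued logic is simplified to plain booleans. Custom schema types are not covered.

```python
import math
import operator
from typing import Any, Callable, Iterable, Mapping

Row = Mapping[str, Any]
Predicate = Callable[[Row], bool]

# Comparison operators from the list, keyed by SQL spelling ("=" stands in for ==).
COMPARISONS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def compare(column: str, op: str, literal: Any) -> Predicate:
    """Compare a field with a literal; a NULL field never matches (simplification)."""
    fn = COMPARISONS[op]
    return lambda row: row[column] is not None and fn(row[column], literal)


def and_(*preds: Predicate) -> Predicate:
    return lambda row: all(p(row) for p in preds)


def or_(*preds: Predicate) -> Predicate:
    return lambda row: any(p(row) for p in preds)


def not_(pred: Predicate) -> Predicate:
    return lambda row: not pred(row)


def is_nan(column: str) -> Predicate:
    return lambda row: isinstance(row[column], float) and math.isnan(row[column])


def is_valid(column: str) -> Predicate:
    """IS NOT NULL; wrap in not_() for IS NULL."""
    return lambda row: row[column] is not None


def is_in(column: str, literals: Iterable[Any]) -> Predicate:
    allowed = set(literals)
    return lambda row: row[column] in allowed


rows = [
    {"id": 1, "score": 0.5},
    {"id": 2, "score": float("nan")},
    {"id": 3, "score": None},
    {"id": 4, "score": 7.0},
]

# Roughly: WHERE score IS NOT NULL AND NOT is_nan(score)
#            AND (score > 1 OR id IN (1, 2))
predicate = and_(
    is_valid("score"),
    not_(is_nan("score")),
    or_(compare("score", ">", 1), is_in("id", [1, 2])),
)
selected = [row["id"] for row in rows if predicate(row)]
```

With these rows, `selected` contains the ids 1 and 4: row 2 is dropped by the NaN check and row 3 by the validity check.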

@amol- amol- changed the title feat: Basic support for SQL SELECT -> ExtendedExpression feat: basic support for SQL SELECT -> ExtendedExpression Apr 18, 2024
@amol- amol- force-pushed the sqlglot-extended-exprs branch from a727475 to 3c9f6eb Compare May 8, 2024 16:55
@CLAassistant
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

amol- (Contributor Author) commented Nov 30, 2024

The related pyarrow pull request, which enables executing queries on Arrow datasets via ExtendedExpressions, was merged, so I'll try to resume work on this one as soon as I can.
