Welcome to pointblank Discussions! #244
-
I'm just getting familiar with pointblank and I'm very excited about it! Google BigQuery is one core part of my data pipeline. I don't see any mention of adding BigQuery in the (very nice) list of upcoming work, so I assume support for it is far off, if it's planned at all. I'm curious whether there is any reason it would be especially difficult to support BigQuery, or if it's just a low priority for other reasons. Looking forward to using pointblank in our nascent data validation steps!
-
Ah, great to know that some functions might work as they are. I'll test
them out and check on the four you mentioned that may be an issue -- I
should be getting back to working on validation a few weeks from now. Will
report back!
Elaine McVey
VP of Data Science
On Mon, Jan 4, 2021 at 3:07 PM Richard Iannone wrote:
Hi Elaine, thanks for getting this discussion going! I'd love to get
BigQuery 'verified' as working. My main difficulty was/is that testing
against databases is hard, mainly because of access. My guess right now is
that running pointblank against BigQuery might be okay for a lot of the
validation functions. Some of the ones where it might not work so well are
col_vals_regex(), col_vals_increasing(), col_vals_decreasing(), and
rows_duplicated().
Would you be able to tentatively test out pointblank on a BigQuery table?
If you could that would be really great. I could provide a table and an R
script that exercises all of the validation functions.
If certain steps don't work, then col_vals_expr() could provide a
nice workaround. It takes a dplyr expression, translates it to SQL, and runs
that as the validation. In the future, I also want to include a
col_vals_sql() function where you send SQL for the validation step.
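A rough sketch of what the col_vals_expr() workaround could look like, not tested against BigQuery. The table and its columns (`id`, `amount`) are made up for illustration, and a local data frame stands in for the remote table; with BigQuery the `tbl` would presumably be a remote table obtained through a DBI connection instead.

```r
library(pointblank)
library(dplyr)

# Hypothetical stand-in data; for a database-backed run this would be a
# remote table (e.g. dplyr::tbl(con, "my_table")) rather than a data frame.
tbl <- data.frame(
  id     = c(1, 2, 3, 4),
  amount = c(10.5, 20.0, 0.0, 7.25)
)

# The validation rule is written as a dplyr-style expression; on a remote
# table it is the translated SQL that gets run as the validation step.
agent <-
  create_agent(tbl = tbl) %>%
  col_vals_expr(expr = ~ amount >= 0) %>%
  interrogate()

# TRUE only if every validation step passed in every test unit
all_passed(agent)
```

Since the expression is handed to dplyr, anything that dplyr can translate for the target backend should be usable here, which is why this could cover steps that the dedicated validation functions can't handle yet.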
Knowing that you need BigQuery to work, I could prioritize this bit of
work (I'll create an Issue). I was pretty happy to find out that pointblank
works pretty well on Snowflake, as far as I can tell without testing
(https://dev.solita.fi/2020/12/16/data-quality-with-r.html). So there's
hope for BigQuery!
-
Hi Rich, could a data entry package like {DataEditR} lean on {pointblank} as a dependency to do real-time data validation and reject invalid values at the point of data entry?
-
👋 Welcome!
We’re using Discussions as a place to connect with other members of our community. We hope that you:
build together 💪.