Limit number of records sampled and filter tables #4936

Answered by pmbrull
jayadevanm asked this question in Q&A
Thanks for reaching out, and even more for your kind words.

We currently have two ways of running data profiling:

  1. Enable it via the metadata ingestion
  2. Run it on its own Profiler Workflow (available in the UI, Airflow SDK, and CLI)

What you could do here is disable profiling during the metadata ingestion (we are thinking about removing it from there completely, for exactly the reasons you give) and let separate Profiler Workflows handle it.

The interesting part about Profiler Workflows is that you are not limited to a single workflow per service. You can deploy multiple workflows, each filtering a different set of tables, and run each of them on its own schedule.
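
To make the idea of splitting tables across several workflows concrete, here is a minimal sketch in plain Python. It does not use the OpenMetadata API: the `ProfilerWorkflowPlan` class, the table names, the regex patterns, and the cron schedules are all hypothetical and only illustrate how include/exclude filters could partition one service's tables. The real filters and schedule would go in each Profiler Workflow's own configuration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProfilerWorkflowPlan:
    """Hypothetical description of one Profiler Workflow (not an OpenMetadata class)."""
    name: str
    schedule: str  # cron expression, illustrative only
    includes: List[str] = field(default_factory=list)  # regexes matched against table names
    excludes: List[str] = field(default_factory=list)

    def selects(self, table: str) -> bool:
        # Excludes win over includes; an empty include list means "everything".
        if any(re.search(p, table) for p in self.excludes):
            return False
        return not self.includes or any(re.search(p, table) for p in self.includes)


def assign(tables: List[str], plans: List[ProfilerWorkflowPlan]) -> Dict[str, List[str]]:
    """Return, for each plan, the tables its filters would pick up."""
    return {p.name: [t for t in tables if p.selects(t)] for p in plans}


# Assumed example: profile a few key tables daily, the rest weekly,
# and never profile scratch tables.
plans = [
    ProfilerWorkflowPlan(
        "profile_core_daily", "0 2 * * *",
        includes=[r"^dim_", r"^fact_orders$"],
    ),
    ProfilerWorkflowPlan(
        "profile_rest_weekly", "0 3 * * 0",
        excludes=[r"^dim_", r"^fact_orders$", r"^tmp_"],
    ),
]

tables = ["dim_customer", "fact_orders", "fact_clicks", "tmp_scratch"]
assignment = assign(tables, plans)
for workflow, selected in assignment.items():
    print(workflow, selected)
```

Checking the assignment like this before deploying helps confirm that no table is profiled by two workflows and that nothing important falls through the filters.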

Answer selected by jayadevanm