PartitionDataset Caching Support #974
Comments
There's some discussion of this in #928. I've written a couple of custom datasets for this use case and for parallel processing of partitions, attached here in case they're helpful.
I think they're different. I'm fine with sequential execution, but I want to be able to continue from where a run left off. It should be easy to hack together, but it seemed like a nice feature to have in kedro.
Try the third, RobustPartitionedDataset? It's patterned off of the built-in incremental dataset to address some edge cases. You can set it up like a regular PartitionedDataset, with the additional parameter behavior:

    mydataset:
      type: <my-project>.datasets.robust_partitioned_dataset.RobustPartitionedDataset
      path: ...
      dataset:
        type ...
      behavior: complete_missing

https://gist.github.com/fgassert/c6c9a87c47d2eaffd30d3f72b0ff675a#file-robust_partitioned_dataset-py
Thanks for the pointers 🙌 As I said, I wasn't looking for a custom solution, since this could be done with a few lines of changes in the original code. I opened the issue so that this could (potentially) be brought into core kedro rather than solved with a custom dataset.
Description
I have a node which returns dict[str, Callable] for kedro to save my partitioned data. I've often had cases where it failed midway due to an edge case I didn't cover, and execution then starts all over again.
Context
I need this to speed up experimentation in kedro and to avoid the unnecessary costs that re-running the node can incur.
Possible Implementation
Adding a new parameter to PartitionedDataset to support skipping already existing files. Something like use_cache: True
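To make the idea concrete, here is a minimal sketch of the skip-if-exists behaviour in plain Python. It is not kedro's implementation and does not use kedro's API: the function name `save_partitions`, the `write` callback, and the exact meaning of `use_cache` are all assumptions for illustration, and it only handles a local directory.

```python
from pathlib import Path
from typing import Any, Callable


def save_partitions(
    partitions: dict[str, Callable[[], Any]],
    directory: Path,
    write: Callable[[Path, Any], None],
    use_cache: bool = True,
) -> list[str]:
    """Save lazily computed partitions, skipping files that already exist.

    Hypothetical helper: returns the ids that were computed and written.
    """
    written = []
    for partition_id, compute in sorted(partitions.items()):
        target = directory / partition_id
        target.parent.mkdir(parents=True, exist_ok=True)
        if use_cache and target.exists():
            # Saved by an earlier (possibly failed) run: do not recompute.
            continue
        data = compute()  # the expensive work only happens here
        # Write to a temporary name first so that a crash mid-write
        # never leaves a partial file that looks like a finished one.
        tmp = target.with_name(target.name + ".tmp")
        write(tmp, data)
        tmp.replace(target)
        written.append(partition_id)
    return written
```

With something like this, a rerun after a failure would only call the callables for partitions that are still missing. Setting `use_cache=False` recomputes and overwrites everything, which matches the current behaviour described above.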
Possible Alternatives
I can definitely inherit from the class and implement this myself, but I thought it would be a useful feature to have in the core code.