Wildcards cannot be expanded in local filepaths #966

Open
mgreshake opened this issue Dec 12, 2024 · 0 comments
Labels
Community: Issue/PR opened by the open-source community

Description

With the kedro-datasets plugin, wildcards in local filepaths cannot be resolved. For example, I want to read an arbitrary JSON file my_file.json by passing data/01_raw/*.json as the filepath and setting the expand parameter in fs_args to true. This works fine for cloud storage, but with a local filepath I get a DatasetError. The reason is that self._storage_options, where fs_args are stored, is not passed during loading when self._protocol is file.
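To make the branching concrete, here is a simplified model of the behaviour I am describing. It is a sketch only, not the kedro-datasets source: the function name build_read_call, the s3 protocol, the bucket path, and the way the protocol is re-attached for remote paths are all assumptions for illustration.

```python
from typing import Any


def build_read_call(
    protocol: str,
    load_path: str,
    storage_options: dict[str, Any],
    load_args: dict[str, Any],
) -> tuple[str, dict[str, Any]]:
    """Hypothetical model of the load branch described in this issue."""
    if protocol == "file":
        # Local branch: storage options (and therefore `expand`) are dropped,
        # so the wildcard reaches the reader unexpanded.
        return load_path, dict(load_args)
    # Remote branch (assumed shape): storage options are forwarded.
    return (
        f"{protocol}://{load_path}",
        {"storage_options": storage_options, **load_args},
    )


local = build_read_call(
    "file", "data/01_raw/*.json", {"expand": True}, {"orient": "records"}
)
remote = build_read_call(
    "s3", "bucket/01_raw/*.json", {"expand": True}, {"orient": "records"}
)
print(local)   # no storage_options key in the keyword arguments
print(remote)  # storage_options is forwarded
```

In this model the local call never sees expand, which matches the DatasetError reported below.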

Steps to Reproduce

  1. Create an arbitrary JSON file and add the following to your catalog.yml:
     my_data_raw:
       type: pandas.JSONDataset
       filepath: data/01_raw/*.json
       fs_args:
         expand: True
  2. Use my_data_raw in your pipeline.

Expected Result

The filepath is resolved properly and no DatasetError is raised.
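
A minimal sketch of the kind of resolution I would expect for local paths, using only the standard library's glob module. The helper name resolve_local_wildcard and the rule that exactly one file must match are my assumptions, not existing kedro-datasets behaviour.

```python
import glob
import json
import tempfile
from pathlib import Path


def resolve_local_wildcard(pattern: str) -> str:
    """Expand a local glob pattern and return the single matching path."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"No local file matches {pattern!r}")
    if len(matches) > 1:
        # Assumption: a dataset loads one file, so ambiguity is an error.
        raise ValueError(f"Pattern {pattern!r} is ambiguous: {matches}")
    return matches[0]


# Demo in a throwaway directory standing in for data/01_raw.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp)
    (raw_dir / "my_file.json").write_text(json.dumps([{"a": 1}]))
    resolved = resolve_local_wildcard(str(raw_dir / "*.json"))
    records = json.loads(Path(resolved).read_text())

print(Path(resolved).name)
```

Until something like this is supported, the same helper could serve as a workaround in a custom dataset or a hook that rewrites the filepath before loading.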

Actual Result

DatasetError: Failed while loading data from dataset 
JSONDataset(filepath=C:/Users/***/Documents/Projects/***/Code/data-pipeline/data/01_raw/*.json, 
load_args={'orient': records}, protocol=file, save_args={}).
File 
C:/Users/***/Documents/Projects/***/Code/data-pipeline/data/01_raw/*.json does not exist

Your Environment

  • Kedro version used (pip show kedro or kedro -V): 0.19.10
  • Kedro plugin and kedro plugin version used (pip show kedro-airflow): 5.1.0
  • Python version used (python -V): 3.12
  • Operating system and version: Windows 10

merelcht added the Community label (Issue/PR opened by the open-source community) on Dec 12, 2024