Freyja demix tool: custom barcode files do not work because the file format is checked by extension (even when the file is valid CSV) #6681

Open
dadrasarmin opened this issue Jan 18, 2025 · 1 comment

@dadrasarmin

Hi,

There is a tool called Freyja demix for the COVID-19 community on Galaxy.

One input of this tool is called "Source of UShER barcodes data", and it can use either an internal file or a user-provided CSV file. There is an example CSV file on the Freyja GitHub. When a Galaxy user runs this tool with "Provide a custom barcode file" and supplies a CSV file in the "UShER barcodes file" field, the job fails.

While looking for the root cause, I noticed that Freyja determines the barcode file format only from the filename extension (here), as follows:

```python
import os

import pandas as pd


def load_barcodes(barcodes, pathogen, altname):
    locDir = os.path.abspath(os.path.join(os.path.realpath(__file__),
                             os.pardir))
    if barcodes != '':
        # The format is inferred from the filename extension only
        if barcodes.endswith('csv'):
            df_barcodes = pd.read_csv(barcodes, index_col=0)
        elif barcodes.endswith('feather'):
            df_barcodes = pd.read_feather(barcodes).set_index('index')
        else:
            raise ValueError('Only csv and feather barcode ' +
                             'formats supported')
    # ... (rest of the function omitted)
```
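One way to make this check independent of the filename would be to look at the file contents instead of the extension. The sketch below is hypothetical: `detect_barcode_format` is not part of Freyja, and it assumes that Feather files start with the Arrow IPC magic bytes `ARROW1` (Feather V2) or `FEA1` (legacy Feather V1). Any other file whose first line decodes as text and contains a comma is treated as CSV, and the original extension check is kept only as a fallback. The sample barcode content is a toy example.

```python
import os
import tempfile

# Assumed magic prefixes: Feather V2 files are Arrow IPC files that begin with
# b"ARROW1"; legacy Feather V1 files begin with b"FEA1".
FEATHER_MAGICS = (b"ARROW1", b"FEA1")


def detect_barcode_format(path: str) -> str:
    """Hypothetical helper: return 'csv' or 'feather' based on file contents,
    falling back to the filename extension only if the contents are unclear."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(FEATHER_MAGICS):
        return "feather"

    # Loose CSV heuristic: the first line decodes as UTF-8 and has a comma.
    try:
        with open(path, encoding="utf-8") as fh:
            first_line = fh.readline()
    except UnicodeDecodeError:
        first_line = ""
    if "," in first_line:
        return "csv"

    # Fall back to the original extension-based behaviour.
    if path.endswith("csv"):
        return "csv"
    if path.endswith("feather"):
        return "feather"
    raise ValueError("Only csv and feather barcode formats supported")


if __name__ == "__main__":
    # A toy CSV stored under a Galaxy-style ".dat" name is still recognised.
    with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
        tmp.write(",A1C,C3T\nB.1,0,1\n")
    print(detect_barcode_format(tmp.name))  # csv
    os.remove(tmp.name)
```

With a check like this, `load_barcodes` could branch on the detected format rather than on `barcodes.endswith(...)`, so the `.dat` names Galaxy uses for datasets would no longer matter.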

When I use Freyja on Galaxy, I get the following error:

```
Traceback (most recent call last):
  File "/usr/local/tools/_conda/envs/mulled-v1-422b12711dac707b326db04a2307413f94850d5c03d7d6affb06577fea704c12/bin/freyja", line 10, in <module>
    sys.exit(cli())
             ^^^^^
  File "/usr/local/tools/_conda/envs/mulled-v1-422b12711dac707b326db04a2307413f94850d5c03d7d6affb06577fea704c12/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/tools/_conda/envs/mulled-v1-422b12711dac707b326db04a2307413f94850d5c03d7d6affb06577fea704c12/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/tools/_conda/envs/mulled-v1-422b12711dac707b326db04a2307413f94850d5c03d7d6affb06577fea704c12/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/tools/_conda/envs/mulled-v1-422b12711dac707b326db04a2307413f94850d5c03d7d6affb06577fea704c12/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/tools/_conda/envs/mulled-v1-422b12711dac707b326db04a2307413f94850d5c03d7d6affb06577fea704c12/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/tools/_conda/envs/mulled-v1-422b12711dac707b326db04a2307413f94850d5c03d7d6affb06577fea704c12/lib/python3.11/site-packages/freyja/_cli.py", line 103, in demix
    df_barcodes = load_barcodes(barcodes, pathogen, altname)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/tools/_conda/envs/mulled-v1-422b12711dac707b326db04a2307413f94850d5c03d7d6affb06577fea704c12/lib/python3.11/site-packages/freyja/utils.py", line 39, in load_barcodes
    raise ValueError('Only csv and feather barcode ' +
ValueError: Only csv and feather barcode formats supported
```

I think the cause is visible in the job's command line, shown in the job metadata:

```
ln -s '/data/dnb10/galaxy_db/files/f/3/7/dataset_f37e9015-0aa2-4377-b882-fd3531fca6a8.dat' Variant_call_PV5391-SC2.tabular && freyja demix 'Variant_call_PV5391-SC2.tabular' '/data/dnb10/galaxy_db/files/1/2/8/dataset_1280b3b4-4760-4073-a98f-0b19d2cfaae1.dat'    --barcodes '/data/dnb10/galaxy_db/files/9/9/b/dataset_99b90e78-21e6-438a-97d8-09c9b74e390b.dat'  --covcut 5 --output abundances_raw.tsv && sed 's/Variant_call_PV5391-SC2.tabular/Variant_call_PV5391-SC2/' abundances_raw.tsv > abundances.tsv
```

Here, the file passed to --barcodes ends in "dat" rather than "csv" (--barcodes '/data/dnb10/galaxy_db/files/9/9/b/dataset_99b90e78-21e6-438a-97d8-09c9b74e390b.dat'), so Freyja's extension check rejects it even though its content is valid CSV.
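To see why the job fails, the extension branching from `load_barcodes` can be reproduced on its own. The sketch below copies only that decision logic into a hypothetical `barcode_format_from_name` function (no file is read) and applies it to the barcode path from the job command line; `usher_barcodes.csv` is an illustrative name, not something Galaxy produces.

```python
def barcode_format_from_name(barcodes: str) -> str:
    """Same decision logic as the extension check in Freyja's load_barcodes."""
    if barcodes.endswith('csv'):
        return 'csv'
    elif barcodes.endswith('feather'):
        return 'feather'
    raise ValueError('Only csv and feather barcode formats supported')


# Barcode path taken from the Galaxy job command line
galaxy_path = ('/data/dnb10/galaxy_db/files/9/9/b/'
               'dataset_99b90e78-21e6-438a-97d8-09c9b74e390b.dat')

try:
    barcode_format_from_name(galaxy_path)
except ValueError as err:
    print(err)  # Only csv and feather barcode formats supported

# The same data exposed under a name ending in "csv" passes the check.
print(barcode_format_from_name('usher_barcodes.csv'))  # csv
```

The contents never enter the decision, which is why renaming (or symlinking) the dataset is enough to get past the check.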

I think this should be "easy" to fix, but I am not proficient enough to do it myself. Thanks in advance for your help.

Best,
Armin

@bernt-matthias
Contributor

Thanks for the report and analysis!

Would you be able to open a PR here?

I guess one needs to use a hardcoded filename with a csv extension here, replacing ${usher_update_option.usher_barcodes} in:

```
--barcodes '${usher_update_option.usher_barcodes}'
```

And add another ln -s here, linking ${usher_update_option.usher_barcodes} to the hardcoded filename (using the same if block as in the link above):

```
ln -s '$variants_in' $in_file &&
```
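In shell terms, the proposal is to link the Galaxy dataset to a fixed name ending in `.csv` and pass that name to `--barcodes`. The sketch below is only a Python analogue of that extra `ln -s` step for illustration; the real change belongs in the Galaxy tool's command template, and the helper `stage_barcodes` and the target name `usher_barcodes.csv` are hypothetical.

```python
import os
import tempfile


def stage_barcodes(dataset_path: str, workdir: str,
                   link_name: str = "usher_barcodes.csv") -> str:
    """Symlink a Galaxy dataset (e.g. dataset_....dat) to a fixed .csv name
    inside workdir and return the link path, mirroring `ln -s src dst`."""
    link_path = os.path.join(workdir, link_name)
    if os.path.lexists(link_path):
        os.remove(link_path)  # replace a stale link from a previous run
    os.symlink(os.path.abspath(dataset_path), link_path)
    return link_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        dat = os.path.join(workdir, "dataset_example.dat")
        with open(dat, "w") as fh:
            fh.write(",A1C\nB.1,1\n")  # toy barcode table
        staged = stage_barcodes(dat, workdir)
        print(staged.endswith("csv"))  # True: passes the extension check
        with open(staged) as fh:
            print(fh.readline().strip())  # ,A1C
```

Because the link only changes the name, Freyja reads exactly the bytes the user uploaded; a test case could then run the tool with a custom barcode CSV to confirm the job succeeds.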

Ideally we would also add a test case covering this.
