Is there a reason SEPP would need HMMER 3.1b2 specifically ? #76

Open
ppericard opened this issue Dec 10, 2019 · 23 comments

Comments

@ppericard

Hi,
SEPP is now a dependency for installing the latest version of QIIME2 via conda (v2019.10, https://data.qiime2.org/distro/core/qiime2-2019.10-py36-linux-conda.yml).

The bioconda recipe for SEPP 4.3.10 (https://github.com/bioconda/bioconda-recipes/blob/master/recipes/sepp/meta.yaml) pins a very specific version of HMMER (v3.1b2). This, in turn, conflicts with some external QIIME2 plugins such as ITSxpress, whose bioconda recipe calls for hmmer>=3.1. As a result, ITSxpress is incompatible with the latest version of QIIME2.

I already opened an issue on the ITSxpress repository to ask whether they could relax their HMMER requirement a bit (USDA-ARS-GBRU/itsxpress#16). But another way to deal with the problem would be to relax the requirement in the SEPP recipe to something like hmmer>=3.1b2.
Hence my question: is there a reason why SEPP needs such a specific version of HMMER, or is it worth considering relaxing this constraint in the bioconda recipe?

Maybe @sjanssen2 could shed some light on this?

Thanks in advance,
Pierre

@sjanssen2
Contributor

Is it just me, or is something broken in the latest hmmer bioconda package? When building with circleci.com, I get errors like:

07:55:52 BIOCONDA INFO (OUT) SafetyError: The package for hmmer located at /opt/conda/pkgs/hmmer-3.2.1-he1b5a44_2
07:55:52 BIOCONDA INFO (OUT) appears to be corrupted. The path 'bin/hmmalign'
07:55:52 BIOCONDA INFO (OUT) has an incorrect size.
07:55:52 BIOCONDA INFO (OUT)   reported size: 454688 bytes
07:55:52 BIOCONDA INFO (OUT)   actual size: 1327156 bytes

I will further investigate, but maybe the hmmer dependency shouldn't be too relaxed.

@ppericard
Author

I've been using the latest version of HMMER on bioconda without any problems, but I'm definitely not using all of its features.
Still, in your Travis job from pull request #77, it seems you're still using the same version of hmmer (3.1b2), yet the build fails (https://travis-ci.org/smirarab/sepp/jobs/623557489):

$ hmmsearch -h
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.1b2 (February 2015); http://hmmer.org/
# Copyright (C) 2015 Howard Hughes Medical Institute.

@smirarab
Owner

smirarab commented Dec 12, 2019 via email

@ppericard
Author

HMMER is not a requirement of the QIIME2 conda recipe, but it is a requirement of the SEPP bioconda recipe. Therefore, pinning HMMER 3.1b2 in the SEPP recipe makes it a requirement for all the other QIIME2 tools and plugins that need to be installed in the same conda environment.

I understand that you might need to pin a specific version of a dependency for your tool to work, but the general policy for bioconda recipes is to be as relaxed as possible with dependencies in order to prevent this type of incompatibility. And in this particular case, I'm not sure the output formatting of HMMER would change much between v3.1b2 and v3.1.

However, if SEPP really bundles a version of HMMER inside itself, then the SEPP bioconda recipe shouldn't even need to specify HMMER as a dependency which would solve the problem.

Thank you anyway for trying to solve the issue.

@sjanssen2
Contributor

In order to have a lean package for bioconda, I decided to ignore the bundled binaries of HMMer and preferred to add them as dependencies.

@ppericard: As a quick workaround, you could create two qiime2-2019.10 environments. In one, you remove q2-fragment-insertion and, with it, the SEPP package, which gets rid of the hmmer dependency. Then you should be able to install your ITSxpress plugin. The other (original) environment should give you all the functionality of SEPP.
This procedure might give us enough time to thoroughly test whether we can relax the HMMer dependency for future SEPP releases.

@smirarab
Owner

smirarab commented Dec 17, 2019 via email

@smirarab
Owner

Revisiting this issue. Stefan, should we simply test HMMER v3.1? It seems that Pierre is simply asking to change from v3.1b2 to v3.1. That may be completely harmless.

@sjanssen2
Contributor

I tried that some time ago (cf. #77), but it didn't work out of the box and I could not find the time to debug.

@smirarab
Owner

smirarab commented Mar 10, 2021 via email

@smirarab
Owner

One more thing. I just tested SEPP with HMMER 3.3.2 and it seemed to work perfectly fine in the one test case I ran (locally, not on conda). If this continues to pose a problem, I'd be happy to test this on more data to make sure the change of version doesn't change results; we can then relax the requirement.

@sjanssen2
Contributor

I think what we should aim for is a relaxed dependency like - hmmer >=3. Unfortunately, I can't find a single file on the hmmer page listing the changes between versions, so I'm not sure if or when they changed the output format (not to mention the output content) in a way that might break SEPP.
Thus, I fear we need to iterate through all versions with sufficient test data and check whether our tests pass.
I am also not 100% sure whether conda understands hmmer's somewhat inconsistent version numbering scheme: b and rc infixes :-/
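
On the b and rc infixes: as far as conda's documented version ordering goes, a version string is split at the dots, each component is split into numeric and alphabetic runs, and alphabetic runs sort before numbers, so 3.1b2 would rank below 3.1. The Python sketch below is only a simplified illustration of that rule. It is not conda's implementation (it ignores special cases such as dev, post, epochs, and build strings), and tokenize and compare are hypothetical helper names.

```python
import re
from functools import cmp_to_key
from itertools import zip_longest

# Tokens are (rank, value) pairs: alphabetic runs get rank 0 and numbers rank 1,
# so any alphabetic run sorts before any number (simplifying assumption).
ZERO = (1, 0)  # padding used when one version has fewer tokens


def tokenize(version):
    """Split '3.1b2' into [[(1, 3)], [(1, 1), (0, 'b'), (1, 2)]]."""
    components = []
    for part in version.lower().split("."):
        runs = re.findall(r"\d+|[a-z]+", part)
        components.append([(1, int(r)) if r.isdigit() else (0, r) for r in runs])
    return components


def compare(a, b):
    """Return -1, 0, or 1 if version a sorts before, equal to, or after b."""
    for comp_a, comp_b in zip_longest(tokenize(a), tokenize(b), fillvalue=[ZERO]):
        for tok_a, tok_b in zip_longest(comp_a, comp_b, fillvalue=ZERO):
            if tok_a != tok_b:
                return -1 if tok_a < tok_b else 1
    return 0


# HMMER versions mentioned in this thread, in arbitrary order.
versions = ["3.4", "3.1", "3.3.2", "3.1b2", "3.2.1"]
print(sorted(versions, key=cmp_to_key(compare)))
# ['3.1b2', '3.1', '3.2.1', '3.3.2', '3.4']
```

Under this rule a spec such as hmmer >=3.1b2 would also admit 3.1 and later, but the actual behaviour should be confirmed against conda itself before relying on it in a recipe.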

@Sann5

Sann5 commented Jun 7, 2024

Hello @sjanssen2 and @smirarab. I would like to push for the dependency relaxation of HMMER. Please correct me if I'm wrong, but it seems to me that the commands that sepp borrows from HMMER are hmmbuild, hmmsearch, and hmmalign. The inputs/outputs to these are the following:

  • hmmalign [options] hmmfile seqfile -> alignment
  • hmmbuild [options] hmmfile msafile -> hmmfile
  • hmmsearch [options] hmmfile seqdb -> ranked_lists

We are concerned that one of these input/output formats may have changed in a newer version of hmmer and would therefore break something in sepp if we update it. Let's walk through the formats underlying each of these inputs/outputs.

  • hmmfile: The format of hmm files has changed over time, but the hmmer parser is kept backward compatible, i.e. a newer hmmer can take old hmm files as input (HMMER documentation, page 210).
  • ranked_lists: The format of this output is controllable through the command's options, but it's generally a space-separated tabular format. No mention of a change in the release notes since the 3.1b2 release.
  • alignment: The default is stockholm alignment format but other formats can be chosen using the options (a2m, afa, psiblast, clustal, phylip). No mention of change in the release notes since the 3.1b2 release.
  • seqfile, seqdb and msafile: Input files for HMMER include unaligned sequence files and multiple sequence alignment files. HMMER’s preferred alignment file format is Stockholm format. HMMER can read several other sequence and alignment file formats. By default, it autodetects what format an input file is in. Accepted unaligned sequence file formats include fasta, uniprot, genbank, ddbj, and embl. Accepted multiple alignment file formats include stockholm, afa (i.e. aligned FASTA), clustal, clustallike (MUSCLE, etc.), a2m, phylip (interleaved), phylips (sequential), psiblast, and selex. Page 30 HMMER documentation. No mention of change in the release notes since the 3.1b2 release.

So it seems to me that it would be safe to use a newer hmmer version. To make sure, I will run a couple of tests on these commands, once with v3.1b2 and once with v3.4. I'll parse the outputs to make sure they are the same. If it all goes smoothly, I'll run sepp with hmmer 3.4 and check whether the behavior is as expected. Then we can proceed with bioconda/bioconda-recipes#48294 @sjanssen2?
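
A comparison of this kind could be sketched in Python as follows. This is a minimal illustration, not the script actually used in this thread. It assumes the outputs are whitespace-separated tables in which comment lines start with #, as in HMMER's tabular output, and it skips those lines because they can carry version- or date-specific text. The function names and the sample strings are made up for the example.

```python
def data_rows(text):
    """Return the whitespace-split fields of every non-comment, non-blank line."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # headers/footers may mention the HMMER version or a date
        rows.append(stripped.split())
    return rows


def rows_match(old_text, new_text):
    """True if both outputs contain identical data rows, ignoring comments."""
    return data_rows(old_text) == data_rows(new_text)


# Made-up, heavily abbreviated stand-ins for two tabular outputs.
old_output = "# target query score\nseq1 hmmA 101.2\n# produced by version A\n"
new_output = "# target query score\nseq1   hmmA   101.2\n# produced by version B\n"

print(rows_match(old_output, new_output))  # True
```

Free-text columns that contain spaces split into several fields here, which is harmless for an equality check but would matter if the rows were interpreted column by column.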

@sjanssen2
Contributor

Many thanks @Sann5 for researching all these details. Let's cross fingers that your regression test does not find incompatibilities. Would be amazing to lift the pinning!

@Sann5

Sann5 commented Jun 11, 2024

Update ⬆️

So the output format of all 3 programs is the same. However, the profiles generated with version 3.1b2 differ from the profiles generated with version 3.4 in the values of the estimated parameters. This might be due to changes in the sampling algorithm that hmmer uses to estimate these parameters. However, I don't see how this would be a problem in sepp.

I still need to test sepp with both versions. I'll let you know how it goes.

@Sann5

Sann5 commented Jun 12, 2024

So... 🥁 🥁 🥁

I made two conda environments and installed sepp via conda. Then, in one of them, I removed hmmer and manually installed version 3.4, compiling it from source and placing the executables in the expected path. Then I proceeded to carry out the sepp tutorial, specifically the first 3 commands (1, 2, and 3).

For both environments, all commands were completed without errors 🥳 ✅ . Then I proceeded to compare the outputs using the command line tool diff. As expected since hmmer 3.1b2 and 3.4 generate different profiles, the results of sepp are also different. However, they both appear to be correctly formatted.

If you wish I can share the code and the outputs. Is this enough evidence to relax the hmmer dependency @sjanssen2 @smirarab?

@Sann5

Sann5 commented Jun 24, 2024

@sjanssen2 @smirarab?

@gregcaporaso

gregcaporaso commented Jul 8, 2024

Hi all, Any updates on this? We're in a jam in the QIIME 2 ecosystem as we have some plugin developers who would like to use the most recent version of HMMER (3.4) for plugins that are installed in the same QIIME 2 distribution as q2-fragment-insertion/SEPP. This very specific requirement is preventing that.

@sjanssen2
Contributor

Yes, there is, thanks to @Sann5 :-) See: bioconda/bioconda-recipes#48294. However, we have been waiting for @smirarab's response for approx. a month.

@gregcaporaso

Hi @smirarab, Do you happen to have an ETA on this and the Py 3.10 updates (#136)? I don't mean to pester you - we're just hoping to upgrade the HMMER and Python dependency versions for QIIME 2 2024.10 (scheduled for 2 October 2024), and we'd need some time for testing and updating things on our end.

@smirarab
Owner

@Sann5 @sjanssen2 Based on Stefan's work, it seems fine to live with the changed results, as long as we bump the major version.

@smirarab
Owner

The solution I adopted is as follows.

  • SEPP can be run with any version of HMMER, with the understanding that the results are not always compatible across versions of HMMER. All the tests we have run show that the changes are relatively minor.
  • Users should report not only the version of SEPP used but also the version of HMMER used.
  • In the latest version, SEPP logs the versions of both SEPP and HMMER used.

These changes are done in 6719eea

@sjanssen2 let me know when you are done with the bioconda side and I can close the issue.
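
For readers who want to record the HMMER version themselves, one option is to read it from the banner that hmmsearch -h prints, as shown earlier in this thread (# HMMER 3.1b2 (February 2015); http://hmmer.org/). The Python sketch below is an illustration under that assumption only. It is not the code from commit 6719eea, and the function names are hypothetical.

```python
import re
import subprocess

# Matches the banner line, e.g. "# HMMER 3.1b2 (February 2015); http://hmmer.org/"
BANNER = re.compile(r"^#\s*HMMER\s+(\S+)", re.MULTILINE)


def parse_hmmer_version(help_text):
    """Return the version token from a HMMER help banner, or None if absent."""
    match = BANNER.search(help_text)
    return match.group(1) if match else None


def installed_hmmer_version(executable="hmmsearch"):
    """Run '<executable> -h' and parse its banner; None if it cannot be run."""
    try:
        result = subprocess.run(
            [executable, "-h"], capture_output=True, text=True, check=False
        )
    except OSError:
        return None  # executable missing or not runnable
    return parse_hmmer_version(result.stdout)


# Banner text copied from the hmmsearch -h output quoted in this thread.
banner = (
    "# hmmsearch :: search profile(s) against a sequence database\n"
    "# HMMER 3.1b2 (February 2015); http://hmmer.org/\n"
)
print(parse_hmmer_version(banner))  # 3.1b2
```

Reporting this value next to the SEPP version makes it possible to tell which HMMER release produced a given set of results.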

@sjanssen2
Contributor

Hi @smirarab could you please take a look at smirarab/pasta#70

@smirarab
Owner

Done.
