createtsv output from foldseek search does not match the alignments in result2msa #405
Comments
This looks odd. Do you have data and commands to reproduce it?
I have both. However, I've tried this on multiple data sets, workflows and versions of foldseek and the error is always similar. So I imagine you may be able to reproduce it on your own. Specifically, I've tried foldseek versions I've tried both on custom I've tried on Finally, I've tried extracting a3ms from I'll try to send you a minimal data plus commands example later.
Did you set the Independently, you could also give Foldmason a try: https://github.com/steineggerlab/foldmason
Martin, the The MSAs look good now and the alignment statistics are identical to those within the I've also tried to get the Foldmason is indeed promising, but it requires proper PDB/mmCIF files, right? If one only has ProstT5 3Di + aa files, then I think that's not enough for Foldmason, is it? I'm working with viral proteins, so I don't have good Alphafold PDBs. I've been able to modify FAMSA like they did in the Puente-Lelievre preprint to align 3Di files with a modified substitution matrix, then replace the 3Di chars in the MSA with the original input AAs. It works, but it's not better than default FAMSA. It's not worse either. But I imagine Foldmason would be much better than this hack.
TL;DR:
Thanks, Martin!
Yes, cluster databases do not contain alignment information and therefore have no CIGAR strings. You need to realign them, for example, using the structurealign module with the -a option.
For more documentation regarding Foldseek, check the MMseqs2 Wiki, which explains most of the internals of Foldseek, as Foldseek is built on MMseqs2.
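For anyone landing here later, a minimal sketch of the realignment step suggested above, written as a small Python helper that only builds the command lines and does not run foldseek. The database names (seqDB, cluDB, alnDB, msaDB) are hypothetical, and the positional argument order is an assumption based on the usual MMseqs2 convention (query DB, target DB, input result DB, output DB); please check it against `foldseek structurealign -h` and `foldseek result2msa -h` before use. Only the -a option comes from the comment above.

```python
import shlex
from typing import List


def realign_commands(seq_db: str, cluster_db: str, aln_db: str, msa_db: str) -> List[List[str]]:
    """Build (but do not run) commands that realign a cluster DB and export MSAs.

    ASSUMPTION: argument order follows the MMseqs2 convention
    <queryDB> <targetDB> <inputResultDB> <outputDB>. For a cluster DB the
    query and target are the same sequence DB.
    """
    return [
        # Realign cluster members with backtrace (-a) so alignments carry CIGAR strings.
        ["foldseek", "structurealign", seq_db, seq_db, cluster_db, aln_db, "-a"],
        # Build query-centered MSAs from the realigned result DB.
        ["foldseek", "result2msa", seq_db, seq_db, aln_db, msa_db],
    ]


if __name__ == "__main__":
    # Hypothetical database names, printed for copy-pasting into a shell.
    for argv in realign_commands("seqDB", "cluDB", "alnDB", "msaDB"):
        print(shlex.join(argv))
```

Building the argument lists first makes it easy to inspect the exact commands before handing them to `subprocess.run`.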
Foldseek has this game-changing feature where it can generate structural MSAs!
With the caveat that these are query-centered, meaning they're based on the underlying pairwise alignments to your query. That is quite acceptable for a lot of downstream applications. However:
Expected Behavior
One would expect that the MSAs generated by result2msa should match those given by e.g. createtsv, or other foldseek alignment output modalities.
Current Behavior and steps to reproduce:
They do not match. Here is output from createtsv:
And here is the output from result2msa (using ffindex to fetch the alignment for query number 602, same as above):
It's evident that the first alignment is not the same. All statistics including fident, bitscore, and coordinates are different. Let's grep out the headers to check the remaining alignments:
As seen here all of the same targets were found, but none of the alignments are identical to those within the createtsv output. Most have worse coverages and even non-significant E-values.
Your Environment
Latest foldseek version 0d8d966cfa50b07c5ee83aa9060d795f5ee186a4
Thanks again, Martin, for your amazing work!
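In case it helps with reproducing, the comparison described under Current Behavior can be sketched roughly as below. This is only an illustration under stated assumptions, not the exact procedure used for the outputs above: it assumes tab-separated rows laid out as target, score, fident, evalue, qstart, qend, qlen, tstart, tend, tlen (with the query identifier prepended in createtsv output), and that the a3m headers from result2msa carry the same columns after the ">". The function names and the example identifiers are made up.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: order of the statistics columns that follow the target identifier.
FIELDS = ("score", "fident", "evalue", "qstart", "qend", "qlen", "tstart", "tend", "tlen")

Stats = Dict[str, float]


def parse_row(columns: List[str]) -> Tuple[str, Stats]:
    """Split one alignment row into (target, statistics); extra columns are ignored."""
    target = columns[0]
    stats = {name: float(value) for name, value in zip(FIELDS, columns[1:])}
    return target, stats


def parse_createtsv(text: str, query: str) -> Dict[str, Stats]:
    """Collect the alignments of one query from createtsv-style output."""
    hits: Dict[str, Stats] = {}
    for line in text.splitlines():
        cols = line.split("\t")
        if len(cols) < 2 or cols[0] != query:
            continue
        target, stats = parse_row(cols[1:])
        hits[target] = stats
    return hits


def parse_a3m_headers(text: str) -> Dict[str, Stats]:
    """Collect alignment statistics from a3m header lines (those starting with '>')."""
    hits: Dict[str, Stats] = {}
    for line in text.splitlines():
        if not line.startswith(">"):
            continue
        cols = line[1:].split("\t")
        if len(cols) < 2:
            continue  # header without statistics, e.g. the query itself
        target, stats = parse_row(cols)
        hits[target] = stats
    return hits


def diff_alignments(tsv: Dict[str, Stats], msa: Dict[str, Stats]) -> Dict[str, List[str]]:
    """For every shared target, list the statistics that differ between both outputs."""
    mismatches: Dict[str, List[str]] = {}
    for target in sorted(set(tsv) & set(msa)):
        changed = [name for name in FIELDS if tsv[target].get(name) != msa[target].get(name)]
        if changed:
            mismatches[target] = changed
    return mismatches
```

Calling `diff_alignments(parse_createtsv(tsv_text, query_id), parse_a3m_headers(a3m_text))` returns an empty dict when both outputs agree for the shared targets, and otherwise names the differing columns per target.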