Improving the quality of AlphaFold-predicted protein complex structures by MSA optimization

ntnn19/Tom_Topf

Repository files navigation

Improving the quality of AlphaFold-predicted gH/gL/gD complex structures by MSA optimization

IMPORTANT: Other parameters also affect the quality of the resulting structures (TBD):

  • Number of recycles
  • Random seeds

The goal of this project is to obtain better AlphaFold models of the gH/gL/gD complex with respect to the following scores:

  • ipTM score
  • pTM score
  • PAE score
  • pLDDT score
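
To compare strategies on the four scores above, each prediction can be reduced to one small summary record. The following is a minimal sketch, not part of this repository. It assumes a ColabFold-style scores JSON with the keys `plddt` (per-residue list), `pae` (square matrix), `ptm` and `iptm`; check these key names against the files your ColabFold version actually writes. The function name `summarize_scores` and the numbers in the example record are made up for illustration.

```python
import json
from statistics import mean


def summarize_scores(scores: dict) -> dict:
    """Reduce one ColabFold-style scores record to four headline metrics.

    Assumed keys (verify against your own output files):
    'plddt', 'pae', 'ptm', 'iptm'.
    """
    pae_rows = scores["pae"]
    return {
        "iptm": scores.get("iptm"),  # may be absent for single-chain runs
        "ptm": scores.get("ptm"),
        "mean_pae": mean(value for row in pae_rows for value in row),
        "mean_plddt": mean(scores["plddt"]),
    }


# Tiny synthetic record (made-up numbers, for illustration only).
example = {
    "plddt": [90.0, 80.0, 70.0],
    "pae": [[0.0, 4.0], [6.0, 2.0]],
    "ptm": 0.71,
    "iptm": 0.65,
}

summary = summarize_scores(example)
print(json.dumps(summary, indent=2))
```

For a real run, load the record with `json.load` from the scores file of the model you want to evaluate. Higher ipTM, pTM and pLDDT are better; lower PAE is better.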

MSA optimization strategies

Strategies implemented thus far:

Default strategy

graph LR;
    gH/gL/gD_multi_sequence_fasta --> id1["colabfold_search
    Default params"] --> colabfold_batch --> Output;

Tom's Strategy

graph LR;
  id1["`**DB:** Orthoherpesviridae_WGS`"]  --> jackhmmer --> a3m2multi.sh --> colabfold_batch --> Output;
  id2["`**Query:** gH/gL/gD_multi_sequence_fasta`"] --> jackhmmer
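
The jackhmmer step of this strategy can be sketched as a command builder. This is an illustration only, not the repository's actual invocation: the file names, iteration count and CPU count are hypothetical placeholders. The flags `-N` (number of iterations), `--cpu` and `-A` (save the final alignment) are standard jackhmmer options. The interface of `a3m2multi.sh` is not documented in this README, so the conversion step is left out.

```python
import shlex


def jackhmmer_command(query_fasta, db_fasta, out_alignment, iterations=3, cpus=4):
    """Build a jackhmmer command line as an argument list (nothing is executed).

    iterations and cpus are placeholder defaults, not values taken from this project.
    """
    return [
        "jackhmmer",
        "-N", str(iterations),   # number of search iterations
        "--cpu", str(cpus),      # worker threads
        "-A", out_alignment,     # write the final alignment (Stockholm format)
        query_fasta,             # query sequence(s)
        db_fasta,                # sequence database to search
    ]


# Hypothetical file names, for illustration only.
cmd = jackhmmer_command(
    "gH_gL_gD.fasta",
    "Orthoherpesviridae_WGS.fasta",
    "gH_gL_gD.sto",
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the search, provided jackhmmer is installed and on the `PATH`.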

Strategy #2 - Variable MSA depth using colabfold_search

graph LR;
    gH/gL/gD_multi_sequence_fasta --> id1["colabfold_search"] --> id2["colabfold_batch
    max_msa = 16:32, 32:64, 64:128, 256:512, 512:1024"] --> Output;
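
The MSA depth sweep above can be sketched as one `colabfold_batch` call per `max_msa` value. This is a minimal sketch under assumptions, not the workflow's actual rule: the directory names `msas` and `predictions` are hypothetical, and the output folder naming is invented for illustration. The `--max-msa` option of `colabfold_batch` takes a `max-seq:max-extra-seq` pair.

```python
import shlex

# max_msa values listed for Strategy #2 (max-seq:max-extra-seq).
MAX_MSA_SETTINGS = ["16:32", "32:64", "64:128", "256:512", "512:1024"]


def colabfold_batch_commands(msa_dir, out_root, settings=MAX_MSA_SETTINGS):
    """Build one colabfold_batch argument list per max_msa setting.

    Nothing is executed; each run gets its own output directory so that
    results from different MSA depths do not overwrite each other.
    """
    commands = []
    for setting in settings:
        out_dir = f"{out_root}/max_msa_{setting.replace(':', '_')}"
        commands.append(["colabfold_batch", "--max-msa", setting, msa_dir, out_dir])
    return commands


# Hypothetical input and output directories, for illustration only.
commands = colabfold_batch_commands("msas", "predictions")
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping one output directory per setting makes it straightforward to compare the scores of each depth afterwards.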

Strategy #3 - MSAs using WGS sequences at various levels of the taxonomic hierarchy

graph LR;
    id1["`**DB:**  Heunggongvirae(kingdom)/Herpesvirales(order)`"]  --> jackhmmer --> a3m2multi.sh --> colabfold_batch --> Output;
    id2["`**Query:** gH/gL/gD_multi_sequence_fasta`"] --> jackhmmer

Strategy #4 - MAFFT + uniref90/WGS + GUIDANCE

graph LR;
  id1["`**DB:** uniref100`"]  --> jackhmmer --> a3m2multi.sh --> colabfold_batch --> Output;
  id2["`**Query:** gH/gL/gD_multi_sequence_fasta`"] --> jackhmmer

Strategy #5 - MULTICOM3

graph LR;
  id2["`**Query:** gH/gL/gD_multi_sequence_fasta`"]  --> MULTICOM3 --> Output;
  

INSTRUCTIONS

Clone this repository

git clone https://github.com/ntnn19/Tom_Topf.git

Prerequisites

Pipeline Setup

Mamba (Manual)

  1. This workflow can be set up manually with the provided environment file. Install Snakemake and its dependencies with: mamba env create -f environment.yml
  2. Activate the newly created environment with: mamba activate hsv-1
  3. Execute the pipeline with: ./run_workflow.sh <COLABFOLD_WEIGHTS_DIR>, e.g. ./run_workflow.sh /path/to/download_dir. Note: the download directory <COLABFOLD_WEIGHTS_DIR> is created automatically if it does not exist. It should not be a subdirectory of this repository. Specify it as an absolute path.
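
The two constraints on <COLABFOLD_WEIGHTS_DIR> (absolute path, outside the repository) can be checked before launching the pipeline. The sketch below is a hypothetical helper, not something shipped with this repository; the function name `check_weights_dir` and the example paths are assumptions.

```python
from pathlib import Path


def check_weights_dir(weights_dir: str, repo_root: str) -> Path:
    """Validate a weights directory against the two rules stated above.

    Raises ValueError if the path is relative or lies inside the repository.
    The directory does not need to exist yet.
    """
    path = Path(weights_dir)
    if not path.is_absolute():
        raise ValueError(f"{weights_dir!r} is not an absolute path")

    repo = Path(repo_root).resolve()
    resolved = path.resolve()
    if resolved == repo or repo in resolved.parents:
        raise ValueError(f"{weights_dir!r} is inside the repository {str(repo)!r}")
    return resolved


# Hypothetical paths, for illustration only.
print(check_weights_dir("/data/colabfold_weights", "/home/user/Tom_Topf"))
```

A check like this fails fast with a clear message instead of letting the weights download start in an unintended location.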

Workflow
