Begin by collecting the sequences of interest, in this case all animal homeobox proteins:

./ --terms "Homeobox,Homeodomain,ANTP,PRD,LIM,POU,HNF,SINE,TALE,CUT,PROS,ZF,CERS" --fastafile "<file>.fasta" --occfile "<file>.csv"

Next, the FASTA records should be sorted and separated into individual HG FASTA files:

./ <file>.fasta

Next, for the classification step:

Download the HomeoDB database of homeoboxes, selecting the appropriate parameters:

cat Fastas/*.fasta > All.fasta

makeblastdb -dbtype 'prot' -in homeodb.fasta -out homeodb

blastp -db 'homeodb' -query 'All.fasta' -evalue 10e-6 -out All.blast
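One detail worth checking in the `blastp` call: in scientific notation, `10e-6` means 10 × 10⁻⁶, which is `1e-5` rather than `1e-6`. The snippet below is only a quick sanity check of that arithmetic; which of the two cutoffs was intended is not stated here.

```python
# Quick check of what the -evalue argument above evaluates to.
threshold = float("10e-6")

print(threshold == 1e-5)  # True: 10e-6 is 10 x 10^-6
print(threshold == 1e-6)  # False
```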

And run InterProScan:

./ -appl 'Panther,Pfam' -i </path/to/All.fasta> -b </path/to/results.interpro>

Run the domain extraction program:

./domainPuller

This script has hardcoded paths, so it may need adjusting to fit your directory layout. It may remain hardcoded in future so that the whole pipeline can be run, fully automated, from a single command.
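For readers adapting the extraction step to their own directories, the general idea can be sketched as follows. This is not the domainPuller code: it is a minimal illustration that assumes the standard InterProScan TSV layout (protein ID in column 1, signature accession in column 5, 1-based inclusive start and stop in columns 7 and 8) and assumes Pfam accession PF00046 as the signature of interest. The function names `read_fasta` and `pull_domains` are hypothetical.

```python
import csv
import io


def read_fasta(handle):
    """Parse FASTA text into a dict of {sequence id: sequence}."""
    records, name = {}, None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the ID.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None and line:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def pull_domains(fasta_handle, tsv_handle, signature="PF00046"):
    """Yield (id, start, stop, subsequence) for each hit to `signature`.

    The default accession is an assumption for illustration only.
    """
    seqs = read_fasta(fasta_handle)
    for row in csv.reader(tsv_handle, delimiter="\t"):
        if len(row) < 8 or row[4] != signature:
            continue
        seq_id, start, stop = row[0], int(row[6]), int(row[7])
        if seq_id in seqs:
            # Coordinates are 1-based and inclusive, so shift the start.
            yield seq_id, start, stop, seqs[seq_id][start - 1:stop]


# Made-up input, only to show the call.
fasta = io.StringIO(">Hsap_1 example\nMKRRTAYTRQ\nLLELEKEF\n")
tsv = io.StringIO(
    "Hsap_1\tmd5\t18\tPfam\tPF00046\tHomeodomain\t3\t10\t1e-20\tT\t01-01-2024\n"
)
domains = list(pull_domains(fasta, tsv))
print(domains)
```

Passing open file handles instead of the `io.StringIO` objects would apply the same logic to `All.fasta` and the InterProScan TSV output.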

Finally, to classify:

./

or, to classify each $hg.fasta file individually:

./ --fastafile path/to/$hg.fasta --blastfile All.blast --interprofile new-interpro.tsv

./

Quick checks for formatting as well as classifications.

Then run the trees:

<FastaDir>

Optionally, to remove duplicate domains after trimming and ahead of IQ-TREE (not necessary in most cases):

<Intermediate File Directory with alignments>
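As a rough illustration of the optional duplicate-removal step, the sketch below keeps only the first record for each distinct sequence. What the pipeline's own script treats as a "duplicate" is not specified here; this example assumes it means identical residues once gap characters are ignored. The function name and the example records are hypothetical.

```python
def drop_duplicate_sequences(records):
    """Keep the first record for each distinct gap-stripped sequence.

    `records` is an iterable of (header, sequence) pairs.
    """
    seen = set()
    kept = []
    for header, seq in records:
        # Assumption: two records are duplicates if their residues match
        # after removing alignment gaps and ignoring case.
        key = seq.replace("-", "").upper()
        if key in seen:
            continue
        seen.add(key)
        kept.append((header, seq))
    return kept


# Made-up aligned records, only to show the call.
aligned = [
    ("Hsap_geneA", "RRT-AYT"),
    ("Mmus_geneA", "RRTA-YT"),  # same residues once gaps are ignored
    ("Dmel_geneB", "KRTSFTN"),
]
unique = drop_duplicate_sequences(aligned)
print(unique)
```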

Final classification step:

./ <IntermediateFiles> <classificationTable.tsv>

Here <IntermediateFiles> is the directory containing all inferred trees in Newick format, and classificationTable.tsv is the current classification log, written as verbose output by the classification step.

Optional extra:

sh <speciesList>

This script replaces every 4-letter species code used in the pipeline with the full species name for publication. It takes a speciesList.tsv as input.
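The renaming can be pictured with the sketch below. It is not the pipeline's script: it assumes speciesList.tsv has two tab-separated columns (code, then species name) and that codes appear as whole tokens bounded by non-alphanumeric characters such as `(`, `,` or `_`. The function names and the example labels are hypothetical.

```python
import csv
import io
import re


def load_species_map(handle):
    """Read a two-column TSV of (code, species name) into a dict."""
    reader = csv.reader(handle, delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def expand_codes(text, species):
    """Replace each known code with its species name.

    A code only matches when it is not embedded in a longer
    alphanumeric token, so 'Hsap_x' matches but 'XHsap' does not.
    """
    if not species:
        return text
    codes = "|".join(re.escape(code) for code in species)
    pattern = re.compile(r"(?<![A-Za-z0-9])(" + codes + r")(?![A-Za-z0-9])")
    return pattern.sub(lambda match: species[match.group(1)], text)


# Made-up mapping and tree, only to show the call.
mapping = load_species_map(
    io.StringIO("Hsap\tHomo_sapiens\nDmel\tDrosophila_melanogaster\n")
)
renamed = expand_codes("(Hsap_geneA:0.1,Dmel_geneB:0.2);", mapping)
print(renamed)
```

A plain `\b` word boundary is avoided here because `_` counts as a word character in Python regular expressions, so a code followed by an underscore would never match.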

Further graphical analyses

./ TreeFiles > hbxCount.csv

Produces a table of occupancy for each species and homeobox family.

./ hbxCount.csv > hbxCountMelt.csv

Produces an easily parseable file for the remaining display steps, such as the R scripts and the analyses that follow.
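"Melting" here means reshaping a wide table into one row per species and family pair. The sketch below shows the idea only; the actual layout of hbxCount.csv is not documented here, so it assumes one row per species, one column per homeobox family, and a first column named `species`. The function name and the counts are made up.

```python
import csv
import io


def melt_counts(handle, id_column="species"):
    """Turn a wide count table into (species, family, count) rows."""
    rows = []
    for record in csv.DictReader(handle):
        species = record[id_column]
        for family, count in record.items():
            if family != id_column:
                rows.append((species, family, int(count)))
    return rows


# Made-up counts, only to show the reshaping.
wide = io.StringIO("species,ANTP,PRD\nHsap,3,2\nDmel,1,0\n")
long_rows = melt_counts(wide)
for row in long_rows:
    print(",".join(str(value) for value in row))
```

The long form is convenient for plotting libraries, which usually expect one observation per row.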

./ hbxCountMelt.csv > hbxOrigins.csv

Produces a list of homeobox families with the last shared ancestor of each, either within animals or predating the first split among animals.

./ hbxCountMelt.csv > hbxLossGain.csv

Produces a list of homeobox reductions and expansions for each animal clade/node.

./ <classifiedTable.tsv>

Takes the classification log table produced earlier and builds a table of gene evidence for each species and homeobox family.