[EMBOSS] Add Needleall tool (v6) and bump version for needle to v6 #6643

Open
wants to merge 34 commits into
base: main
9a8e6bf
add needleall tool
shiltemann Dec 12, 2024
7da13e4
add macros for EMBOSS version 6
shiltemann Dec 12, 2024
94de954
add scoring matrix parameter
shiltemann Dec 16, 2024
8fd446f
update needle to v6
shiltemann Dec 16, 2024
3cff456
update needle to v6
shiltemann Dec 16, 2024
5f9bd46
update version string
shiltemann Dec 16, 2024
e6ce3ec
use arguments rather than names
shiltemann Dec 16, 2024
dbdbf55
add additional parameters
shiltemann Dec 16, 2024
23f99fc
update help text
shiltemann Dec 16, 2024
8378484
fix duplicate option in select
shiltemann Dec 16, 2024
314cc33
linting fixes
shiltemann Dec 16, 2024
fe8a864
add version command
shiltemann Dec 16, 2024
4a391cc
indent
shiltemann Dec 16, 2024
c8dc429
indent
shiltemann Dec 16, 2024
47902f8
add llimits to gap penalties
shiltemann Dec 17, 2024
4acad15
add profile
shiltemann Dec 17, 2024
fb53aee
typo
shiltemann Dec 17, 2024
097c980
remove invalid select option
shiltemann Dec 17, 2024
52c7d53
change format
shiltemann Dec 17, 2024
1bef670
add more change_format cases and add conditional to tests
shiltemann Dec 17, 2024
799d021
add more output formats
shiltemann Dec 17, 2024
f9a5c90
move output format components to macros
shiltemann Dec 17, 2024
dac2590
add more parameters to macros
shiltemann Dec 17, 2024
1a331ed
move needle tools to own folder, as suite
shiltemann Jan 17, 2025
da90ae9
remove emboss5 macros
shiltemann Jan 17, 2025
3fc5d2f
update to new macros, remove spaces from name
shiltemann Jan 17, 2025
4838def
add test data
shiltemann Jan 17, 2025
f0cd788
undo changes to emboss_5 folder
shiltemann Jan 17, 2025
a824b8c
remove unused test-data
shiltemann Jan 17, 2025
e6f7261
remove code file
shiltemann Jan 17, 2025
328e82a
update shed file
shiltemann Jan 17, 2025
8681eb3
remove unused macros
shiltemann Jan 17, 2025
2fcfb40
tweak help text
shiltemann Jan 17, 2025
5d7ade8
typo
shiltemann Jan 17, 2025
25 changes: 25 additions & 0 deletions tools/emboss/.shed.yml
@@ -0,0 +1,25 @@
name: emboss
owner: iuc
description: "Galaxy wrappers for EMBOSS6 tools"
categories:
  - Sequence Analysis
  - Fasta Manipulation
homepage_url: "http://emboss.open-bio.org/"
long_description: |
  The European Molecular Biology Open Software Suite (EMBOSS) is a high quality, well documented package of open source software tools for molecular biology. It includes over 200 applications for molecular sequence analysis and other common tasks in bioinformatics.
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/master/tools/emboss
type: unrestricted

auto_tool_repositories:
  name_template: "{{ tool_id }}"
  description_template: "Wrapper for EMBOSS tool: {{ tool_name }}."
suite:
  name: "suite_emboss"
  description: "EMBOSS suite of tools for molecular biology"
  long_description: |
    The European Molecular Biology Open Software Suite (EMBOSS) is a high
    quality, well documented package of open source software tools for
    molecular biology. It includes over 200 applications for molecular
    sequence analysis and other common tasks in bioinformatics.


134 changes: 134 additions & 0 deletions tools/emboss/emboss_needle.xml
@@ -0,0 +1,134 @@
<tool id="emboss_needle" name="needle" version="@VERSION@" profile="@PROFILE@">
<description>Needleman-Wunsch global alignment</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="bio_tools" />
<expand macro="requirements" />
<version_command>needle -version</version_command>
<command detect_errors="exit_code"><![CDATA[
needle -asequence '$asequence'
-bsequence '$bsequence'
-outfile '$out_file1'
-gapopen $gapopen
-gapextend $gapextend
-brief $brief
-aformat3 $out_format1
-auto
#if $datafile
-datafile $datafile
#end if
#if $endgap.endweight == 'yes'
-endweight -endopen $endgap.endopen
-endextend $endgap.endextend
#end if
]]></command>
<inputs>
<param argument="-asequence" type="data" format="fasta" label="Sequence 1" />
<param argument="-bsequence" type="data" format="fasta" label="Sequence 2" />

<expand macro="scoring_matrix"/>
<expand macro="gap_penalties"/>
<expand macro="endgap_penalties"/>
<expand macro="param_brief"/>

<expand macro="choose_alignment_output_format"/>
</inputs>
<outputs>
<data name="out_file1" format="needle" label="${tool.name} on ${on_string}: alignment output" >
<expand macro="change_alignment_output_format"/>
</data>
</outputs>
<tests>
<test>
<param name="asequence" value="2.fasta"/>
<param name="bsequence" value="1.fasta"/>
<param name="gapopen" value="10"/>
<param name="gapextend" value="0.5"/>
<param name="brief" value="yes"/>
<param name="out_format1" value="score"/>
<output name="out_file1" file="emboss_needle_out.score" ftype="score"/>
</test>
<test><!-- test with fasta output, custom matrix, and endgap penalties -->
<param name="asequence" value="2.fasta"/>
<param name="bsequence" value="1.fasta"/>
<param name="gapopen" value="10"/>
<param name="gapextend" value="0.5"/>
<param name="datafile" value="EPAM30"/>
<conditional name="endgap">
<param name="endweight" value="yes"/>
<param name="endopen" value="13.37"/>
<param name="endextend" value="2.5"/>
</conditional>
<param name="brief" value="yes"/>
<param name="out_format1" value="fasta"/>
<output name="out_file1" file="emboss_needle_out.fasta" ftype="fasta"/>
</test>
</tests>
<help><![CDATA[

needle reads any two sequences of the same type (DNA or protein).

This tool uses the Needleman-Wunsch global alignment algorithm to find the optimum alignment (including gaps) of two sequences when considering their entire length.

- **Optimal alignment:** Dynamic programming methods ensure the optimal global alignment by exploring all possible alignments and choosing the best.

- **The Needleman-Wunsch algorithm** is a member of the class of algorithms that can calculate the best score and alignment in on the order of mn steps (where 'm' and 'n' are the lengths of the two sequences).

- **Gap open penalty:** [10.0 for any sequence] The gap open penalty is the score taken away when a gap is created. The best value depends on the choice of comparison matrix. The default value assumes you are using the EBLOSUM62 matrix for protein sequences, and the EDNAFULL matrix for nucleotide sequences. (Floating point number from 1.0 to 100.0)

- **Gap extension penalty:** [0.5 for any sequence] The gap extension penalty is added to the standard gap penalty for each base or residue in the gap. This is how long gaps are penalized. Usually you will expect a few long gaps rather than many short gaps, so the gap extension penalty should be lower than the gap open penalty. An exception is where one or both sequences are single reads with possible sequencing errors, in which case you would expect many single-base gaps. You can get this result by setting the gap open penalty to zero (or very low) and using the gap extension penalty to control gap scoring. (Floating point number from 0.0 to 10.0)

You can view the original documentation here_.

.. _here: http://galaxy-iuc.github.io/emboss-5.0-docs/needle.html

-----

**Example**

- Input File::

    >hg18_dna range=chrX:151073054-151073136 5'pad=0 3'pad=0 revComp=FALSE strand=? repeatMasking=none
    TTTATGTCTATAATCCTTACCAAAAGTTACCTTGGAATAAGAAGAAGTCA
    GTAAAAAGAAGGCTGTTGTTCCGTGAAATACTG

- If both Sequence1 and Sequence2 take the above file as input, Gap open penalty equals 10.0, Gap extension penalty equals 0.5, Brief identity and similarity is set to Yes, Output alignment file format is set to SRS pairs, the output file is::

    ########################################
    # Program: needle
    # Rundate: Mon Apr 02 2007 14:23:16
    # Align_format: srspair
    # Report_file: ./database/files/dataset_7.dat
    ########################################

    #=======================================
    #
    # Aligned_sequences: 2
    # 1: hg18_dna
    # 2: hg18_dna
    # Matrix: EDNAFULL
    # Gap_penalty: 10.0
    # Extend_penalty: 0.5
    #
    # Length: 83
    # Identity:      83/83 (100.0%)
    # Similarity:    83/83 (100.0%)
    # Gaps:           0/83 ( 0.0%)
    # Score: 415.0
    #
    #=======================================

    hg18_dna           1 TTTATGTCTATAATCCTTACCAAAAGTTACCTTGGAATAAGAAGAAGTCA     50
                         ||||||||||||||||||||||||||||||||||||||||||||||||||
    hg18_dna           1 TTTATGTCTATAATCCTTACCAAAAGTTACCTTGGAATAAGAAGAAGTCA     50

    hg18_dna          51 GTAAAAAGAAGGCTGTTGTTCCGTGAAATACTG     83
                         |||||||||||||||||||||||||||||||||
    hg18_dna          51 GTAAAAAGAAGGCTGTTGTTCCGTGAAATACTG     83

    #---------------------------------------
    #---------------------------------------
]]></help>
<expand macro="citations" />
</tool>
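The gap open and gap extension penalties described in the needle help text can be illustrated with a small scoring sketch. This is not EMBOSS code: it is a minimal Python implementation of global alignment scoring with affine gaps (the Gotoh three-matrix variant of Needleman-Wunsch). It assumes that a gap of length k costs `gap_open + (k - 1) * gap_extend`, and it uses a toy substitution function (+5 match, -4 mismatch, the EDNAFULL values for unambiguous bases) rather than a real matrix file. The function names are hypothetical.

```python
from typing import Callable

NEG_INF = float("-inf")


def simple_dna_score(a: str, b: str) -> float:
    """Toy substitution score: +5 for a match, -4 for a mismatch."""
    return 5.0 if a == b else -4.0


def global_alignment_score(
    seq_a: str,
    seq_b: str,
    gap_open: float = 10.0,
    gap_extend: float = 0.5,
    score: Callable[[str, str], float] = simple_dna_score,
) -> float:
    """Best global alignment score with affine gap penalties.

    Assumption: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(seq_a), len(seq_b)
    # M: alignment ends in a residue pair
    # X: alignment ends with a residue of seq_a opposite a gap
    # Y: alignment ends with a residue of seq_b opposite a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score(seq_a[i - 1], seq_b[j - 1])
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Open a new gap (from M or the other gap state) or extend one.
            X[i][j] = max(
                M[i - 1][j] - gap_open,
                X[i - 1][j] - gap_extend,
                Y[i - 1][j] - gap_open,
            )
            Y[i][j] = max(
                M[i][j - 1] - gap_open,
                Y[i][j - 1] - gap_extend,
                X[i][j - 1] - gap_open,
            )
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    seq = (
        "TTTATGTCTATAATCCTTACCAAAAGTTACCTTGGAATAAGAAGAAGTCA"
        "GTAAAAAGAAGGCTGTTGTTCCGTGAAATACTG"
    )
    print(global_alignment_score(seq, seq))
```

Aligning the 83-base example sequence against itself gives 83 matches at +5 each, which is consistent with the `Score: 415.0` line in the example report. The work is proportional to m x n cell updates, matching the "order of mn steps" statement in the help text.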
97 changes: 97 additions & 0 deletions tools/emboss/emboss_needleall.xml
@@ -0,0 +1,97 @@
<tool id="emboss_needleall" name="needleall" version="@[email protected]" profile="@PROFILE@">
<description>Many-to-many Needleman-Wunsch global alignment</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="bio_tools" />
<expand macro="requirements" />
<version_command>needleall -version</version_command>
<command detect_errors="exit_code"><![CDATA[
needleall
-asequence '$asequence'
-bsequence '$bsequence'
-outfile '$out_file1'
-gapopen $gapopen
-gapextend $gapextend
-brief $brief
-aformat3 $out_format1
-auto
#if $datafile
-datafile $datafile
#end if
#if $endgap.endweight == 'yes'
-endweight -endopen $endgap.endopen
-endextend $endgap.endextend
#end if
-minscore $minscore
]]></command>
<inputs>
<param argument="-asequence" type="data" format="fasta" label="Sequence set 1" />
<param argument="-bsequence" type="data" format="fasta" label="Sequence set 2" />

<expand macro="scoring_matrix"/>
<expand macro="gap_penalties"/>
<expand macro="endgap_penalties"/>
<expand macro="param_brief"/>

<param argument="-minscore" type="float" value="1.0" min="-10.0" max="100.0" label="Minimum alignment score to report an alignment." help=""/>

<expand macro="choose_alignment_output_format"/>
</inputs>
<outputs>
<data name="out_file1" format="needle" label="${tool.name} on ${on_string}: alignment output">
<expand macro="change_alignment_output_format"/>
</data>
</outputs>
<tests>
<test>
<param name="asequence" value="emboss_needleall_input1.fa"/>
<param name="bsequence" value="emboss_needleall_input2.fa"/>
<param name="gapopen" value="10"/>
<param name="gapextend" value="0.5"/>
<param name="brief" value="yes"/>
<param name="out_format1" value="score"/>
<output name="out_file1" file="emboss_needleall_out.score" ftype="score"/>
</test>
<test><!-- test fasta output -->
<param name="asequence" value="emboss_needleall_input1.fa"/>
<param name="bsequence" value="emboss_needleall_input2.fa"/>
<param name="gapopen" value="10"/>
<param name="gapextend" value="0.5"/>
<param name="brief" value="yes"/>
<param name="out_format1" value="fasta"/>
<output name="out_file1" file="emboss_needleall_out.fasta" ftype="fasta"/>
</test>
<test><!-- test with pair output, endgap penalties and custom scoring matrix -->
<param name="asequence" value="emboss_needleall_input1.fa"/>
<param name="bsequence" value="emboss_needleall_input2.fa"/>
<param name="gapopen" value="10"/>
<param name="gapextend" value="0.5"/>
<conditional name="endgap">
<param name="endweight" value="yes"/>
<param name="endopen" value="13.37"/>
<param name="endextend" value="2.5"/>
</conditional>
<param name="brief" value="yes"/>
<param name="datafile" value="EPAM30"/>
<param name="out_format1" value="pair"/>
<output name="out_file1" file="emboss_needleall_out.pair" lines_diff="10" ftype="pair"/>
</test>
</tests>
<help><![CDATA[

needleall reads two nucleotide or protein sequence inputs, each containing one or more sequences. All sequences in the first input are aligned to all sequences in the second input.

This tool uses the Needleman-Wunsch global alignment algorithm to find the optimum alignment (including gaps) of two sequences when considering their entire length.

- **Optimal alignment:** Dynamic programming methods ensure the optimal global alignment by exploring all possible alignments and choosing the best.

- **The Needleman-Wunsch algorithm** is a member of the class of algorithms that can calculate the best score and alignment in on the order of mn steps (where 'm' and 'n' are the lengths of the two sequences).

- **Gap open penalty:** [10.0 for any sequence] The gap open penalty is the score taken away when a gap is created. The best value depends on the choice of comparison matrix. The default value assumes you are using the EBLOSUM62 matrix for protein sequences, and the EDNAFULL matrix for nucleotide sequences. (Floating point number from 1.0 to 100.0)

- **Gap extension penalty:** [0.5 for any sequence] The gap extension penalty is added to the standard gap penalty for each base or residue in the gap. This is how long gaps are penalized. Usually you will expect a few long gaps rather than many short gaps, so the gap extension penalty should be lower than the gap open penalty. An exception is where one or both sequences are single reads with possible sequencing errors, in which case you would expect many single-base gaps. You can get this result by setting the gap open penalty to zero (or very low) and using the gap extension penalty to control gap scoring. (Floating point number from 0.0 to 10.0)

]]></help>
<expand macro="citations" />
</tool>
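The many-to-many behaviour and the `-minscore` cutoff described for needleall can be sketched as follows. This is an illustration only, not the EMBOSS implementation: the FASTA parser is minimal, the scorer uses a simple linear gap penalty (not the affine model with gap open and gap extension penalties), the threshold is assumed to be inclusive, and the iteration order is arbitrary. All function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Minimal FASTA parser: maps the first word of each header to its sequence."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def nw_score(a: str, b: str, match: float = 5.0,
             mismatch: float = -4.0, gap: float = 10.0) -> float:
    """Needleman-Wunsch global score with a linear gap penalty (toy scorer)."""
    prev = [-gap * j for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [-gap * i]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(diag, prev[j] - gap, cur[j - 1] - gap))
        prev = cur
    return prev[-1]


def all_against_all(set_a: Dict[str, str], set_b: Dict[str, str],
                    minscore: float = 1.0) -> List[Tuple[str, str, float]]:
    """Score every pair (one from each set) and keep those reaching minscore."""
    hits = []
    for (name_a, seq_a), (name_b, seq_b) in product(set_a.items(), set_b.items()):
        s = nw_score(seq_a, seq_b)
        if s >= minscore:  # assumption: the cutoff is inclusive
            hits.append((name_a, name_b, s))
    return hits


if __name__ == "__main__":
    first = parse_fasta(">a1\nACGT\n>a2\nTTTT\n")
    second = parse_fasta(">b1\nACGT\n")
    for name_a, name_b, s in all_against_all(first, second, minscore=1.0):
        print(name_a, name_b, s)
```

With two sequences in the first set and one in the second, two pairs are scored; raising or lowering `minscore` changes how many of them are reported, which is the role of the `-minscore` parameter in the wrapper.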