cidm-ph/STECode

STECode

A pipeline for deployment of a Shiga Toxin-Producing Escherichia coli (STEC) Virulence Barcode.

The STEC virulence barcode is made up of 12 digits in the format “XX-XX-XX-XX-XX-XX”, reflecting an isolate's STEC virulence makeup. Each set of two digits represents a particular virulence factor.
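
As a minimal sketch of the layout described above, the following Python checks that a string consists of six hyphen-separated two-digit groups. The function name `is_valid_barcode` is hypothetical and not part of STECode, and the check covers only the shape of the string, not which digit values are meaningful for each group.

```python
import re

# Six groups of two ASCII digits separated by hyphens (XX-XX-XX-XX-XX-XX).
BARCODE_PATTERN = re.compile(r"[0-9]{2}(?:-[0-9]{2}){5}")


def is_valid_barcode(barcode: str) -> bool:
    """Return True if the whole string matches the XX-XX-XX-XX-XX-XX layout.

    Hypothetical helper for illustration; it does not check digit meanings.
    """
    return BARCODE_PATTERN.fullmatch(barcode) is not None
```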

Installation

PIP (Recommended)

pip install git+https://github.com/cidm-ph/STECode

GITHUB

git clone https://github.com/cidm-ph/STECode.git

Then initialise the abricate database with

abricate --setupdb --datadir path/to/stecode/database

Usage

Example usage

stecode --R1 $READ_DIR/$R1.fq.gz --R2 $READ_DIR/$R2.fq.gz --name $FILENAME --outdir $OUTDIR

FLAGS

--outdir, -o [PATH]             optional folder to write output files to
--threads, -t [INT]             specify number of threads used (default = 4)
--R1 [PATH]                     R1 fastq of sample (can be gzipped files)
--R2 [PATH]                     R2 fastq of sample (can be gzipped files)
--fasta, -f [PATH]              optional fasta file which will skip SKESA, can be used in conjunction with --longread
--longread, -l                  turns on long read mode
--name, -n [STR]                name used for the output files [REQUIRED]
--version, -v                   print version

SKESA genome assembly is the longest step of this pipeline, so if you already have a genome assembly you can bypass SKESA by supplying a FASTA file. The pipeline can also be run with a FASTA file as the only input; however, the second 'XX' will then not show isogenic stx genes.
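
For scripting runs over many samples, the flags above can be assembled programmatically. The sketch below is an assumption-laden helper, not part of STECode: the function name `build_stecode_command` is hypothetical, and the rule that at least one of `--R1` or `--fasta` must be given is inferred from the text above rather than confirmed against the tool. It only builds the argument list and does not run anything.

```python
from typing import List, Optional


def build_stecode_command(
    name: str,
    r1: Optional[str] = None,
    r2: Optional[str] = None,
    fasta: Optional[str] = None,
    longread: bool = False,
    outdir: Optional[str] = None,
    threads: Optional[int] = None,
) -> List[str]:
    """Build an argument list for the stecode CLI (hypothetical helper)."""
    # Assumption: the pipeline needs reads, an assembly, or both.
    if r1 is None and fasta is None:
        raise ValueError("provide at least r1 (reads) or fasta (assembly)")

    cmd = ["stecode", "--name", name]  # --name is marked as required
    if r1 is not None:
        cmd += ["--R1", r1]
    if r2 is not None:
        cmd += ["--R2", r2]
    if fasta is not None:
        cmd += ["--fasta", fasta]  # skips the SKESA assembly step
    if longread:
        cmd.append("--longread")
    if outdir is not None:
        cmd += ["--outdir", outdir]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


# File names here are placeholders for illustration only.
command = build_stecode_command(
    "sample1", r1="sample1_R1.fq.gz", r2="sample1_R2.fq.gz", outdir="results"
)
print(" ".join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` once stecode is installed and on the search path.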

Output

The outputs of the mapping and abricate steps are combined into a virulence barcode.

The first set of two digits represents the presence (to the subtype level) or absence of the eae gene. The next set of two digits represents the inference of possible multiple, isogenic stx genes not assembled via short read sequencing. Each of the last four sets of two digits reflects the presence (to the subtype level) or absence of stx. This representation allows up to four different stx operons to be captured, which is currently the maximum number observed both in vitro and in isolates.
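
The field layout described above can be sketched as a small Python structure. The names `VirulenceBarcode` and `split_barcode` are hypothetical and not part of STECode, and the digits in the usage line are made up; the mapping from digit values to subtypes is not covered here.

```python
from typing import NamedTuple, Tuple


class VirulenceBarcode(NamedTuple):
    eae: str                         # group 1: eae presence (subtype) or absence
    isogenic_stx: str                # group 2: inferred multiple isogenic stx genes
    stx: Tuple[str, str, str, str]   # groups 3-6: up to four stx operons


def split_barcode(barcode: str) -> VirulenceBarcode:
    """Split an XX-XX-XX-XX-XX-XX barcode into its named fields."""
    groups = barcode.split("-")
    well_formed = len(groups) == 6 and all(
        len(g) == 2 and g.isascii() and g.isdigit() for g in groups
    )
    if not well_formed:
        raise ValueError(f"not an XX-XX-XX-XX-XX-XX barcode: {barcode!r}")
    return VirulenceBarcode(
        eae=groups[0],
        isogenic_stx=groups[1],
        stx=(groups[2], groups[3], groups[4], groups[5]),
    )


# Made-up digits, for illustration only.
parsed = split_barcode("01-00-02-00-00-00")
print(parsed.eae, parsed.isogenic_stx, parsed.stx)
```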

The resulting barcode is printed to the console, written to the log file, and saved to its own file (--name_virbarcode_YYYYMMDD.tab, where --name stands for the value passed to --name).

If a discrepancy between the mapping and abricate results is found, the program will stop and tell you to look at the raw output files. Raw output files that are most useful include:
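
To pick up the barcode file in a downstream script, the file name can be rebuilt from the pattern above. This is a sketch under two assumptions that the text does not confirm: that the file is written inside the `--outdir` folder, and that YYYYMMDD is the date of the run. The function name `expected_barcode_file` is hypothetical.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def expected_barcode_file(
    outdir: str, name: str, run_date: Optional[date] = None
) -> Path:
    """Build the expected <name>_virbarcode_YYYYMMDD.tab path (hypothetical helper).

    Assumes the file sits in outdir and is stamped with the run date.
    """
    if run_date is None:
        run_date = date.today()
    return Path(outdir) / f"{name}_virbarcode_{run_date:%Y%m%d}.tab"
```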

  • sfindAbricate
  • eaesubtype
  • targetstx

Dependencies

Associated Citations

Sim, E. M., Kim, R., Gall, M., Arnott, A., Howard, P., Valcanis, M., ... Sintchenko, V. (2021). Added value of genomic surveillance of virulence factors in Shiga toxin-producing Escherichia coli in New South Wales, Australia. Frontiers in Microbiology, 12. doi:10.3389/fmicb.2021.713724

Licence

Copyright (C) 2022 Western Sydney Local Health District, NSW Health

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.