Overview

A collection of scripts for SARS-CoV-2 genomics analysis automation

Our automation solution is a collection of small, independent scripts. The scripts use the bioblend library to interact with the Galaxy API, and they use the workflow execution functionality of the planemo command-line utilities to run analysis workflows.
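For orientation, here is a planemo workflow run against an existing Galaxy server. It is usually started from the command line, and the sketch below only assembles such a command as an argument list without running it. The workflow file, job file, server URL and history name are hypothetical placeholders. The options (`--engine external_galaxy`, `--galaxy_url`, `--galaxy_user_key`, `--history_name`) come from planemo's `run` command as we understand it, so please check `planemo run --help` for your installed version before relying on them.

```python
from typing import List


def build_planemo_run_command(
    workflow_path: str,
    job_path: str,
    galaxy_url: str,
    api_key: str,
    history_name: str,
) -> List[str]:
    """Assemble (but do not execute) a `planemo run` command line.

    The external_galaxy engine tells planemo to use an existing Galaxy
    server instead of spinning up a local one. All values are placeholders.
    """
    return [
        "planemo", "run", workflow_path, job_path,
        "--engine", "external_galaxy",
        "--galaxy_url", galaxy_url,
        "--galaxy_user_key", api_key,
        "--history_name", history_name,
    ]


if __name__ == "__main__":
    cmd = build_planemo_run_command(
        "variation.ga",          # hypothetical workflow file
        "variation-job.yml",     # hypothetical job file
        "https://galaxy.example.org",
        "YOUR_API_KEY",
        "batch 1 variation",
    )
    print(" ".join(cmd))
    # To actually launch it you could use: subprocess.run(cmd, check=True)
```

Building the argument list separately from executing it keeps the sketch testable without a Galaxy server. It also avoids shell quoting problems when the list is later passed to `subprocess.run`.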

Tag-based orchestration of scripts

All scripts in the collection are controlled and coordinated through a system of Galaxy dataset and history tags. These tags signal which input data is available and how far the overall analysis has progressed.
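To make the idea concrete, here is a minimal, self-contained sketch of how a script might use tags to decide what to work on next. Histories and datasets are plain dictionaries with a `tags` list, loosely modelled on Galaxy API records. The tag names `process-me` and `processed` are hypothetical. The real scripts use their own tag vocabulary and talk to Galaxy through bioblend rather than in-memory dictionaries.

```python
from typing import Dict, List, Optional

# Hypothetical tag names used only for this illustration.
HISTORY_TAG = "process-me"   # marks a history whose datasets should be analysed
DONE_TAG = "processed"       # marks a dataset that has already been picked up


def tagged_histories(histories: List[Dict], tag: str = HISTORY_TAG) -> List[Dict]:
    """Return only the histories that carry the given history tag."""
    return [h for h in histories if tag in h.get("tags", [])]


def claim_next_dataset(history: Dict, done_tag: str = DONE_TAG) -> Optional[Dict]:
    """Pick the first unprocessed dataset in a history and tag it as done.

    The added tag is what stops later runs from picking the same dataset
    up again, so each call claims at most one batch.
    """
    for dataset in history.get("datasets", []):
        if done_tag not in dataset.get("tags", []):
            dataset.setdefault("tags", []).append(done_tag)
            return dataset
    return None


if __name__ == "__main__":
    histories = [
        {"name": "run A", "tags": ["process-me"],
         "datasets": [{"name": "links_1.txt", "tags": ["processed"]},
                      {"name": "links_2.txt", "tags": []}]},
        {"name": "scratch", "tags": [],
         "datasets": [{"name": "x.txt", "tags": []}]},
    ]
    for h in tagged_histories(histories):
        batch = claim_next_dataset(h)
        print(h["name"], "->", batch["name"] if batch else None)
```

Because all coordinating state lives in the tags, such a script can be stopped and restarted at any time. On its next run it derives its work purely from the tags it finds.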

When run together, the scripts support a fully automated SARS-CoV-2 analysis pipeline for ARTIC paired-end (PE) sequencing data that includes:

  • raw sequencing data upload into Galaxy and organization of the data into dataset collections
  • variant calling using our highly sensitive published workflow for variation analysis on ARTIC PE data
  • generation of reports of all identified variants
  • reliable consensus sequence building including soft-masking of questionable sites
  • export of key analysis results (BAM files of aligned reads, VCF files of called variants and FASTA consensus sequences) to a user-specific FTP folder, so they can be downloaded easily with standard FTP clients
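One way to picture how tags coordinate these steps is as a chain of stages. Finishing one stage tags the batch as ready for the next. The sketch below uses hypothetical stage names that mirror the list above, plus an assumed `<stage>-done` tag convention. For brevity it treats the stages as a simple linear sequence. It illustrates the idea only and is not the scripts' actual tag scheme.

```python
from typing import List, Optional

# Hypothetical stage names mirroring the pipeline steps listed above.
STAGES = ["upload", "variation", "report", "consensus", "export"]


def done_tag(stage: str) -> str:
    """Tag a batch receives once the given stage has finished (assumed convention)."""
    return f"{stage}-done"


def next_stage(batch_tags: List[str]) -> Optional[str]:
    """Return the first stage whose completion tag is missing, or None if all are done."""
    for stage in STAGES:
        if done_tag(stage) not in batch_tags:
            return stage
    return None


if __name__ == "__main__":
    tags: List[str] = []
    while (stage := next_stage(tags)) is not None:
        print("running", stage)
        # A real script would add this tag to the Galaxy history or dataset.
        tags.append(done_tag(stage))
    print("all stages done:", tags)
```

Deriving the next step from tags alone means that different scripts can each own one stage. No script needs to know about the others.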

The full pipeline with all script actions looks like this:

  1. You upload simple text files with download links for your sequencing data into a Galaxy history on a Galaxy server of your choice (yes, all scripts work on any Galaxy server you have a user account on).

    All links in one dataset are treated as a single batch and analyzed together. You can add as many datasets as you like to one or more tagged histories; repeated runs of the scripts will then process the batches one at a time.

  2. You add a history tag recognized by the variation script. This tag marks the history as one whose datasets contain download links that should be processed.

  3. You set up alternating runs of the scripts and watch the automated, batch-wise analysis of your data live in the Galaxy UI!
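The batches described in step 1 are plain text files of download links. The sketch below groups the links of one batch into read pairs. It assumes one URL per line and the widespread `_1`/`_2` file-name suffixes for forward and reverse reads; both are assumptions, not documented requirements of the scripts. The function name `pair_links` and the example URLs are hypothetical.

```python
import re
from typing import Dict, Tuple

# Assumed naming convention: <sample>_1.fastq.gz and <sample>_2.fastq.gz
PAIR_PATTERN = re.compile(r"/([^/]+)_([12])\.f(?:ast)?q(?:\.gz)?$")


def pair_links(text: str) -> Dict[str, Tuple[str, str]]:
    """Group the download links of one batch into (forward, reverse) pairs per sample."""
    mates: Dict[str, Dict[str, str]] = {}
    for line in text.splitlines():
        url = line.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and comments
        match = PAIR_PATTERN.search(url)
        if match is None:
            raise ValueError(f"cannot tell sample/mate from link: {url}")
        sample, mate = match.groups()
        mates.setdefault(sample, {})[mate] = url
    incomplete = [s for s, m in mates.items() if set(m) != {"1", "2"}]
    if incomplete:
        raise ValueError(f"missing mate for samples: {incomplete}")
    return {s: (m["1"], m["2"]) for s, m in mates.items()}


if __name__ == "__main__":
    links = """\
ftp://example.org/data/sampleA_1.fastq.gz
ftp://example.org/data/sampleA_2.fastq.gz
ftp://example.org/data/sampleB_1.fastq.gz
ftp://example.org/data/sampleB_2.fastq.gz
"""
    for sample, (fwd, rev) in pair_links(links).items():
        print(sample, fwd, rev)
```

Failing loudly on an unpaired link is deliberate in this sketch. A paired-end workflow cannot run a sample that lacks one of its mates, so it is better to reject the batch before any upload starts.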

Learn more about:

Limitations

The current collection of scripts supports only the analysis of ARTIC-amplified paired-end sequencing data, as this is by far the most commonly used protocol in large-scale sequencing efforts.

We hope to be able to offer support for other sequencing protocols in future releases. See also the next paragraph :)

Contributions welcome

The current scripts support our COG-UK tracking efforts on usegalaxy.* instances quite well, but we hope to expand the collection based on feedback and contributions from independent users, i.e. you!

Bug reports, ideas, patches, additional scripts - whatever you can provide is very welcome!