Project 4: Benchmarks for Bioinformatics Workflow Bake Offs

Abstract

Benchmarks, standardized tests that compare performance, accuracy, and efficiency, are key to evaluating individual tools and composite workflows. In a “Bake Off” setting, they make it possible to compare candidate tools and workflows for a particular computational task and determine which performs best. In this BioHackathon project, we will conduct a “Great Bake Off of Bioinformatics Workflows” to assess the effectiveness of existing workflow-level benchmarks and develop further ideas. We will invite BioHackathon participants to share tools and workflows with us, and collect their feedback and further ideas for benchmarks. Initially, the benchmarks will be tested in the proteomics domain, because its domain annotations are mature and the project leads have expertise there. The participants' areas of expertise will guide the exploration of additional domains.

In recent and ongoing work in ELIXIR Implementation Studies and spin-off projects, we have already developed several rudimentary workflow-level benchmarks for bioinformatics data analysis pipelines, including those automatically composed by the APE (Automated Pipeline Explorer; https://github.com/sanctuuary/APE) framework. Before these benchmarks are deployed for production use, however, their definitions must be formalized and aligned with benchmarks at the tool level. This process should prioritize the benchmarks that are most relevant to users when selecting, comparing, and deploying workflows for daily use. The BioHackathon project will consolidate these efforts by bringing together people with complementary expertise and bridging ongoing ELIXIR efforts to (1) produce a minimum fit-for-purpose set of workflow-specific benchmarks, (2) test and evaluate these in real-world examples, and (3) create a repository for continuing the development of these benchmarks beyond the BioHackathon.

More information

The Project builds on a simple set of workflow benchmarks. Users and developers of bioinformatics software, computer scientists, and software engineers can all contribute; no expertise in any particular bioinformatics domain is required. Five to six participants are needed for the Project to succeed.

In the short term, the Project will deliver a draft set of defined workflow-level benchmarks, each with examples and defined relationships to existing tool-level benchmarks and standards. We will systematically discuss the different types of workflow-level benchmarks, covering both design-time benchmarks (algorithmic complexity, workflow deployability) and run-time benchmarks (performance metrics). The leads will ensure that all important types of benchmarks are discussed. The draft benchmark definitions and examples will be revisited at the end of the BioHackathon.
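To make the run-time category more concrete, here is a minimal Python sketch of what recording one run-time measurement could look like. It is an illustration only, not part of the Project's benchmark definitions: the names `RunTimeBenchmark`, `measure_step`, and `toy_step` are hypothetical, and a real workflow step would invoke an actual tool rather than a Python function. Note that `tracemalloc` only tracks memory allocated by Python itself, so the peak value is a rough proxy.

```python
import time
import tracemalloc
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RunTimeBenchmark:
    """Hypothetical record for one run-time measurement of a workflow step."""
    step_name: str
    wall_seconds: float  # elapsed wall-clock time
    peak_bytes: int      # peak Python-level memory while the step ran
    result: Any          # whatever the step returned


def measure_step(step_name: str, step: Callable[[], Any]) -> RunTimeBenchmark:
    """Run `step` once and record its wall-clock time and peak memory."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = step()
        wall_seconds = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if the step raised an exception.
        tracemalloc.stop()
    return RunTimeBenchmark(step_name, wall_seconds, peak_bytes, result)


def toy_step() -> int:
    """Stand-in for a real workflow step (assumption for illustration)."""
    return sum(len(str(i)) for i in range(10_000))


record = measure_step("toy_step", toy_step)
print(f"{record.step_name}: {record.wall_seconds:.4f} s, "
      f"peak {record.peak_bytes} bytes")
```

Design-time benchmarks such as deployability would need a different kind of record, since they describe the workflow before it runs rather than a measured execution.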

In the long term, these benchmarks will be implemented in the Workflomics project, run by the Project Leads, and in ongoing Proteomics Implementation Studies. The workflow-level benchmarks will be carefully documented and shared with the community through publications and talks in relevant fora.

Lead(s)

Vedran Kasalica, Magnus Palmblad, Anna-Lena Lamprecht
