Workflomics/biohackathon_2023

Project 4: Benchmarks for Bioinformatics Workflow Bake Offs

Abstract

Benchmarks, standardized tests that compare performance, accuracy, and efficiency, are key to evaluating individual tools and composite workflows. In a “Bake Off” setting, they make it possible to compare candidate tools and workflows for a particular computational task and determine the best-performing one. In this BioHackathon project, we will conduct a “Great Bake Off of Bioinformatics Workflows” to assess the effectiveness of existing workflow-level benchmarks and develop further ideas. We will invite BioHackathon participants to share tools and workflows with us, and we will collect their feedback and further ideas for benchmarks. Initially, the benchmarks will be tested in the proteomics domain because of its mature domain annotations and the expertise of the project leads. The participants' areas of expertise will guide the exploration of additional domains.

In recent and ongoing work in ELIXIR Implementation Studies and spin-off projects, we have already developed several rudimentary workflow-level benchmarks for bioinformatics data analysis pipelines, including those automatically composed by the APE (Automatic Pipeline Explorer; https://github.com/sanctuuary/APE) framework. Before these benchmarks can be deployed for production use, however, their definitions must be formalized and aligned with benchmarks at the tool level. This process should prioritize the benchmarks that are most relevant to users when selecting, comparing, and deploying workflows for daily use. The BioHackathon project will consolidate these efforts by bringing together people with complementary expertise and bridging ongoing ELIXIR efforts to (1) produce a minimum fit-for-purpose set of workflow-specific benchmarks, (2) test and evaluate these in real-world examples, and (3) create a repository for continuing the development of these benchmarks beyond the BioHackathon.

More information

The Project builds on a simple set of workflow benchmarks. Users and developers of bioinformatics software, computer scientists, and software engineers can all contribute; expertise in a particular bioinformatics domain is not required. Five to six participants are needed for the Project to succeed.

In the short term, the Project will deliver a draft of defined workflow-level benchmarks, each with examples and defined relationships to existing tool-level benchmarks and standards. We will systematically discuss the different types of workflow-level benchmarks, including both design-time benchmarks (algorithmic complexity, workflow deployability) and run-time benchmarks (performance metrics). The leads will ensure that all important types of benchmarks are discussed. Draft benchmark definitions and examples will be revisited at the end of the BioHackathon.
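
To make the design-time versus run-time distinction concrete, the following is a minimal sketch of how a draft benchmark definition could be recorded. It is an illustration only, not the Project's actual schema: the class `WorkflowBenchmark`, the enum `Phase`, the field names, and the helper `group_by_phase` are all hypothetical, and the three entries simply restate the examples named in the paragraph above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    """When a workflow-level benchmark is assessed (hypothetical naming)."""
    DESIGN_TIME = "design-time"  # assessed from the workflow definition alone
    RUN_TIME = "run-time"        # measured while the workflow executes


@dataclass(frozen=True)
class WorkflowBenchmark:
    """One draft benchmark definition (hypothetical structure)."""
    name: str
    phase: Phase
    # Identifiers of related tool-level benchmarks; left empty here because
    # the source does not name any specific relationships.
    related_tool_benchmarks: tuple[str, ...] = ()


# Entries mirror the examples given in the text; nothing else is assumed.
BENCHMARKS = [
    WorkflowBenchmark("algorithmic complexity", Phase.DESIGN_TIME),
    WorkflowBenchmark("workflow deployability", Phase.DESIGN_TIME),
    WorkflowBenchmark("performance metrics", Phase.RUN_TIME),
]


def group_by_phase(benchmarks: list[WorkflowBenchmark]) -> dict[Phase, list[str]]:
    """Return benchmark names grouped by phase, preserving input order."""
    grouped: dict[Phase, list[str]] = {phase: [] for phase in Phase}
    for benchmark in benchmarks:
        grouped[benchmark.phase].append(benchmark.name)
    return grouped


if __name__ == "__main__":
    for phase, names in group_by_phase(BENCHMARKS).items():
        print(f"{phase.value}: {', '.join(names)}")
```

Keeping the phase as an explicit field, rather than implying it by naming convention, would let a draft list be checked for coverage of both categories before the definitions are revisited.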

In the long term, these benchmarks will be implemented in the Workflomics project, which is run by the Project Leads, and in ongoing Proteomics Implementation Studies. These workflow-level benchmarks will be carefully documented and shared with the community in publications and talks in relevant fora.

Lead(s)

Vedran Kasalica, Magnus Palmblad, Anna-Lena Lamprecht

About

Project #4 in BioHackathon 2023 (Barcelona)