Run a WDL workflow

Overview

This example demonstrates running a multi-stage workflow on Google Cloud Platform.

When submitted using the Pipelines API, the workflow runs on multiple Google Compute Engine virtual machines: first a master node is created to run Cromwell, and then Cromwell submits each stage of the workflow as one or more separate pipelines.

Execution of a running pipeline proceeds as follows:

  1. Create a Compute Engine virtual machine

  2. On the VM, in a Docker container, execute wdl_runner.py (see the sketch after this list)

    a. Run Cromwell (server)

    b. Submit workflow, inputs, and options to Cromwell server

    c. Poll for completion as Cromwell executes:

     1) Call pipelines.run() to execute call 1
     2) Poll for completion of call 1
     3) Call pipelines.run() to execute call 2
     4) Poll for completion of call 2
     <etc. until all WDL "calls" complete>
    

    d. Copy workflow metadata to output path

    e. Copy workflow outputs to output path

  3. Destroy the Compute Engine virtual machine
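
For reference, the Cromwell interaction in step 2 boils down to a submit-then-poll loop over Cromwell's REST API. The following is a minimal sketch with curl, assuming a 2016-era Cromwell server listening on its default port 8000 (field names such as wdlSource changed in later Cromwell releases):

# Submit the workflow, inputs, and options to the local Cromwell server.
# The JSON response includes the new workflow's ID.
curl -s \
  -F wdlSource=@workflows/vcf_chr_count/vcf_chr_count.wdl \
  -F workflowInputs=@workflows/vcf_chr_count/vcf_chr_count.sample.inputs.json \
  -F workflowOptions=@workflows/common/basic.jes.us.options.json \
  http://localhost:8000/api/workflows/v1

# Poll for completion; status moves from Submitted/Running
# to Succeeded or Failed.
curl -s http://localhost:8000/api/workflows/v1/WORKFLOW-ID/status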

Setup Overview

Code packaging for the Pipelines API is done through Docker images. The instructions provided here explain how to create your own Docker image, although a copy of this Docker image has already been built and made available by the Broad Institute.

Code summary

The wdl_runner Docker image packages the Cromwell engine together with the Python orchestration code (wdl_runner.py and its supporting modules from the cromwell_launcher directory).

Take a look at the Dockerfile for full details.

(0) Prerequisites

  1. Clone or fork this repository.

  2. Enable the Genomics, Cloud Storage, and Compute Engine APIs on a new or existing Google Cloud project using the Cloud Console.

  3. Follow the Google Genomics getting started instructions to install and authorize the Google Cloud SDK.

  4. Follow the Cloud Storage instructions for Creating Storage Buckets to create a bucket for workflow output and logging (a sketch of these commands follows this list).

  5. If you plan to create your own Docker images, then install Docker.
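
For steps 3 and 4, the following is a minimal sketch of the commands involved (YOUR-PROJECT-ID and YOUR-BUCKET are placeholders for your own values):

# Authorize the Cloud SDK and select your project.
gcloud auth login
gcloud config set project YOUR-PROJECT-ID

# Create a bucket for workflow output and logging.
gsutil mb gs://YOUR-BUCKET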

(1) Create and stage the wdl_runner Docker image

If you are going to use the published version of the Docker image, then skip this step.

Every Google Cloud project provides a private repository for saving and serving Docker images called the Google Container Registry.

The following instructions allow you to stage a Docker image in your project's Container Registry with all necessary code for orchestrating your workflow.

(1a) Create the Docker image.

git clone https://github.com/googlegenomics/pipelines-api-examples.git
cd pipelines-api-examples/wdl_runner/
docker build -t ${USER}/wdl_runner ./cromwell_launcher
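
Before pushing, you can optionally confirm the image built cleanly. For example:

# List the freshly built image.
docker images ${USER}/wdl_runner

# Open a shell in the container to inspect its contents
# (assumes the image includes bash and does not set an entrypoint).
docker run --rm -it ${USER}/wdl_runner /bin/bash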

(1b) Push the Docker image to a repository.

In this example, we push the image to the Google Container Registry via the following commands:

docker tag ${USER}/wdl_runner gcr.io/YOUR-PROJECT-ID/wdl_runner
gcloud docker -- push gcr.io/YOUR-PROJECT-ID/wdl_runner
  • Replace YOUR-PROJECT-ID with your project ID.
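
To verify that the push succeeded, you can pull the image back from the registry (optional):

gcloud docker -- pull gcr.io/YOUR-PROJECT-ID/wdl_runner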

(2) Run the sample workflow in the cloud

The file ./workflows/wdl_pipeline.yaml defines a pipeline for running WDL workflows. By default, it uses the Docker image built by the Broad Institute from this repository:

docker:
  imageName: gcr.io/broad-dsde-outreach/wdl_runner:<datestamp>

If you have built your own Docker image, then change the imageName:

docker:
  imageName: gcr.io/YOUR-PROJECT-ID/wdl_runner
  • Replace YOUR-PROJECT-ID with your project ID.
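
One way to make that substitution without opening an editor is a sed one-liner; this sketch assumes imageName appears exactly once in the file, so check the result before running:

# GNU sed; on macOS use: sed -i ''
sed -i 's|imageName: .*|imageName: gcr.io/YOUR-PROJECT-ID/wdl_runner|' \
  workflows/wdl_pipeline.yaml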

Run the following command:

gcloud \
  alpha genomics pipelines run \
  --pipeline-file workflows/wdl_pipeline.yaml \
  --zones us-central1-f \
  --logging gs://YOUR-BUCKET/pipelines-api-examples/wdl_runner/logging \
  --inputs-from-file WDL=workflows/vcf_chr_count/vcf_chr_count.wdl \
  --inputs-from-file WORKFLOW_INPUTS=workflows/vcf_chr_count/vcf_chr_count.sample.inputs.json \
  --inputs-from-file WORKFLOW_OPTIONS=workflows/common/basic.jes.us.options.json \
  --inputs WORKSPACE=gs://YOUR-BUCKET/pipelines-api-examples/wdl_runner/workspace \
  --inputs OUTPUTS=gs://YOUR-BUCKET/pipelines-api-examples/wdl_runner/output
  • Replace YOUR-BUCKET with a bucket in your project.

The output will be an operation ID for the pipeline.
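
You can inspect the operation directly at any time with the operations API. For example:

# Show the current status of the pipeline operation,
# including its done flag and any events so far.
gcloud alpha genomics operations describe YOUR-NEW-OPERATION-ID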

(3) Monitor the pipeline operation

This GitHub repo includes a shell script, ./tools/monitor_wdl_pipeline.sh, for monitoring the status of a pipeline launched using wdl_pipeline.yaml.

$ ./tools/monitor_wdl_pipeline.sh YOUR-NEW-OPERATION-ID
Logging: gs://YOUR-BUCKET/pipelines-api-examples/wdl_runner/logging
Workspace: gs://YOUR-BUCKET/pipelines-api-examples/wdl_runner/workspace
Outputs: gs://YOUR-BUCKET/pipelines-api-examples/wdl_runner/output

2016-09-01 09:37:44: operation not complete
No operations logs found.
There are 0 output files
Sleeping 60 seconds

...

2016-09-01 09:40:53: operation not complete
Calls started but not complete:
  call-vcf_split
Sleeping 60 seconds

...

2016-09-01 09:44:02: operation not complete
Operation logs found:
  YOUR-NEW-OPERATION-ID.log
  YOUR-NEW-OPERATION-ID-stderr.log
  YOUR-NEW-OPERATION-ID-stdout.log
Calls (including shards) completed: 1
Calls started but not complete:
  call-vcf_record_count/shard-0
  call-vcf_record_count/shard-1
  call-vcf_record_count/shard-2
Sleeping 60 seconds

...

2016-09-01 09:54:31: operation not complete
Calls (including shards) completed: 4
No calls currently in progress.
  (Transitioning to next stage or copying final output).
Sleeping 60 seconds

2016-09-01 09:55:34: operation not complete
Calls (including shards) completed: 4
Calls started but not complete:
  call-gather
Sleeping 60 seconds

2016-09-01 09:56:37: operation not complete
Calls (including shards) completed: 5
No calls currently in progress.
  (Transitioning to next stage or copying final output).
There are 1 output files
Sleeping 60 seconds

2016-09-01 09:57:40: operation complete
Completed operation status information
  done: true
  metadata:
    events:
    - description: start
      startTime: '2016-09-01T16:38:18.215458712Z'
    - description: pulling-image
      startTime: '2016-09-01T16:38:18.215809129Z'
    - description: localizing-files
      startTime: '2016-09-01T16:38:42.613937060Z'
    - description: running-docker
      startTime: '2016-09-01T16:38:42.613978300Z'
    - description: delocalizing-files
      startTime: '2016-09-01T16:56:42.144127783Z'
    - description: ok
      startTime: '2016-09-01T16:56:43.725128719Z'
  name: operations/YOUR-NEW-OPERATION-ID
There are 2 output files
  gs://YOUR-BUCKET/pipelines-api-examples/wdl_runner/output/output.txt
  gs://YOUR-BUCKET/pipelines-api-examples/wdl_runner/output/wdl_run_metadata.json

Preemptions:
  None
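
Under the hood, the monitor script is doing little more than polling the operation's done flag and summarizing logs and outputs. A minimal sketch of the core loop, assuming gcloud's value format renders the boolean as True:

# Poll the operation once a minute until it reports done.
while [[ "$(gcloud alpha genomics operations describe YOUR-NEW-OPERATION-ID \
    --format='value(done)')" != "True" ]]; do
  echo "$(date): operation not complete"
  sleep 60
done
echo "operation complete"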

(4) Check the results

Check the operation output for a top-level errors field. If there is none, then the operation should have finished successfully.
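
If you prefer to check from the command line, one hedged option is to project just the error field (gcloud prints an empty result when the field is unset):

gcloud alpha genomics operations describe YOUR-NEW-OPERATION-ID \
  --format='yaml(error)'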

(5) Check that the output exists

$ gsutil ls -l gs://YOUR-BUCKET/pipelines-api-examples/wdl_runner/output
        46  2016-09-01T16:56:40Z  gs://YOUR-BUCKET/pipelines-api-examples/wdl_runner/output/output.txt
     15069  2016-09-01T16:56:37Z  gs://YOUR-BUCKET/pipelines-api-examples/wdl_runner/output/wdl_run_metadata.json
TOTAL: 2 objects, 15115 bytes (14.76 KiB)
  • Replace YOUR-BUCKET with a bucket in your project.

(6) Check the output

$ gsutil cat gs://YOUR-BUCKET/pipelines-api-examples/wdl_runner/output/output.txt
chrM.vcf 197
chrX.vcf 4598814
chrY.vcf 653100
  • Replace YOUR-BUCKET with a bucket in your project.

(7) Clean up the intermediate workspace files

When Cromwell runs, per-stage output and other intermediate files are written to the WORKSPACE path you specified in the gcloud command above.

To remove these files, run:

gsutil -m rm gs://YOUR-BUCKET/pipelines-api-examples/wdl_runner/workspace/**
  • Replace YOUR-BUCKET with a bucket in your project.
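
If you would like to preview what will be deleted before running the rm, listing the workspace first is a reasonable check:

# Show the total size of the intermediate workspace files.
gsutil du -sh gs://YOUR-BUCKET/pipelines-api-examples/wdl_runner/workspace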