License: Apache 2.0

Task Execution Service (TES) API

This repository is the home for the schema of the Task Execution Service (TES) API defined by the Cloud Work Stream of the Global Alliance for Genomics and Health (GA4GH). The goal of the API standard is to provide a uniform way to execute batch computing tasks.

GA4GH is an international coalition, formed to enable the sharing and processing of genomic data.

The Cloud Work Stream helps the genomics and health communities take full advantage of modern cloud environments. Our initial focus is on “bringing the algorithms to the data”, by creating standards for defining, sharing, and executing portable workflows.

We work with platform development partners and industry leaders to develop standards that will facilitate interoperability.

What is TES?

The Task Execution Service (TES) API is an effort to define a standardized schema and API for describing batch execution tasks. A task defines a set of commands to run, a set of (Docker) containers to run them in, sets of input and output files, required resources, as well as some other metadata, e.g., for capturing provenance information.

API Definition

See the human-readable reference documentation.

The documentation hosted at https://ga4gh.github.io/task-execution-schemas reflects the latest official API release from the main branch. To explore the documentation from a development branch, append preview/<branch-name>/docs/ to the base URL. For example, to view the documentation for the latest development version of the specification, visit https://ga4gh.github.io/task-execution-schemas/preview/develop/docs/.
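The URL rule above can be sketched as a small helper. The function name `docs_url` is hypothetical; only the base URL and the `preview/<branch-name>/docs/` suffix come from this README.

```python
BASE_URL = "https://ga4gh.github.io/task-execution-schemas"


def docs_url(branch=None):
    """Return the documentation URL for the official release or a branch preview.

    With branch=None the base URL (latest official release) is returned;
    otherwise "preview/<branch-name>/docs/" is appended to the base URL.
    """
    if branch is None:
        return BASE_URL
    return f"{BASE_URL}/preview/{branch}/docs/"


if __name__ == "__main__":
    print(docs_url("develop"))
```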

You can also examine the specification in the Swagger Editor.

If you want to explore a version from a development branch, please load the corresponding specification file (in openapi/task_execution_service.openapi.yaml) manually into the Swagger Editor.

TES Compliant Implementations

In alignment with GA4GH policies and regulations, security reviews are conducted for each major version release of the API. However, no security guarantees are provided for any implementation of the API, including those linked from this page or the associated documentation. Users are advised to proceed at their own risk and should arrange for a security audit of their application to ensure compliance with relevant regulatory and security standards, particularly when handling personal data.

Client

Server

Compatibility Matrix

Compatibility is assumed based on available documentation and limited tests performed on the latest versions of the implementations available in December 2020. This information may be outdated.

| | cwl-tes | Cromwell | Nextflow |
|---|---|---|---|
| Funnel | Compatible | Compatible | Compatible |
| TESK | Compatible | Compatible | Compatible |
| TES Azure | Not tested | Compatible | Compatible |

TES Service Examples

The API specification is available here in OpenAPI v3.0.1. Clients may use JSON and REST to communicate with a service implementing the TES API.

Creating a task

Here's an example of a complete task message, defining a task which calculates an MD5 checksum on an input file and uploads the output:

{
  "name": "MD5 example",
  "description": "Task which runs md5sum on the input file.",
  "tags": {
      "custom-tag": "tag-value"
  },
  "inputs": [
      {
          "name": "infile",
          "description": "md5sum input file",
          "url": "/path/to/input_file",
          "path": "/container/input",
          "type": "FILE"
      }
  ],
  "outputs": [
      {
          "name": "outfile",
          "url": "/path/to/output_file",
          "path": "/container/output"
      }
  ],
  "resources": {
      "cpu_cores": 1,
      "ram_gb": 1,
      "disk_gb": 100,
      "preemptible": false
  },
  "executors": [
      {
          "image": "ubuntu",
          "command": [
              "md5sum",
              "/container/input"
          ],
          "stdout": "/container/output",
          "stderr": "/container/stderr",
          "workdir": "/tmp"
      }
  ]
}

A minimal version of the same task, including only the required fields, looks like:

{
    "inputs": [
      {
        "url":  "/path/to/input_file",
        "path": "/container/input"
      }
    ],
    "outputs" : [
      {
        "url" :  "/path/to/output_file",
        "path" : "/container/output"
      }
    ],
    "executors" : [
      {
        "image" : "ubuntu",
        "command" : ["md5sum", "/container/input"],
        "stdout" : "/container/output"
      }
    ]
}
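Before submitting a task, a client may want to check that the fields used in the minimal example are present. The following is a rough sketch, not a schema validator: the helper name `missing_fields` is hypothetical, and it treats the fields shown in the minimal example as the required set, which is an assumption. The OpenAPI specification remains the authority on what is required.

```python
# Fields taken from the minimal example in this README (an assumption,
# not derived from the OpenAPI schema).
EXPECTED = {
    "inputs": ("url", "path"),
    "outputs": ("url", "path"),
    "executors": ("image", "command"),
}


def missing_fields(task):
    """Return the names of expected sections or fields absent from a task dict."""
    missing = []
    for section, keys in EXPECTED.items():
        items = task.get(section)
        if not items:
            # The whole section is absent or empty.
            missing.append(section)
            continue
        for index, item in enumerate(items):
            for key in keys:
                if key not in item:
                    missing.append(f"{section}[{index}].{key}")
    return missing


minimal_task = {
    "inputs": [{"url": "/path/to/input_file", "path": "/container/input"}],
    "outputs": [{"url": "/path/to/output_file", "path": "/container/output"}],
    "executors": [
        {
            "image": "ubuntu",
            "command": ["md5sum", "/container/input"],
            "stdout": "/container/output",
        }
    ],
}
```

An empty list from `missing_fields(minimal_task)` means every field from the minimal example is present.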

To create the task, send an HTTP POST request to the /tasks endpoint:

POST /ga4gh/tes/v1/tasks

The response indicates an identifier for the created task resource:

{ 
  "id": "task-1234" 
}
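As a minimal sketch, the request above could be issued from Python's standard library as follows. The function names and the base URL `http://localhost:8000` are hypothetical, and authentication and error handling are omitted; a real deployment will likely need both.

```python
import json
import urllib.request


def build_create_request(base_url, task):
    """Build (but do not send) the POST request that creates a task."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/ga4gh/tes/v1/tasks",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_task(base_url, task):
    """Send the request and return the identifier of the created task."""
    with urllib.request.urlopen(build_create_request(base_url, task)) as response:
        return json.load(response)["id"]


# Hypothetical service address, used here only to show the resulting URL.
example_request = build_create_request(
    "http://localhost:8000",
    {"executors": [{"image": "ubuntu", "command": ["md5sum", "/container/input"]}]},
)
```

Separating request construction from sending keeps the URL and body easy to inspect without a running service.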

Fetching a task

To get a task by its identifier, send an HTTP GET request to the /tasks/{id} endpoint:

GET /ga4gh/tes/v1/tasks/task-1234

The default minimal response will include the task state:

{ 
  "id": "task-1234",
  "state": "RUNNING" 
}

To get more information, you can change the task view using the view URL query parameter.

The BASIC view will include all task fields except a few which might be large strings (stdout/stderr, system logging, input parameter contents):

GET /ga4gh/tes/v1/tasks/task-1234?view=BASIC
{
  "id": "task-1234",
  "name": "Sample Task",
  "description": "This is a sample task description.",
  "state": "COMPLETE",
  "inputs": [
      {
          "name": "infile",
          "description": "Input file for the task.",
          "url": "/path/to/input_file",
          "path": "/container/input",
          "type": "FILE"
      }
  ],
  "outputs": [
      {
          "name": "outfile",
          "url": "/path/to/output_file",
          "path": "/container/output"
      }
  ],
  "resources": {
      "cpu_cores": 1,
      "ram_gb": 2.0,
      "disk_gb": 10.0,
      "preemptible": false
  },
  "executors": [
      {
          "image": "ubuntu:latest",
          "command": ["command", "arg1", "arg2"],
          "stdout": "/container/output",
          "stderr": "/container/stderr",
          "workdir": "/tmp"
      }
  ],
  "creation_time": "2024-10-24T12:00:00Z"
}
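The `view` query parameter can be attached when building the request URL. The sketch below is illustrative only: `task_url` and the base URL are hypothetical names, and the view name `MINIMAL` for the default response is taken from the TES specification rather than from the text above.

```python
from urllib.parse import quote, urlencode

VIEWS = ("MINIMAL", "BASIC", "FULL")


def task_url(base_url, task_id, view=None):
    """Return the URL for fetching one task, optionally with a view."""
    url = f"{base_url.rstrip('/')}/ga4gh/tes/v1/tasks/{quote(task_id, safe='')}"
    if view is not None:
        if view not in VIEWS:
            raise ValueError(f"unknown view: {view!r}")
        url += "?" + urlencode({"view": view})
    return url


if __name__ == "__main__":
    print(task_url("http://localhost:8000", "task-1234", "BASIC"))
```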

The FULL view includes stdout/stderr, system logs and full input parameters:

GET /ga4gh/tes/v1/tasks/task-1234?view=FULL
{
  "id": "task-1234",
  "state": "COMPLETE",
  "name": "MD5 Checksum Task",
  "description": "This task computes the MD5 checksum of the input file.",
  "inputs": [
    {
      "url": "s3://my-object-store/file1",
      "path": "/data/file1"
    }
  ],
  "outputs": [
    {
      "path": "/data/outfile",
      "url": "s3://my-object-store/outfile-1",
      "type": "FILE"
    }
  ],
  "resources": {
    "cpu_cores": 4,
    "preemptible": false,
    "ram_gb": 8,
    "disk_gb": 40,
    "zones": ["us-west-1"],
    "backend_parameters": {
      "VmSize": "Standard_D64_v3"
    },
    "backend_parameters_strict": false
  },
  "executors": [
    {
      "image": "ubuntu:20.04",
      "command": [
        "/bin/md5",
        "/data/file1"
      ],
      "workdir": "/data/",
      "stdin": "/data/file1",
      "stdout": "/tmp/stdout.log",
      "stderr": "/tmp/stderr.log",
      "ignore_error": true
    }
  ],
  "volumes": [
    "/vol/A/"
  ],
  "tags": {
    "WORKFLOW_ID": "cwl-01234",
    "PROJECT_GROUP": "alice-lab"
  },
  "logs": [
    {
      "logs": [
        {
          "start_time": "2020-10-02T10:00:00-05:00",
          "end_time": "2020-10-02T11:00:00-05:00",
          "stdout": "MD5 checksum calculation completed successfully.",
          "stderr": "",
          "exit_code": 0
        }
      ],
      "metadata": {
        "host": "worker-001",
        "slurm_id": 123456
      },
      "start_time": "2020-10-02T10:00:00-05:00",
      "end_time": "2020-10-02T11:00:00-05:00",
      "outputs": [
        {
          "url": "s3://my-object-store/outfile-1",
          "path": "/data/outfile",
          "size_bytes": 1024
        }
      ],
      "system_logs": [
        "Task executed successfully without any issues."
      ]
    }
  ],
  "creation_time": "2020-10-02T10:00:00-05:00"
}
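The nested `logs` structure in the FULL view can be awkward to read at a glance: each entry of the outer `logs` list carries its own inner `logs` list with one record per executor. The following sketch flattens it; the helper name `executor_results` is hypothetical, and the sample is a trimmed copy of the response above.

```python
def executor_results(task):
    """Return (log_index, executor_index, exit_code, stdout) for every executor log."""
    results = []
    for log_index, task_log in enumerate(task.get("logs", [])):
        for executor_index, executor_log in enumerate(task_log.get("logs", [])):
            results.append(
                (
                    log_index,
                    executor_index,
                    executor_log.get("exit_code"),
                    executor_log.get("stdout", ""),
                )
            )
    return results


# Trimmed copy of the FULL view example.
sample_task = {
    "state": "COMPLETE",
    "logs": [
        {
            "logs": [
                {
                    "stdout": "MD5 checksum calculation completed successfully.",
                    "stderr": "",
                    "exit_code": 0,
                }
            ],
            "system_logs": ["Task executed successfully without any issues."],
        }
    ],
}
```

A task fetched with a view that omits `logs` yields an empty list rather than an error.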

Listing tasks

To list all available tasks, send an HTTP GET request to the /tasks endpoint:

GET /ga4gh/tes/v1/tasks
{
  "tasks": [
    {
      "id": "job-0012345",
      "state": "COMPLETE"
    },
    {
      "id": "job-0012346",
      "state": "RUNNING"
    },
    {
      "id": "job-0012347",
      "state": "EXECUTOR_ERROR"
    }
  ]
}
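A listing response like this one is often reduced to a per-state summary. A minimal sketch, assuming the response has already been parsed into a dictionary; the helper name `count_states` and the sample data are hypothetical.

```python
from collections import Counter


def count_states(listing):
    """Count the tasks in a list response by their state."""
    return Counter(task["state"] for task in listing.get("tasks", []))


# Hypothetical parsed response with the same shape as the example above.
sample_listing = {
    "tasks": [
        {"id": "job-0012345", "state": "COMPLETE"},
        {"id": "job-0012346", "state": "RUNNING"},
        {"id": "job-0012348", "state": "RUNNING"},
    ]
}
```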

As when fetching a single task by ID, you may change the task view:

GET /ga4gh/tes/v1/tasks?view=BASIC

Cancelling a task

To cancel a task, send an HTTP POST request to the /tasks/{id}:cancel endpoint:

POST /ga4gh/tes/v1/tasks/task-1234:cancel
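
The cancel request can be sketched in Python as shown below. The function name and base URL are hypothetical, and sending an empty body is an assumption about what a service accepts. Note that `:cancel` is a suffix on the task path and must not be percent-encoded together with the identifier.

```python
import urllib.request
from urllib.parse import quote


def build_cancel_request(base_url, task_id):
    """Build (but do not send) the POST request that cancels a task."""
    # Only the identifier is percent-encoded; ":cancel" is appended verbatim.
    url = f"{base_url.rstrip('/')}/ga4gh/tes/v1/tasks/{quote(task_id, safe='')}:cancel"
    return urllib.request.Request(url, data=b"", method="POST")


# Hypothetical service address, used only to show the resulting URL.
cancel_request = build_cancel_request("http://localhost:8000", "task-1234")
```

Passing the result to `urllib.request.urlopen` would send the request to a running service.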

How to Contribute Changes

Community Contributions and Spec Advancement

The advancement of the GA4GH Task Execution Service (TES) API relies on active community engagement and contributions. While submitting issues is an effective way to report bugs or foster discussions about existing or proposed features, it is important to note that these actions alone typically do not lead to modifications in the specification. The most effective method for implementing changes is through the submission of a pull request (PR).

For detailed guidance on how to contribute, please refer to the contributing documentation.

If a security issue is identified with the specification, please send an email to mailto:[email protected] detailing your concerns.

Governance

The development of the TES specification is community-driven and overseen by a governance committee. For more information, please refer to the governance documentation.