diff --git a/.secrets.baseline b/.secrets.baseline
index 63e8630..5548e9d 100644
--- a/.secrets.baseline
+++ b/.secrets.baseline
@@ -145,6 +145,24 @@
         "line_number": 64
       }
     ],
+    "docs/local_installation.md": [
+      {
+        "type": "Secret Keyword",
+        "filename": "docs/local_installation.md",
+        "hashed_secret": "08d2e98e6754af941484848930ccbaddfefe13d6",
+        "is_verified": false,
+        "line_number": 94
+      }
+    ],
+    "docs/s3.md": [
+      {
+        "type": "Secret Keyword",
+        "filename": "docs/s3.md",
+        "hashed_secret": "08d2e98e6754af941484848930ccbaddfefe13d6",
+        "is_verified": false,
+        "line_number": 56
+      }
+    ],
     "gen3workflow/config-default.yaml": [
       {
         "type": "Secret Keyword",
@@ -191,5 +209,5 @@
       }
     ]
   },
-  "generated_at": "2024-12-09T23:30:01Z"
+  "generated_at": "2024-12-12T23:42:54Z"
 }
diff --git a/README.md b/README.md
index 8b48b55..8eab625 100644
--- a/README.md
+++ b/README.md
@@ -13,3 +13,4 @@ The documentation can be browsed in the [docs](docs) folder, and key documents a
 * [Detailed API Documentation](http://petstore.swagger.io/?url=https://raw.githubusercontent.com/uc-cdis/gen3-workflow/master/docs/openapi.yaml)
 * [Local installation](docs/local_installation.md)
 * [Authorization](docs/authorization.md)
+* [S3 interaction](docs/s3.md)
diff --git a/docs/local_installation.md b/docs/local_installation.md
index cbf69b7..e95b622 100644
--- a/docs/local_installation.md
+++ b/docs/local_installation.md
@@ -75,7 +75,8 @@ Try out the API at or `http://localhost:8080` is where Gen3Workflow runs by default when started with `python run.py`.
+> The Gen3Workflow URL should be set to `http://localhost:8080` in this case; this is where the service runs by default when started with `python run.py`.
+
+- Run a workflow:
-Run a workflow:
+When setting your token manually:
 ```
+export GEN3_TOKEN=
 nextflow run hello
 ```
+Or, with the [Gen3 Python SDK](https://github.com/uc-cdis/gen3sdk-python) configured with an API key:
+```
+gen3 run nextflow run hello
+```
 ## AWS access
diff --git a/docs/s3.md b/docs/s3.md
new file mode 100644
index 0000000..c15ba4f
--- /dev/null
+++ b/docs/s3.md
@@ -0,0 +1,72 @@
+# S3 interaction
+
+Note: This discussion can apply to many use cases, but it is written with a specific use case in mind: using the Gen3Workflow service to run Nextflow workflows.
+
+Contents:
+- [Using IAM keys](#using-iam-keys)
+- [Using a custom S3 endpoint](#using-a-custom-s3-endpoint)
+- [Diagram](#diagram)
+
+## Using IAM keys
+
+We initially considered generating IAM keys for users to upload their input files to S3, retrieve their output files and store Nextflow intermediary files. Users would configure Nextflow with the generated IAM key ID and secret:
+
+```
+plugins {
+    id 'nf-ga4gh'
+}
+process {
+    executor = 'tes'
+    container = 'quay.io/nextflow/bash'
+}
+tes {
+    endpoint = '/ga4gh/tes'
+    oauthToken = "${GEN3_TOKEN}"
+}
+aws {
+    accessKey = "${AWS_KEY_ID}"
+    secretKey = "${AWS_KEY_SECRET}"
+    region = 'us-east-1'
+}
+workDir = ''
+```
+
+Plain-text AWS IAM keys in users' hands cause security concerns. It creates a difficult path for auditing and traceability. The ability to easily see the secrets in plain-text is also a concern.
+
+## Using a custom S3 endpoint
+
+The `/s3` endpoint was implemented to avoid using IAM keys. This endpoint receives S3 requests, re-signs them with internal credentials, and forwards them to AWS S3. Users provide their Gen3 token as the “access key ID”, which is used to verify they have the appropriate access. This key is then overwritten with internal credentials that actually have access to AWS S3.
+
+Nextflow supports S3-compatible storage through the `aws.client.s3PathStyleAccess` and `aws.client.endpoint` settings; this allows users to point Nextflow to our custom S3 API:
+
+```
+plugins {
+    id 'nf-ga4gh'
+}
+process {
+    executor = 'tes'
+    container = 'quay.io/nextflow/bash'
+}
+tes {
+    endpoint = '/ga4gh/tes'
+    oauthToken = "${GEN3_TOKEN}"
+}
+aws {
+    accessKey = "${GEN3_TOKEN}"
+    secretKey = 'N/A'
+    region = 'us-east-1'
+    client {
+        s3PathStyleAccess = true
+        endpoint = '/s3'
+    }
+}
+workDir = ''
+```
+
+Notes:
+- We have to set the Gen3 token as the “key ID”, not the “key secret”, in order to extract it from the request. The “key secret” is hashed and cannot be extracted.
+- When an `aws.accessKey` value is provided, the Nextflow configuration requires the `aws.secretKey` value to be provided as well. Users can set it to something like "N/A".
+
+## Diagram
+
+![s3 interaction diagram](s3.png)
diff --git a/docs/s3.png b/docs/s3.png
new file mode 100644
index 0000000..6627900
Binary files /dev/null and b/docs/s3.png differ
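
Note on the "key ID" extraction described in docs/s3.md: the sketch below illustrates why the Gen3 token has to travel as the access key ID. In an AWS Signature Version 4 `Authorization` header, the access key ID appears in clear text as the first component of the `Credential` field, while the secret key only contributes to the HMAC signature and cannot be read back. This is a minimal sketch and not the Gen3Workflow implementation; the function name `extract_access_key_id` and the sample header values are assumptions made for illustration.

```python
import re

# Hypothetical helper for illustration only; not taken from Gen3Workflow.
# SigV4 header shape:
#   AWS4-HMAC-SHA256 Credential=<key ID>/<date>/<region>/<service>/aws4_request,
#   SignedHeaders=<h1;h2;...>, Signature=<hex digest>
_SIGV4_AUTH = re.compile(
    r"^AWS4-HMAC-SHA256\s+"
    r"Credential=(?P<credential>[^,]+),\s*"
    r"SignedHeaders=(?P<signed_headers>[^,]+),\s*"
    r"Signature=(?P<signature>[0-9a-f]+)$"
)


def extract_access_key_id(authorization_header: str) -> str:
    """Return the access key ID (here: the user's token) from a SigV4 header."""
    match = _SIGV4_AUTH.match(authorization_header.strip())
    if match is None:
        raise ValueError("not a SigV4 Authorization header")
    # Split from the right so only the four scope components are peeled off.
    parts = match.group("credential").rsplit("/", 4)
    if len(parts) != 5 or parts[4] != "aws4_request":
        raise ValueError("malformed credential scope")
    return parts[0]


# Made-up example header: the "key ID" slot carries a placeholder token.
header = (
    "AWS4-HMAC-SHA256 "
    "Credential=header.payload.signature/20241212/us-east-1/s3/aws4_request, "
    "SignedHeaders=host;x-amz-content-sha256;x-amz-date, "
    "Signature=" + "0" * 64
)
token = extract_access_key_id(header)
```

A service following the flow in docs/s3.md would then validate the extracted token, and re-sign the request with its own credentials before forwarding it to AWS S3; those steps are outside this sketch.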