Commit

add docs about s3 endpoint
paulineribeyre committed Dec 12, 2024
1 parent 56994a8 commit 303c570
Showing 5 changed files with 115 additions and 5 deletions.
20 changes: 19 additions & 1 deletion .secrets.baseline
@@ -145,6 +145,24 @@
"line_number": 64
}
],
"docs/local_installation.md": [
{
"type": "Secret Keyword",
"filename": "docs/local_installation.md",
"hashed_secret": "08d2e98e6754af941484848930ccbaddfefe13d6",
"is_verified": false,
"line_number": 94
}
],
"docs/s3.md": [
{
"type": "Secret Keyword",
"filename": "docs/s3.md",
"hashed_secret": "08d2e98e6754af941484848930ccbaddfefe13d6",
"is_verified": false,
"line_number": 56
}
],
"gen3workflow/config-default.yaml": [
{
"type": "Secret Keyword",
@@ -191,5 +209,5 @@
}
]
},
"generated_at": "2024-12-09T23:30:01Z"
"generated_at": "2024-12-12T23:42:54Z"
}
1 change: 1 addition & 0 deletions README.md
@@ -13,3 +13,4 @@ The documentation can be browsed in the [docs](docs) folder, and key documents a
* [Detailed API Documentation](http://petstore.swagger.io/?url=https://raw.githubusercontent.com/uc-cdis/gen3-workflow/master/docs/openapi.yaml)
* [Local installation](docs/local_installation.md)
* [Authorization](docs/authorization.md)
* [S3 interaction](docs/s3.md)
27 changes: 23 additions & 4 deletions docs/local_installation.md
@@ -75,7 +75,8 @@ Try out the API at <http://localhost:8080/_status> or <http://localhost:8080/doc

## Run Nextflow workflows with Gen3Workflow

Example Nextflow configuration:
- Hit the `/storage/info` endpoint to get your working directory
- Configure Nextflow. Example Nextflow configuration:
```
plugins {
id 'nf-ga4gh'
@@ -85,15 +86,33 @@ process {
container = 'quay.io/nextflow/bash'
}
tes {
endpoint = 'http://localhost:8080/ga4gh/tes'
endpoint = '<Gen3Workflow URL>/ga4gh/tes'
oauthToken = "${GEN3_TOKEN}"
}
aws {
accessKey = "${GEN3_TOKEN}"
secretKey = 'N/A'
region = 'us-east-1'
client {
s3PathStyleAccess = true
endpoint = '<Gen3Workflow URL>/s3'
}
}
workDir = '<your working directory>'
```
> `http://localhost:8080` is where Gen3Workflow runs by default when started with `python run.py`.
> The Gen3Workflow URL should be set to `http://localhost:8080` in this case; this is where the service runs by default when started with `python run.py`.
- Run a workflow:

Run a workflow:
When setting your token manually:
```
export GEN3_TOKEN=<your token>
nextflow run hello
```
Or, with the [Gen3 Python SDK](https://github.com/uc-cdis/gen3sdk-python) configured with an API key:
```
gen3 run nextflow run hello
```
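The `/storage/info` request in the first step can be sketched in Python using only the standard library. This is a minimal sketch, not part of Gen3Workflow. It assumes the endpoint accepts the Gen3 token as a `Bearer` token in the `Authorization` header and returns a JSON body. The function names are hypothetical, and the exact response fields are not documented here.

```python
import json
import urllib.request


def build_storage_info_request(base_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for `/storage/info` (hypothetical helper).

    Assumption: the Gen3 token is accepted as a Bearer token.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/storage/info",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def get_storage_info(base_url: str, token: str) -> dict:
    """Send the request and decode the response body, assumed to be JSON."""
    with urllib.request.urlopen(build_storage_info_request(base_url, token)) as resp:
        return json.load(resp)
```

For example, against a local instance you could call `get_storage_info("http://localhost:8080", token)` and look in the returned data for the working directory to use as `workDir`.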

## AWS access

72 changes: 72 additions & 0 deletions docs/s3.md
@@ -0,0 +1,72 @@
# S3 interaction

Note: This discussion can apply to many use cases, but it is written with a specific use case in mind: using the Gen3Workflow service to run Nextflow workflows.

Contents:
- [Using IAM keys](#using-iam-keys)
- [Using a custom S3 endpoint](#using-a-custom-s3-endpoint)
- [Diagram](#diagram)

## Using IAM keys

We initially considered generating IAM keys that users would use to upload their input files to S3, retrieve their output files, and store Nextflow intermediate files. Users would configure Nextflow with the generated IAM key ID and secret:

```
plugins {
id 'nf-ga4gh'
}
process {
executor = 'tes'
container = 'quay.io/nextflow/bash'
}
tes {
endpoint = '<Gen3Workflow URL>/ga4gh/tes'
oauthToken = "${GEN3_TOKEN}"
}
aws {
accessKey = "${AWS_KEY_ID}"
secretKey = "${AWS_KEY_SECRET}"
region = 'us-east-1'
}
workDir = '<your working directory>'
```

Plain-text AWS IAM keys in users' hands cause security concerns. They make auditing and traceability difficult, and anyone with access to the configuration can easily see the secrets in plain text.

## Using a custom S3 endpoint

The `/s3` endpoint was implemented to avoid using IAM keys. This endpoint receives S3 requests, re-signs them with internal credentials, and forwards them to AWS S3. Users provide their Gen3 token as the “access key ID”, which is used to verify they have the appropriate access. This key is then overwritten with internal credentials that actually have access to AWS S3.
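The first step of that flow is recovering the Gen3 token from an incoming request. The sketch below illustrates that step. It is not the service's actual code.

It assumes requests are signed with AWS Signature Version 4 in the `Authorization` header. Presigned URLs, which carry the credential in the `X-Amz-Credential` query parameter, are not handled. The function name is hypothetical, and verifying the token and re-signing the request with internal credentials are not shown.

```python
import re

# In AWS Signature Version 4, the access key ID is sent in clear text inside the
# `Credential=` component of the Authorization header:
#   AWS4-HMAC-SHA256 Credential=<access key ID>/<date>/<region>/<service>/aws4_request,
#   SignedHeaders=..., Signature=...
# A Gen3 token (a JWT: base64url segments joined by ".") contains no "/", "," or spaces,
# so everything up to the first "/" is the token.
_CREDENTIAL_RE = re.compile(r"Credential=([^/,\s]+)/")


def extract_access_key_id(authorization_header: str) -> str:
    """Return the access key ID (here, the user's Gen3 token) from a SigV4 header.

    Hypothetical helper for illustration only.
    """
    if not authorization_header.startswith("AWS4-HMAC-SHA256 "):
        raise ValueError("not an AWS Signature Version 4 Authorization header")
    match = _CREDENTIAL_RE.search(authorization_header)
    if match is None:
        raise ValueError("no Credential component in the Authorization header")
    return match.group(1)
```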

Nextflow supports S3-compatible storage through the `aws.client.s3PathStyleAccess` and `aws.client.endpoint` settings, which allow users to point Nextflow to our custom S3 API:

```
plugins {
id 'nf-ga4gh'
}
process {
executor = 'tes'
container = 'quay.io/nextflow/bash'
}
tes {
endpoint = '<Gen3Workflow URL>/ga4gh/tes'
oauthToken = "${GEN3_TOKEN}"
}
aws {
accessKey = "${GEN3_TOKEN}"
secretKey = 'N/A'
region = 'us-east-1'
client {
s3PathStyleAccess = true
endpoint = '<Gen3Workflow URL>/s3'
}
}
workDir = '<your working directory>'
```
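Setting `s3PathStyleAccess = true` matters because of how S3 URLs are built:

- With virtual-hosted-style addressing (the usual AWS default), the bucket name becomes part of the hostname.
- With path-style addressing, the bucket is the first path segment after the endpoint, so requests stay under `<Gen3Workflow URL>/s3/`.

The helpers below are hypothetical and only illustrate the two URL shapes. The bucket and key names in the example are made up.

```python
from urllib.parse import quote


def path_style_url(endpoint: str, bucket: str, key: str) -> str:
    """Path-style: the bucket is a path segment after the (custom) endpoint."""
    return f"{endpoint.rstrip('/')}/{bucket}/{quote(key)}"


def virtual_hosted_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    """Virtual-hosted-style: the bucket is part of the AWS hostname."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"
```

With a custom endpoint, virtual-hosted-style addressing would put the bucket name in front of the Gen3Workflow hostname. Path-style addressing keeps the Gen3Workflow host and `/s3` prefix intact.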

Notes:
- We have to set the Gen3 token as the “key ID”, not the “key secret”, in order to extract it from the request. The “key secret” is never sent with the request. It is only used to compute the request signature (HMAC-SHA256), so it cannot be extracted.
- When an `aws.accessKey` value is provided, the Nextflow configuration requires the `aws.secretKey` value to be provided as well. Users can set it to something like "N/A".
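The note about the “key secret” follows from how AWS Signature Version 4 works. The secret key only seeds a chain of HMAC-SHA256 operations, and only the final signature is sent with the request.

The sketch below shows that derivation using the Python standard library. Building the canonical request and the string to sign is omitted, and `string_to_sign` is a placeholder input.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date: str, region: str, service: str = "s3") -> bytes:
    """Derive the SigV4 signing key. `date` is formatted as YYYYMMDD."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def signature(secret_key: str, date: str, region: str, string_to_sign: str) -> str:
    """Return the hex signature; only this value travels in `Signature=...`."""
    key = signing_key(secret_key, date, region)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because HMAC is one-way, a server that receives `Signature=...` cannot recover `secretKey` from it. The access key ID, by contrast, appears verbatim in the `Credential=` component.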

## Diagram

![s3 interaction diagram](s3.png)
Binary file added docs/s3.png