Mount PVC with reference data to executor #130

Open
uniqueg opened this issue Nov 6, 2021 · 5 comments
Comments

@uniqueg
Member

uniqueg commented Nov 6, 2021

Allow a PVC that holds reference data to be mounted on the executor pod for easy/fast data access.

@uniqueg
Member Author

uniqueg commented Nov 6, 2021

Requested by the Greek ELIXIR node, see here: https://docs.google.com/spreadsheets/d/1vBFhBQ-nFqhSL5dLjQfOWO6x9BzmV9x6l18p9GYRZdQ/edit#gid=0

Contacts: @zagganas & @vergoulis

@uniqueg
Member Author

uniqueg commented Nov 6, 2021

While this sounds like a useful feature to increase the performance of TESK in some use cases, I think it goes beyond what TES tries to be: a thin API layer to execute atomic, containerized tasks in any compute backend. I therefore put this issue in the TESK repository, as I could imagine that it could be provided in an implementation-specific manner that does not break the TES/DRS pattern of gaining access to data envisioned by the GA4GH Cloud WS & FASP.

While I lack the technical k8s knowledge to devise a detailed design strategy, I could imagine that one could optionally co-deploy a DRS API service with TESK that gives access to data stored on one or more PVCs mounted in the executor pods of TES tasks. Deployments making use of this setup could then access data on those PVCs without having to rely on network traffic via DRS (even if this means, and I'm not sure it does, that those data are not accessible outside of TESK executor pods). I'm not at all sure if this is feasible, so let's discuss :)

@lvarin
Contributor

lvarin commented Nov 8, 2021

So there are two ideas on the table:

  • Define a PVC that will always be mounted on every task pod. The PVC will hold reference/public data and will persist while the TESK-api pod is running. This will be in addition to the PVC that is created for tasks. I wonder how data would be copied to the PVC: by hand (kubectl cp)?
  • Add the option to deploy a DRS in the same namespace as TESK that also shares storage with the executors? As far as I can see, DRS does not currently have any storage, so maybe there is something I am not understanding about this solution.
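
To make the first idea more concrete, here is a minimal sketch of what "always mount a reference PVC on every task pod" could look like at the pod-spec level. It is not TESK's implementation: the helper `add_reference_pvc`, the claim name `reference-data-pvc`, the volume name `reference-data` and the mount path `/reference` are all hypothetical and chosen only for illustration. The dictionary keys (`volumes`, `persistentVolumeClaim`, `claimName`, `volumeMounts`, `mountPath`, `readOnly`) are the standard Kubernetes pod spec fields.

```python
import copy


def add_reference_pvc(pod_spec, claim_name, mount_path, volume_name="reference-data"):
    """Return a copy of pod_spec with a read-only PVC mounted in every container.

    Hypothetical helper for illustration only; not part of TESK.
    """
    spec = copy.deepcopy(pod_spec)

    # Declare the volume once, backed by an already existing PVC.
    volumes = spec.setdefault("volumes", [])
    if not any(v.get("name") == volume_name for v in volumes):
        volumes.append({
            "name": volume_name,
            "persistentVolumeClaim": {"claimName": claim_name, "readOnly": True},
        })

    # Mount the volume read-only in each container of the pod.
    for container in spec.get("containers", []):
        mounts = container.setdefault("volumeMounts", [])
        if not any(m.get("name") == volume_name for m in mounts):
            mounts.append({
                "name": volume_name,
                "mountPath": mount_path,
                "readOnly": True,
            })
    return spec


# Assumed shape of an executor pod spec; image and command are placeholders.
executor_spec = {
    "containers": [
        {"name": "executor", "image": "alpine", "command": ["ls", "/reference"]}
    ],
    "restartPolicy": "Never",
}

patched = add_reference_pvc(executor_spec, "reference-data-pvc", "/reference")
```

Mounting read-only keeps tasks from modifying the shared reference data. If several executor pods on different nodes are to mount the same claim at once, the underlying storage would presumably need an access mode such as ReadOnlyMany or ReadWriteMany.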

@noooonee

Same issue here. It seems that multiple repos need to be updated:

broadinstitute/cromwell#2190

And the TES API definition:

https://github.com/ga4gh/task-execution-schemas

@uniqueg
Member Author

uniqueg commented Aug 18, 2022

Thanks for bumping this, @hex43ver.

I have opened ga4gh/task-execution-schemas#186 to discuss this on a wider scale. Perhaps you want to add your own opinions and use case? :)
