-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: create TES task as k8s job #200
base: main
Are you sure you want to change the base?
Conversation
Reviewer's Guide by SourceryThis pull request introduces the ability to create TES tasks as Kubernetes jobs. It includes new configurations, utility functions, and classes for managing Kubernetes interactions. The changes are primarily focused on integrating Kubernetes job creation and management into the existing TES task workflow. File-Level Changes
Tips
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @JaeAeich - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 7 issues found
- 🟡 Security: 1 issue found
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.
The CI will break, those are easy fixes, please ignore them and review the rest of the code and logic :) |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #200 +/- ##
=======================================
Coverage 98.21% 98.21%
=======================================
Files 8 8
Lines 561 561
=======================================
Hits 551 551
Misses 10 10
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
┌─────────────────────────────────────────────────────────┐ | ||
│ Kubernetes │ | ||
│ │ | ||
│ ┌────────────────────────────┐ ┌───────────────────┐ │ | ||
│ │ Secret: ftp-secret │ │ ConfigMap/PVC │ │ | ||
│ │ - username │ │ - JSON_INPUT.gz │ │ | ||
│ │ - password │ │ │ │ | ||
│ └──────────▲─────────────────┘ └───────▲───────────┘ │ | ||
│ │ | │ | ||
│ │ | │ | ||
│ │ | │ | ||
│ ┌─────────┴────────────────────────────┴────────────┐ │ | ||
│ │ Job: taskmaster │ │ | ||
│ │ ┌───────────────────────────────────────────────┐ │ │ | ||
│ │ │ Pod: taskmaster │ │ │ | ||
│ │ │ - Container: taskmaster │ │ │ | ||
│ │ │ - Env: TESK_FTP_USERNAME │ │ │ | ||
│ │ │ - Env: TESK_FTP_PASSWORD │ │ │ | ||
│ │ │ - Args: -f /jsoninput/JSON_INPUT.gz │ │ │ | ||
│ │ │ - Mounts: /podinfo │ │ │ | ||
│ │ │ /jsoninput │ │ │ | ||
│ │ └───────────────────────────────────────────────┘ │ │ | ||
│ └───────────────────────────────────────────────────┘ │ | ||
│ │ | ||
└─────────────────────────────────────────────────────────┘ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was just trying to understand k8s flow, please ignore, maybe will remove later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle it's nice to have such a schema in the docs.
I am, however, wondering about the FTP secret. Maybe this will become clearer to me after looking at the code below, but seeing it represented like this in the doc, I have the concern that FTP is somehow treated special, when, ideally, storage solutions should all be treated in an abstract manner, like previously discussed: abstract storage handler and individual implementations for different storage/file transfer solutions. And that should probably extend to managing secrets/credentials as well, no?
@@ -9,7 +9,7 @@ host_name: "" | |||
# | |||
|
|||
# 'openstack' or 's3' | |||
storage: none | |||
storage: s3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: turn back to none.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏽
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be described in more detail somewhere. Also, if none
is an allowed value, this should be listed, and not just openstack
and s3
.
┌─────────────────────────────────────────────────────────┐ | ||
│ Kubernetes │ | ||
│ │ | ||
│ ┌────────────────────────────┐ ┌───────────────────┐ │ | ||
│ │ Secret: ftp-secret │ │ ConfigMap/PVC │ │ | ||
│ │ - username │ │ - JSON_INPUT.gz │ │ | ||
│ │ - password │ │ │ │ | ||
│ └──────────▲─────────────────┘ └───────▲───────────┘ │ | ||
│ │ | │ | ||
│ │ | │ | ||
│ │ | │ | ||
│ ┌─────────┴────────────────────────────┴────────────┐ │ | ||
│ │ Job: taskmaster │ │ | ||
│ │ ┌───────────────────────────────────────────────┐ │ │ | ||
│ │ │ Pod: taskmaster │ │ │ | ||
│ │ │ - Container: taskmaster │ │ │ | ||
│ │ │ - Env: TESK_FTP_USERNAME │ │ │ | ||
│ │ │ - Env: TESK_FTP_PASSWORD │ │ │ | ||
│ │ │ - Args: -f /jsoninput/JSON_INPUT.gz │ │ │ | ||
│ │ │ - Mounts: /podinfo │ │ │ | ||
│ │ │ /jsoninput │ │ │ | ||
│ │ └───────────────────────────────────────────────┘ │ │ | ||
│ └───────────────────────────────────────────────────┘ │ | ||
│ │ | ||
└─────────────────────────────────────────────────────────┘ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -9,7 +9,7 @@ host_name: "" | |||
# | |||
|
|||
# 'openstack' or 's3' | |||
storage: none | |||
storage: s3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏽
tesk/api/ga4gh/tes/controllers.py
Outdated
logger.error("Nothing received in request body.") | ||
raise BadRequest("No request body received.") | ||
tes_task = TesTask(**request_body) | ||
namespace = "tesk" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have just seen that this value is hardwired to tesk
. In the java implementation, the TESK API is running in the same namespace that it will create the jobs, so you can deploy it in a shared cluster and call the namespace whatever you want.
I am not sure what others think, but this will be an issue for CSC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't be an issue I plan on removing it from here, I'll add this to docs config or as env variable so when launching the api anyone can configure it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't review the whole PR, but here are some comments for now
┌─────────────────────────────────────────────────────────┐ | ||
│ Kubernetes │ | ||
│ │ | ||
│ ┌────────────────────────────┐ ┌───────────────────┐ │ | ||
│ │ Secret: ftp-secret │ │ ConfigMap/PVC │ │ | ||
│ │ - username │ │ - JSON_INPUT.gz │ │ | ||
│ │ - password │ │ │ │ | ||
│ └──────────▲─────────────────┘ └───────▲───────────┘ │ | ||
│ │ | │ | ||
│ │ | │ | ||
│ │ | │ | ||
│ ┌─────────┴────────────────────────────┴────────────┐ │ | ||
│ │ Job: taskmaster │ │ | ||
│ │ ┌───────────────────────────────────────────────┐ │ │ | ||
│ │ │ Pod: taskmaster │ │ │ | ||
│ │ │ - Container: taskmaster │ │ │ | ||
│ │ │ - Env: TESK_FTP_USERNAME │ │ │ | ||
│ │ │ - Env: TESK_FTP_PASSWORD │ │ │ | ||
│ │ │ - Args: -f /jsoninput/JSON_INPUT.gz │ │ │ | ||
│ │ │ - Mounts: /podinfo │ │ │ | ||
│ │ │ /jsoninput │ │ │ | ||
│ │ └───────────────────────────────────────────────┘ │ │ | ||
│ └───────────────────────────────────────────────────┘ │ | ||
│ │ | ||
└─────────────────────────────────────────────────────────┘ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle it's nice to have such a schema in the docs.
I am, however, wondering about the FTP secret. Maybe this will become clearer to me after looking at the code below, but seeing it represented like this in the doc, I have the concern that FTP is somehow treated special, when, ideally, storage solutions should all be treated in an abstract manner, like previously discussed: abstract storage handler and individual implementations for different storage/file transfer solutions. And that should probably extend to managing secrets/credentials as well, no?
compressed_data
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this used for? Consider giving it a more expressive (and less ambiguous - compressed could indicate sth like ZIP or GZ) name. Also consider giving it a file extension (.json
, .txt
?). Finally, consider arranging test data in a separate folder data/
or tests/data/
.
@@ -9,7 +9,7 @@ host_name: "" | |||
# | |||
|
|||
# 'openstack' or 's3' | |||
storage: none | |||
storage: s3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be described in more detail somewhere. Also, if none
is an allowed value, this should be listed, and not just openstack
and s3
.
encoded_data.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, consider moving to dedicated data folder.
Also, consider using a file naming pattern that, when read left to right, goes from more generic to more specific, e.g., data_encoded.txt
and data_compressed.txt
instead of encoded_data.txt
and compressed_data.txt
. This helps keeping related data together when sorted alphanumerically.
Even better to add something more expressive to explain the use case of these files, i.e., data_USE_CASE_encoded.txt
and data_USE_CASE_compressed.txt
.
@@ -4,9 +4,10 @@ requires = ["poetry-core"] | |||
|
|||
[tool.bandit] | |||
skips = [ | |||
"B101", # Use of asserts is discouraged as it is removed during byte generation, we don't need that! | |||
"B108", # Insecure usage of temp file/directory, false positive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But are you sure that this will always be false positives?
If not, better to disable per line/block/module
tesk/api/ga4gh/tes/controllers.py
Outdated
@@ -26,14 +28,22 @@ def CancelTask(id, *args, **kwargs) -> dict: # type: ignore | |||
|
|||
# POST /tasks | |||
@log_traffic | |||
def CreateTask(*args, **kwargs) -> dict: # type: ignore | |||
def CreateTask(**kwargs) -> dict[str, str]: # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need the type: ignore
here after implementation?
tesk/api/ga4gh/tes/controllers.py
Outdated
try: | ||
request_body = kwargs.get("body") | ||
if request_body is None: | ||
logger.error("Nothing received in request body.") | ||
raise BadRequest("No request body received.") | ||
tes_task = TesTask(**request_body) | ||
namespace = "tesk" | ||
return CreateTesTask(tes_task, namespace).response() | ||
except Exception as e: | ||
raise InternalServerError from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error handling looks wrong/weird to me:
- The generic catching of
Exception
means that every exception will raise a500
error, no? So there is no chance of the route ever raising a400
,401
,403
etc. error. Surely this is not what we want. - FOCA already handles errors in application context appropriately. Specifically, all errors not listed in the
exceptions
dictionary are raised as500
errors (see here), so this code is redundant.
tesk/api/ga4gh/tes/controllers.py
Outdated
if request_body is None: | ||
logger.error("Nothing received in request body.") | ||
raise BadRequest("No request body received.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Connexion should handle BadRequest/400
errors automatically, unless that functionality is specifically disabled in the config (which it isn't). So no need for this code, I believe (can be easily tested).
tesk/api/ga4gh/tes/controllers.py
Outdated
namespace = "tesk" | ||
return CreateTesTask(tes_task, namespace).response() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except when calling well known builtin functions, I think it improves readibility a lot to name params when passing args. It also helps prevent bugs that may arise from using positional args. Also, in this case, it avoids the need to define a variable just to be clear about what you are passing.
So consider using:
return CreateTesTask(task=tes_task, namespace="tesk").response()
PS: This is WIP, consider this PR for POC, this maybe/will be further broken down into smaller PR once a concensus on code is reached and everthing starts to work.
Summary by Sourcery
This pull request introduces the ability to create TES tasks as Kubernetes jobs. It includes a new Kubernetes client wrapper, a converter for TES tasks to Kubernetes jobs, and updates to the deployment configuration to support these new features.