There are some TODOs for wlm-operator, including but not limited to:
Develop a robust agent for forwarding the red-box socket. It should retry after network interruptions.
Make the configurator more robust against interruptions of the socket forwarding.
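As a rough illustration of the retry behaviour asked for above, here is a minimal sketch of retrying with exponential backoff. It is not wlm-operator code: `run_with_retry`, its parameters, and the `connect` callable (standing in for whatever establishes the socket forwarding) are all hypothetical names chosen for this example.

```python
import time


def run_with_retry(connect, max_attempts=5, base_delay=0.5, max_delay=30.0,
                   sleep=time.sleep):
    """Call connect() until it succeeds, backing off after each failure.

    `connect` is a hypothetical callable that sets up the forwarding and
    raises OSError (which includes ConnectionError) when the network drops.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            sleep(delay)
            # Double the wait each time, but never exceed max_delay.
            delay = min(delay * 2, max_delay)
```

The `sleep` parameter is injectable only so the backoff can be exercised without real waiting; a real agent would probably also want jitter and a way to cancel.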
Wlm-operator can fetch the logs of Slurm jobs, whereas Argo's resource template only outputs something like:
time="2022-07-13T02:39:55.042Z" level=info msg="Get slurmjobs 200"
time="2022-07-13T02:39:55.043Z" level=info msg="failure condition '{status.status == [Failed]}' evaluated false"
time="2022-07-13T02:39:55.043Z" level=info msg="success condition '{status.status == [Succeeded]}' evaluated false"
time="2022-07-13T02:39:55.044Z" level=info msg="0/1 success conditions matched"
time="2022-07-13T02:39:55.045Z" level=info msg="Waiting for resource slurmjob.wlm.sylabs.io/wlm-rhhbc-hello-dphos-hello-slurm-run-4203105651 in namespace argo resulted in retryable error: Neither success condition nor the failure condition has been matched. Retrying..."
Wlm-operator could persist Slurm job logs on the local side.
To avoid modifying Argo, dflow uses three steps to complete a wlm template: a prepare step, a run step, and a collect step. The prepare step copies the input artifacts from the container to a host path. The run step mounts the host directory and applies the wlm resource, which uploads the input files to the remote cluster, submits a Slurm job, and finally downloads the output files to a mounted host directory. The collect step copies the output artifacts from the host back into the container for Argo to collect. Can this procedure be simplified?
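To make the data flow of the three steps concrete, here is a toy sketch that imitates them with plain directory copies on one machine. It does not use dflow, Argo, or wlm-operator APIs: `prepare`, `run`, `collect`, `fake_job`, and `demo` are hypothetical names, and `fake_job` merely stands in for the upload, Slurm submission, and download done by the wlm resource.

```python
import shutil
import tempfile
from pathlib import Path


def prepare(container_inputs: Path, host_dir: Path) -> None:
    """Prepare step: copy input artifacts from the container to the host path."""
    shutil.copytree(container_inputs, host_dir / "inputs")


def run(host_dir: Path, job) -> None:
    """Run step: `job` is a stand-in for upload, job submission, and download."""
    outputs = host_dir / "outputs"
    outputs.mkdir()
    job(host_dir / "inputs", outputs)


def collect(host_dir: Path, container_outputs: Path) -> None:
    """Collect step: copy output artifacts from the host back to the container."""
    shutil.copytree(host_dir / "outputs", container_outputs)


def fake_job(inputs: Path, outputs: Path) -> None:
    """Pretend remote job: upper-case every input file into the output directory."""
    for src in inputs.iterdir():
        (outputs / src.name).write_text(src.read_text().upper())


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        container_in = root / "container_in"
        container_in.mkdir()
        (container_in / "msg.txt").write_text("hello")
        host = root / "host"
        host.mkdir()

        prepare(container_in, host)
        run(host, fake_job)
        container_out = root / "container_out"
        collect(host, container_out)
        return (container_out / "msg.txt").read_text()
```

In this toy version the same files are copied twice beyond what the job itself needs (container to host, then host to container), which is the overhead the question about simplification refers to.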
njzjz pushed a commit to njzjz/dflow that referenced this issue on Nov 28, 2022