This repository has been archived by the owner on Aug 1, 2024. It is now read-only.
Bug description
I am deploying ESMFold on the Open Science Pool (OSPool), and some sets of FASTA files seem to crash every time. They also crash when run locally in a Docker container. This doesn't appear to be a resource issue, but that can be difficult to tell from the Condor logs.
Reproduction steps
Run ESMFold from a Docker container, using a command of the following general form:
conda run -n py39-esmfold esm-fold -i <seqs.fa> -o <you/can/send/this/wherever> -m <some/mounted/volume> --cpu-only > result.txt
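Since it is hard to tell from the logs whether resources are involved, one way to check locally is to wrap the command and record how it exited and how much memory its children peaked at. The following is only a minimal sketch, not part of ESMFold or conda: `describe_exit` and `run_and_measure` are hypothetical helper names, and the issue does not show which exit status `conda run` hands back after the inner process is killed, so both the negative-return-code and the `128 + N` conventions are handled.

```python
import resource
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate an exit status into a signal name where one can be inferred."""
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        signum = -returncode
    elif returncode > 128:
        # Shell convention: a wrapper exits with 128 + N when its child
        # was terminated by signal N (e.g. 137 for SIGKILL).
        signum = returncode - 128
    else:
        return f"exit code {returncode}"
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"unknown signal {signum}"


def run_and_measure(cmd):
    """Run cmd to completion; return (exit description, peak child RSS)."""
    proc = subprocess.run(cmd)
    # Largest peak resident set size among waited-for descendants.
    # Units are KiB on Linux (bytes on macOS).
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return describe_exit(proc.returncode), peak_rss


if __name__ == "__main__" and len(sys.argv) > 1:
    status, peak = run_and_measure(sys.argv[1:])
    print(f"{status}; peak child RSS: {peak}", file=sys.stderr)
```

Saved under a hypothetical name such as `measure.py`, it would be invoked as `python measure.py conda run -n py39-esmfold esm-fold -i <seqs.fa> ...`. A peak RSS close to the container or job memory limit, combined with `SIGKILL`, would point toward memory after all, though that is an inference to verify rather than a conclusion.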
Expected behavior
This command completes cleanly for some input files, but not others. For the failing inputs, it consistently fails with the error pasted below when run interactively. I've attached two FASTA files: one for which the command runs cleanly, and one for which it fails.
Logs
Failure output, for an interactive job:
(base) root@6843f707dd31:/# conda run -n py39-esmfold esm-fold -i UserData/id0001partners00002.fa -o . -m ESModels --cpu-only
24/07/26 14:46:28 | INFO | root | Reading sequences from UserData/id0001partners00002.fa
24/07/26 14:46:28 | INFO | root | Loaded 2 sequences from UserData/id0001partners00002.fa
24/07/26 14:46:28 | INFO | root | Loading model
24/07/26 14:48:03 | INFO | root | Starting Predictions
/tmp/tmplazwrhgo: line 3: 39 Killed esm-fold -i UserData/id0001partners00002.fa -o . -m ESModels --cpu-only
ERROR conda.cli.main_run:execute(125):
conda run esm-fold -i UserData/id0001partners00002.fa -o . -m ESModels --cpu-only
failed. (See above for error)
Success output, for an interactive job:
(base) root@6843f707dd31:/# conda run -n py39-esmfold esm-fold -i UserData/id0001partners00011.fa -o . -m ESModels --cpu-only
24/07/26 15:14:04 | INFO | root | Reading sequences from UserData/id0001partners00011.fa
24/07/26 15:14:04 | INFO | root | Loaded 2 sequences from UserData/id0001partners00011.fa
24/07/26 15:14:04 | INFO | root | Loading model
24/07/26 15:16:31 | INFO | root | Starting Predictions
24/07/26 15:30:19 | INFO | root | Predicted structure for 1_1_3688 with length 335, pLDDT 91.6, pTM 0.726 in 414.3s (amortized, batch size 2). 1 / 2 completed.
24/07/26 15:30:19 | INFO | root | Predicted structure for 2_1_23 with length 335, pLDDT 91.8, pTM 0.719 in 414.3s (amortized, batch size 2). 2 / 2 completed.
Additional context
Technically, when these jobs run on the OSPool they run out of Singularity containers rather than Docker containers, though I don't know how much that matters. I get different kill codes on the OSPool, though that could be site-specific. For example, when I check my Condor logs for jobs that Condor believes did not go over memory, I get:
$ cat LogFilesCB/out.2.err
/srv/tmpgo8p_04o: line 3: 34 Killed esm-fold -i id0001partners00002.fa -o structs -m ESModels --cpu-only
ERROR conda.cli.main_run:execute(125):
conda run esm-fold -i id0001partners00002.fa -o structs -m ESModels --cpu-only
failed. (See above for error)
$ cat LogFilesCB/out.137.err
/srv/tmpdklsx5w4: line 3: 34 Bus error (core dumped) esm-fold -i id0001partners00137.fa -o structs -m ESModels --cpu-only
ERROR conda.cli.main_run:execute(125):
conda run esm-fold -i id0001partners00137.fa -o structs -m ESModels --cpu-only
failed. (See above for error)
/srv//Run.sh: line 35: 24 Bus error (core dumped) conda run -n py39-esmfold esm-fold -i "$TARGET" -o structs -m ESModels --cpu-only > "$INFILE1"
$ cat LogFilesCB/out.173.err
/srv/tmp0pg3lc07: line 3: 36 Killed esm-fold -i id0001partners00173.fa -o structs -m ESModels --cpu-only
ERROR conda.cli.main_run:execute(125):
conda run esm-fold -i id0001partners00173.fa -o structs -m ESModels --cpu-only
failed. (See above for error)
The '00002' and '00011' FASTA files have been attached as 'txt' files because of file extension restrictions.
id0001partners00002.txt
id0001partners00011.txt
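To see at a glance which jobs were killed and which hit a bus error, the Condor stderr files above could be scanned with something like the sketch below. It assumes the shell's job-status lines keep the shape seen in these excerpts (`line N: PID Killed ...` or `line N: PID Bus error (core dumped) ...`); `classify` is a hypothetical helper, not part of any tool mentioned here. "Killed" is what the shell prints when a child receives SIGKILL, and "Bus error" corresponds to SIGBUS.

```python
import re

# Matches the shell's job-status message, e.g.
# "/srv/tmpgo8p_04o: line 3: 34 Killed esm-fold -i ..."
JOB_MSG = re.compile(
    r"line \d+:\s+(?P<pid>\d+)\s+(?P<what>Killed|Bus error)"
    r"(?:\s+\(core dumped\))?\s+(?P<cmd>.*)"
)

# Signal behind each message the shell prints.
SIGNAL_FOR = {"Killed": "SIGKILL", "Bus error": "SIGBUS"}


def classify(stderr_text):
    """Return a list of (signal name, command) for each job-status line found."""
    hits = []
    for line in stderr_text.splitlines():
        match = JOB_MSG.search(line)
        if match:
            hits.append((SIGNAL_FOR[match["what"]], match["cmd"].strip()))
    return hits


# Two lines copied from the excerpts above.
sample = (
    "/srv/tmpgo8p_04o: line 3: 34 Killed esm-fold -i id0001partners00002.fa"
    " -o structs -m ESModels --cpu-only\n"
    "/srv/tmpdklsx5w4: line 3: 34 Bus error (core dumped) esm-fold"
    " -i id0001partners00137.fa -o structs -m ESModels --cpu-only\n"
)
for signal_name, command in classify(sample):
    print(signal_name, "<-", command)
```

A SIGKILL often comes from the kernel or the batch system enforcing a memory limit, but it can be sent for other reasons too, so the tally only narrows down where to look.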
EDIT
One additional piece of context: when these jobs complete successfully, CPU usage is near the maximum for the requested resources. When they fail like this, CPU usage is close to minimal.