Time consuming process #7

karimi81 · 2021-09-29T18:57:45Z

Hi there,
I am trying to use TRF to detect tandem repeats in a large genome assembly (2.68 Gb). The program ran for about 12 days on a node with 32 CPUs and 125 GB of memory, but it never finished. This is the command I used:
trf new_id.fasta 2 5 7 80 10 50 2000
Is there any way to reduce the computation time, e.g. with parallel processing? I would appreciate any help with this.
Thank you

Aannaw · 2021-10-05T03:42:40Z

Hi
Have you solved this problem? I split my 3.1 Gb genome and then ran TRF, but it has still not completed after about 14 days of running on 1 CPU with 3 GB of memory. Does the program have a threading option to speed it up?
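
For anyone who wants to try the split-and-run approach mentioned above, here is a minimal sketch. It assumes TRF has no built-in threading option (none is mentioned in this thread), so it writes each FASTA record to its own file and starts one TRF process per file. The helper names (`split_fasta`, `run_trf`, `run_all`), the `seq_000000.fasta` naming, and the default worker count are hypothetical choices for illustration; the seven numeric parameters are copied from the command in the first post.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Parameters copied from the command posted in this thread.
TRF_PARAMS = ["2", "5", "7", "80", "10", "50", "2000"]


def split_fasta(fasta_path, out_dir):
    """Write each FASTA record to its own file and return the paths in order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    handle = None
    with open(fasta_path) as src:
        for line in src:
            if line.startswith(">"):
                if handle is not None:
                    handle.close()
                # Number the files so unusual sequence names cannot break paths.
                path = out_dir / f"seq_{len(paths):06d}.fasta"
                paths.append(path)
                handle = open(path, "w")
            # Lines before the first header (if any) are skipped.
            if handle is not None:
                handle.write(line)
    if handle is not None:
        handle.close()
    return paths


def run_trf(fasta_file, trf_bin="trf"):
    """Run one TRF process in the file's own directory (assumes trf is on PATH)."""
    cmd = [trf_bin, fasta_file.name, *TRF_PARAMS]
    # The exit code is returned unchecked; its meaning may differ between TRF versions.
    return subprocess.run(cmd, cwd=fasta_file.parent).returncode


def run_all(fasta_path, out_dir, workers=4):
    """Split the assembly, then run up to `workers` TRF processes at a time."""
    pieces = split_fasta(fasta_path, out_dir)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_trf, pieces))
```

Calling `run_all("new_id.fasta", "trf_split", workers=32)` would use all 32 CPUs from the first post, but each process needs its own memory, so it is probably safer to start with fewer workers. Splitting by record does not shorten the run for a single chromosome that is itself slow.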

xiekunwhy · 2022-07-05T01:32:59Z

I have the same problem. I think I will give up on TRF and try alternatives such as Look4TRs and Dot2dot.

Wenfei-Xian · 2023-06-11T17:11:06Z

Hi all, TRF gets stuck in long centromeric regions. If you want to identify tandem repeats, especially in a T2T assembly, please set a higher value for -l :)

hdashnow · 2024-01-10T20:29:20Z

Increasing the value of -l to >100 for chm13-T2T helped in my case. I tested a few different values to check their memory usage, as it can get quite high.
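
To make the -l suggestion concrete, here is a small sketch that builds the command from the first post with -l appended, for a few candidate values. The `trf_command` helper and the values 50, 100 and 150 are only illustrations, not recommendations. My understanding of TRF's usage text is that -l is the maximum expected tandem repeat length in millions of bases, so please check `trf` without arguments on your own version before relying on this.

```python
def trf_command(fasta, max_tr_length_millions, trf_bin="trf"):
    """Build a TRF argument list with -l appended (hypothetical helper)."""
    # Numeric parameters copied from the command in the first post.
    params = ["2", "5", "7", "80", "10", "50", "2000"]
    return [trf_bin, fasta, *params, "-l", str(max_tr_length_millions)]


# Print one candidate command per -l value; the values are illustrative only.
for value in (50, 100, 150):
    print(" ".join(trf_command("new_id.fasta", value)))
```

Each printed line can be run by hand while watching memory usage, since higher -l values can need considerably more memory.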
