
Abnormally high scores appear randomly in parallel fstime tests when running a large number of benchmark copies with HDDs #103

Open
Zengkai163 opened this issue Nov 30, 2024 · 11 comments

Comments

Zengkai163 (Collaborator) commented Nov 30, 2024

Hi Glenn Strauss,
We found abnormally high scores appearing randomly in parallel fstime tests when running a large number of benchmark copies (for example, more than two hundred) on hard disk drives.
Abnormally high score: [screenshot]
Normal score: [screenshot]
After some debugging, I found that the dual sync() calls in c_test() can leave some fstime processes blocked waiting on a global rwsem lock for a long time (possibly more than 120s), now that commit 81e9de5 ("fstime.c - Seperate r/w files for each parallel (#85)") is merged.
```c
/*
 * Run the copy test for the time given in seconds.
 */
int c_test(int timeSecs)
{
    unsigned long counted = 0L;
    unsigned long tmp;
    double start, end;
    extern int sigalarm;

    sync(); // the first sync operation
    sleep(2);
    sync(); // the second sync operation
    sleep(1);

    /* rewind */
    errno = 0;
    lseek(f, 0L, 0);

    ......
}
```
Here are some related dmesg logs and unixbench result logs:
dmesg logs (partial):
```
[Sat Nov 30 17:33:33 2024] INFO: task fstime:15977 blocked for more than 120 seconds.
[Sat Nov 30 17:33:33 2024] Not tainted 6.12.0
[Sat Nov 30 17:33:33 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Nov 30 17:33:33 2024] task:fstime state:D stack:0 pid:15977 tgid:15977 ppid:15967 flags:0x00000204
[Sat Nov 30 17:33:33 2024] Call trace:
[Sat Nov 30 17:33:33 2024] __switch_to+0xf4/0x160
[Sat Nov 30 17:33:33 2024] __schedule+0x2a0/0x838
[Sat Nov 30 17:33:33 2024] schedule+0x30/0xc8
[Sat Nov 30 17:33:33 2024] schedule_preempt_disabled+0x18/0x30
[Sat Nov 30 17:33:33 2024] rwsem_down_write_slowpath+0x33c/0x820
[Sat Nov 30 17:33:33 2024] down_write+0x60/0x78
[Sat Nov 30 17:33:33 2024] sync_inodes_sb+0xa0/0x110
[Sat Nov 30 17:33:33 2024] sync_inodes_one_sb+0x24/0x38
[Sat Nov 30 17:33:33 2024] iterate_supers+0xb4/0x1f8
[Sat Nov 30 17:33:33 2024] ksys_sync+0x54/0xc8
[Sat Nov 30 17:33:33 2024] __arm64_sys_sync+0x18/0x30
[Sat Nov 30 17:33:33 2024] invoke_syscall+0x50/0x120
[Sat Nov 30 17:33:33 2024] el0_svc_common.constprop.0+0xc8/0xf0
[Sat Nov 30 17:33:33 2024] do_el0_svc+0x24/0x38
[Sat Nov 30 17:33:33 2024] el0_svc+0x34/0x128
[Sat Nov 30 17:33:33 2024] el0t_64_sync_handler+0x100/0x130
[Sat Nov 30 17:33:33 2024] el0t_64_sync+0x188/0x190
[Sat Nov 30 17:33:33 2024] INFO: task fstime:16102 blocked for more than 120 seconds.
[Sat Nov 30 17:33:33 2024] Not tainted 6.12.0
[Sat Nov 30 17:33:33 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Nov 30 17:33:33 2024] task:fstime state:D stack:0 pid:16102 tgid:16102 ppid:16093 flags:0x00000204
[Sat Nov 30 17:33:33 2024] Call trace:
[Sat Nov 30 17:33:33 2024] __switch_to+0xf4/0x160
[Sat Nov 30 17:33:33 2024] __schedule+0x2a0/0x838
[Sat Nov 30 17:33:33 2024] schedule+0x30/0xc8
[Sat Nov 30 17:33:33 2024] schedule_preempt_disabled+0x18/0x30
[Sat Nov 30 17:33:33 2024] rwsem_down_write_slowpath+0x33c/0x820
[Sat Nov 30 17:33:33 2024] down_write+0x60/0x78
[Sat Nov 30 17:33:33 2024] sync_inodes_sb+0xa0/0x110
[Sat Nov 30 17:33:33 2024] sync_inodes_one_sb+0x24/0x38
[Sat Nov 30 17:33:33 2024] iterate_supers+0xb4/0x1f8
[Sat Nov 30 17:33:33 2024] ksys_sync+0x54/0xc8
[Sat Nov 30 17:33:33 2024] __arm64_sys_sync+0x18/0x30
[Sat Nov 30 17:33:33 2024] invoke_syscall+0x50/0x120
[Sat Nov 30 17:33:33 2024] el0_svc_common.constprop.0+0xc8/0xf0
[Sat Nov 30 17:33:33 2024] do_el0_svc+0x24/0x38
[Sat Nov 30 17:33:33 2024] el0_svc+0x34/0x128
[Sat Nov 30 17:33:33 2024] el0t_64_sync_handler+0x100/0x130
[Sat Nov 30 17:33:33 2024] el0t_64_sync+0x188/0x190
......
```
unixbench abnormal result logs (partial):
```
......
COUNT0: 1672180
COUNT1: 0
COUNT2: KBps
TIME: 30.0
elapsed: 197.830620 // one fstime process lasts for 197s
pid: 32800
status: 0

COUNT0: 1495245
COUNT1: 0
COUNT2: KBps
TIME: 30.0
elapsed: 217.923324 // another fstime process lasts for 217s
pid: 32804
status: 0

COUNT0: 1746359
COUNT1: 0
COUNT2: KBps
TIME: 30.0
elapsed: 212.533035
pid: 32807
status: 0
......
```
As a result, fstime processes that have finished the dual sync operations enter the file-copy phase earlier than those still blocked, so in the early stages of the test the actual number of parallel file-copy processes can be lower than the number specified by the '-c' parameter, which is not the expected behavior for the test, AFAICS. With fewer file-copy processes running in parallel there is less pressure on memory bandwidth, so each fstime process scores higher under the reduced contention. All the fstime processes still finish the test eventually, but the total benchmark run time gets longer, as shown in the "abnormally high score" figure, and the total score ends up higher than in the normal case.

Can we just call fsync() to synchronize changes to the r/w files, instead of doing a system-wide sync, to avoid this issue?
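To illustrate the idea, here is a minimal sketch of what such a change could look like (the helper name and parameter names are hypothetical; it assumes the per-process read and write file descriptors opened by fstime are passed in):

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush only this process's test files with
 * fsync() instead of forcing a system-wide writeback with sync(),
 * so parallel fstime processes do not all serialize on the kernel's
 * per-superblock writeback lock.
 */
void flush_test_files(int rd_fd, int wr_fd)
{
    if (fsync(rd_fd) != 0)
        perror("fsync(read file)");
    if (fsync(wr_fd) != 0)
        perror("fsync(write file)");
    sleep(1); /* brief settle time, in place of the original sync()/sleep() pair */
}
```

Unlike sync(), fsync() only waits for the data of the given descriptor to reach stable storage, so the sync_inodes_sb() stalls seen in the dmesg logs above should no longer be triggered by fstime itself.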

Thanks!

Zengkai163 added a commit to Zengkai163/byte-unixbench that referenced this issue Dec 3, 2024
Call fsync() to synchronize changes to the r/w files instead of doing the system-wide sync. This may save some time in large concurrency scenarios.

github: closes kdlucas#103
kdlucas (Owner) commented Dec 5, 2024

Could you send me a message at [email protected]? I want to discuss a few things with you.

Zengkai163 (Collaborator, Author) commented Dec 7, 2024 via email

kdlucas (Owner) commented Dec 7, 2024 via email

kdlucas (Owner) commented Dec 7, 2024

Not sure if you can see the correct address; it is [email protected]

Zengkai163 (Collaborator, Author) commented Dec 9, 2024 via email

kdlucas (Owner) commented Dec 9, 2024 via email

Zengkai163 (Collaborator, Author) commented Dec 9, 2024

[email protected]

Zengkai163 (Collaborator, Author) commented Dec 9, 2024 via email

Zengkai163 (Collaborator, Author) commented Dec 9, 2024 via email

wangxp006 (Collaborator) commented:

Hi @Zengkai163: Is this issue only present on HDDs? I cannot reproduce it using AMD Milan + NVMe or ARM Neoverse-N2 + NVMe.

Zengkai163 (Collaborator, Author) commented:

Yes, it may be related to specific HDDs.
This issue needs further investigation.
