WARNING: CPU: X PID: XXXX at include/linux/rwsem.h:80 follow_pte+0xf8/0x120 on resume #719
Comments
Can you please attach a full nvidia-bug-report.log.gz? |
Here it is: This first appeared in kernel 6.10. Kernels 6.9 and earlier don't exhibit this issue. |
Thanks for the log. I've filed NVIDIA internal bug 4922186 for this. Knowing this is specific to >= Linux kernel 6.10 helps; thanks for that isolation.
|
Yes.
I'll try, and report later.
Yes. |
Removing No more dmesg spam (more than 60KB of messages). I started using the option because you or @aaronp24 told me it was necessary to properly restore the system state. Probably I had some issues with either Firefox or Chrome misbehaving on resume. It was a long time ago. I have 64GB of RAM, most of it completely free, I'm not using SWAP or hibernate. |
Thank you for those experiments. That will help us focus our debugging. |
I'd been mistakenly following #662 this whole time, when it's this issue that's been affecting every kernel since the 6.10 release, as mentioned. I should have paid more attention to those stack traces! D'oh!
Yes
Yes. I've never had
Yes, and I'm unable to boot with `fbdev=1` anyhow; see the related Arch bug. I'm using the following kernel parameters:
The attached stack trace repeats about 20 times per suspend: nvidia-sleep-stacktrace.txt
/usr/lib/modprobe.d/nvidia-sleep.conf:
|
Please attach your nvidia-bug-report as well. |
Edited my post and attached it. I also just built and booted kernel 6.11.5, and suspended one time to reproduce this issue, so the |
This is also happening to me with the proprietary driver, on Linux 6.10.10. |
If anyone is adventurous, could you retest this configuration with Linux kernel v6.12-rc1 or later? It looks like -rc4 is currently the latest.
We're still investigating, but so far: I don't know if the 'WARNING: CPU: 6 PID: 7530 at include/linux/rwsem.h:80 follow_pte+0xf8/0x120' message is synonymous with the failure, or just coincidental. We've seen that warning and backtrace in other scenarios where suspend/resume still worked.
That particular warning was introduced by this kernel commit:
There was then some upstream kernel discussion about various drivers that were impacted by it:
This area was refactored more upstream:
which is included in Linux kernel v6.12-rc1 and later. In our testing, that resolved the warning message, but I don't have confirmation yet whether it impacted NVreg_EnableS0ixPowerManagement=1 configurations. |
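For anyone who wants to check quickly whether a machine falls in the range discussed in this thread (warning first seen in 6.10, gone with the refactoring in v6.12-rc1 and later), here is a minimal Python sketch. The function names are made up for illustration, and the range only reflects what most commenters here reported; it is not an official statement of affected kernels.

```python
import platform
import re


def parse_kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.11.4-zen3'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release string: {release!r}")
    return int(match.group(1)), int(match.group(2))


def in_reported_range(release: str) -> bool:
    """True if the kernel is >= 6.10 and < 6.12.

    Assumption: this range is taken from the reports in this thread only.
    A 6.12-rc kernel counts as 6.12 here, since the upstream refactoring
    is described as included in v6.12-rc1 and later.
    """
    return (6, 10) <= parse_kernel_version(release) < (6, 12)


if __name__ == "__main__":
    release = platform.release()
    print(release, "in reported range:", in_reported_range(release))
```

Tuple comparison keeps the check readable and avoids treating the version as a float, which would sort 6.9 above 6.10.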
6.12 will not work right now since it's not able to open the display. |
I /think/ the latest 565.xx beta should already contain compatibility fixes for 6.12-rc's, but let me know if that is not your experience. |
No, it does not. The module compiles, yes, but it crashes. The above mentioned patch fixes it. |
Can confirm you need this patch to open the display on 565.57.01 / linux 6.12-rc4. |
Resume after suspend is working fine here, but I'm also having the same kind of trace.
|
On a second resume I again get all these warnings. Looks like
|
Getting the same kernel stack traces on suspend/resume with open kernel driver 565. It causes intermittent behaviour for me: sometimes only one monitor out of 2 comes back, and sometimes my monitors swap relative positions (possibly DP-1 identifies as DP-2). Confirming all is solved when using an LTS kernel (6.6). Edit: Tried on 6.12 rc and no issues anymore. Suspend all working normally. |
I still have this issue with the Linux LTS (6.6) kernel and open kernel driver 565. |
@aritger I wasn't adventurous enough and waited for the stable kernel release. I can confirm this issue is indeed resolved with the API changes in kernel 6.12.
|
Thanks for confirming. I'm sorry it took us a bit to release a 6.12-compatible driver. My post earlier in this issue still summarizes my understanding: Maybe there are additional issues, but I think the originally reported symptom (the "WARNING: CPU: X PID: XXXX at include/linux/rwsem.h:80 follow_pte+0xf8/0x120 on resume") is fixed by Linux kernel 6.12. If there are additional suspend/resume problems, let's file separate Issues for them. Thanks. |
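To compare how often the warning fires before and after a kernel upgrade, something like the following Python sketch could count occurrences in saved dmesg output. The regular expression is built from the warning text quoted in this issue; the function name and the sample log lines are hypothetical, and the pattern assumes the offsets stay in the `follow_pte+0x.../0x...` form.

```python
import re

# Matches the warning quoted in this issue, with any CPU number, PID, and offsets.
FOLLOW_PTE_WARNING = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at include/linux/rwsem\.h:80 "
    r"follow_pte\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def count_follow_pte_warnings(log_text: str) -> int:
    """Count follow_pte rwsem warnings in a block of kernel log text."""
    return len(FOLLOW_PTE_WARNING.findall(log_text))


if __name__ == "__main__":
    # Hypothetical sample; real input would be saved `dmesg` output.
    sample = (
        "WARNING: CPU: 6 PID: 7530 at include/linux/rwsem.h:80 follow_pte+0xf8/0x120\n"
        "some unrelated kernel message\n"
        "WARNING: CPU: 2 PID: 7531 at include/linux/rwsem.h:80 follow_pte+0xf8/0x120\n"
    )
    print(count_follow_pte_warnings(sample))
```

A count of zero after resume on 6.12 would match what commenters here describe, though it says nothing about other suspend/resume problems.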
NVIDIA Open GPU Kernel Modules Version
565.57.01
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
Operating System and Version
Fedora 40
Kernel Release
Linux zen 6.11.4-zen3 #1 SMP PREEMPT_DYNAMIC Tue Oct 22 11:16:40 UTC 2024 x86_64 GNU/Linux
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Hardware: GPU
NVIDIA GeForce RTX 4070 SUPER
Describe the bug
The issue with resuming is not fixed in beta driver 565.57.01 :-(
There are even MORE dmesg errors than in the previous stable driver.
I'm using XFCE without compositing and these NVIDIA kernel module options:
To Reproduce
Suspend/resume.
Bug Incidence
Always
nvidia-bug-report.log.gz
kernel-trace.txt