nv_module_resources_init tried to execute NX-protected page #765
Comments
Hi @SyntheticBird45 , I just found the
Just a suggestion for your reference.
Thanks @TheBetterSolution, I'll try it soon.
I tried commenting out the macro, but the build then failed with a flood of compile errors (see logs below): I also tried to
Excuse me @SyntheticBird45
But I can't be sure that's the root cause. I think the following code needs to change in order to remove NV_KTIME_GET_RAW_TS64_PRESENT: But please don't try it; we can wait for the official response.
Hi. I don't know if it is related, but the mention of CFI here reminded me of this issue: In that issue, the problem is that the core of nvidia.ko isn't being built by kbuild, and thus isn't getting kbuild's extra CFLAGS for CFI. In the current issue, I don't think we're even getting far enough to execute any of the "core" of nvidia.ko, so there is probably something else going on here. But, you may trip on issue/439 once we resolve the current problem. Or, something about the core of nvidia.ko not being built with kbuild's CFLAGS could be confusing things and triggering the current problem. I see:
It might be a useful experiment to test without RANDSTRUCT. Looking at your call trace:
And @TheBetterSolution's speculation, I guess you are concerned about this path?
I suppose you could check the Module.symvers for your kernel (typically /usr/lib/modules/
(i.e., part of vmlinux, not a separate kernel module) Was there a specific reason to suspect ktime_get_raw_ts64(), or was that just speculation? It might be easiest to sprinkle some printks in nv_module_resources_init() and its callees to determine where exactly the asm_exc_page_fault is happening.
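The Module.symvers check suggested above could be scripted roughly as follows. This is only a sketch: it assumes the usual tab-separated Module.symvers layout (CRC, symbol name, providing module, export type, optional namespace), and the helper names `find_symbol` and `is_builtin` are made up for illustration. Pass it the contents of your own kernel's Module.symvers; no path is hard-coded here.

```python
from typing import NamedTuple, Optional


class SymversEntry(NamedTuple):
    """One row of Module.symvers (assumed layout, see lead-in)."""
    crc: str
    symbol: str
    module: str       # "vmlinux" when the symbol is built into the kernel image
    export_type: str  # e.g. EXPORT_SYMBOL or EXPORT_SYMBOL_GPL


def find_symbol(symvers_text: str, symbol: str) -> Optional[SymversEntry]:
    """Return the entry exporting `symbol`, or None if it is not exported."""
    for line in symvers_text.splitlines():
        fields = line.split("\t")
        # Expect at least CRC, symbol, module, export type; skip malformed rows.
        if len(fields) >= 4 and fields[1] == symbol:
            return SymversEntry(*fields[:4])
    return None


def is_builtin(entry: SymversEntry) -> bool:
    """True if the symbol is provided by vmlinux rather than a loadable module."""
    return entry.module == "vmlinux"
```

For example, `find_symbol(open(path).read(), "ktime_get_raw_ts64")` returning `None` would mean the symbol is not exported at all, while an entry whose module is `vmlinux` would mean it is part of the kernel image, as discussed above.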
NVIDIA Open GPU Kernel Modules Version
565.77
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
Operating System and Version
Artix Linux
Kernel Release
Linux unknown 6.11.10-hardened1-1-hardened
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Hardware: GPU
NVIDIA GeForce RTX 4070 (nvidia-smi doesn't work since the driver can't load)
Describe the bug
On Linux kernel 6.11.10 (and 6.12.6) with the linux-hardened patches applied, built with Clang CFI and ThinLTO, the open driver tries to execute an NX-protected page.
I was able to reproduce this on 6.12.6, but another, unrelated bug on that kernel prevents me from using any GPU.
I downgraded to 6.11.10, but the issue stays the same.
This issue does NOT happen on the official Arch Linux linux-hardened package, which is compiled with GCC.
Kernel stack trace (truncated for privacy concerns, can add additional details if required):
dmesg_nvidia_open.log
To Reproduce
sudo modprobe nvidia
Killed
dmesg
will give this stack trace.
Bug Incidence
Always
nvidia-bug-report.log.gz
No point, since the NVIDIA driver doesn't load, and this isn't related to a runtime feature either. Also, I don't like sharing my whole PCI tree on public GitHub.
More Info
No response