RTD3 doesn't allow GPU to sleep after a monitor has been plugged and unplugged on PRIME reverse sync #759
Comments
Hi @Aetherall, we are tracking this in bug 5034343 internally. I am very interested in this issue myself.
My expectation is that the hotplug event handler is enumerated by the fbdev core API. I have an explanation of this here. If you use an LTS kernel like kernel 6.6, are you saying that you do not have this issue? I would be surprised if that was the case. If so, could you provide two bug collection reports, one with the 6.6 kernel and one with the 6.12 kernel, using the same driver version? If you would not mind following up with abchauhan on the NVIDIA forum post, that would be appreciated as well. That way, I can be provided with a repro setup. In theory, I should be able to reproduce this with my work laptop, but it helps us with our overall process.
Hi @Binary-Eater, thanks for the follow-up! I initially had the issue on the LTS kernel, and later upgraded to 6.12 to see if it would fix the issue. I have not reverted, as this kernel version contains other unrelated improvements I want to keep. I will follow up with abchauhan ASAP and provide the gz log file and a reproducible environment; however, I won't be available on New Year's Eve, so it might take a few days. Meanwhile, here is my NVIDIA NixOS configuration if you want to reproduce it on the same OS as well.
{
  config,
  pkgs,
  lib,
  ...
}: {
  boot.kernelPackages = pkgs.linuxPackages_latest;
  hardware.graphics.enable = true;
  powerManagement.enable = true;
  services.auto-cpufreq.settings = {
    battery = {
      governor = "powersave";
      turbo = "auto";
    };
    charger = {
      governor = "performance";
      turbo = "auto";
    };
  };
  hardware.nvidia = {
    modesetting.enable = true;
    powerManagement.enable = true;
    dynamicBoost.enable = true;
    nvidiaPersistenced = true;
    open = true;
    nvidiaSettings = true;
    package = config.boot.kernelPackages.nvidiaPackages.beta;
  };
  services.udev.extraRules = ''
    # Create consistent gpu devices symlinks
    ACTION=="bind", SUBSYSTEM=="pci", ATTRS{vendor}=="0x8086", ATTR{class}=="0x030000", RUN+="${pkgs.coreutils-full}/bin/ln -s /dev/dri/by-path/pci-0000:00:02.0-card /dev/gpu_intel"
    ACTION=="bind", SUBSYSTEM=="pci", ATTRS{vendor}=="0x10de", ATTR{class}=="0x030000", RUN+="${pkgs.coreutils-full}/bin/ln -s /dev/dri/by-path/pci-0000:01:00.0-card /dev/gpu_nvidia"
  '';
  services.xserver.videoDrivers = ["nvidia"];
  environment.sessionVariables.AQ_DRM_DEVICES = "/dev/gpu_nvidia";
  environment.sessionVariables.VK_ICD_FILENAMES = "/run/opengl-driver/share/vulkan/icd.d/nvidia_icd.x86_64.json";
  environment.sessionVariables.GBM_BACKEND = "nvidia-drm";
  environment.sessionVariables.LIBVA_DRIVER_NAME = "nvidia";
  environment.sessionVariables.__GLX_VENDOR_LIBRARY_NAME = "nvidia";
  specialisation = {
    powersave.configuration = {
      system.nixos.tags = ["powersave"]; # this specialisation has the RTD3 issue
      hardware.nvidia = {
        powerManagement.enable = true;
        powerManagement.finegrained = true;
        prime = {
          offload.enable = true;
          offload.enableOffloadCmd = true;
          reverseSync.enable = true;
          intelBusId = "PCI:0:2:0";
          nvidiaBusId = "PCI:1:0:0";
        };
      };
      environment.sessionVariables.AQ_DRM_DEVICES = lib.mkForce "/dev/gpu_intel:/dev/gpu_nvidia";
      environment.sessionVariables.VK_ICD_FILENAMES = lib.mkForce "";
      environment.sessionVariables.GBM_BACKEND = lib.mkForce "";
      environment.sessionVariables.LIBVA_DRIVER_NAME = lib.mkForce "";
      environment.sessionVariables.__GLX_VENDOR_LIBRARY_NAME = lib.mkForce "";
    };
  };
}
Happy new year!
Yeah, I'm seeing this on an RTX 4060 mobile. Maybe
NVIDIA Open GPU Kernel Modules Version
NVRM version: NVIDIA UNIX Open Kernel Module for x86_64 565.77 srcversion: 0BDAE46B2642DAFAAF16C9C
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
Operating System and Version
NixOS unstable
Kernel Release
Linux 6.12.6 NixOS SMP PREEMPT_DYNAMIC Thu Dec 19 17:13:24 UTC 2024 x86_64 GNU/Linux
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Hardware: GPU
GPU 0: NVIDIA GeForce RTX 3080 Ti Laptop GPU
Describe the bug
The GPU power state stays locked in D0 after a monitor has been plugged into and unplugged from the GPU.
I am using reverse sync on an Advanced Optimus laptop to allow hotplugging monitors on the GPU's HDMI/DP ports.
After a clean boot, and before plugging in an external monitor, the NVIDIA GPU is in the D3cold state.
I can power on the NVIDIA card using nvidia-smi or by running glxinfo, and the power state will switch to D0 before going back to sleep as intended.
The issue arises when a monitor is plugged into the GPU. This wakes the GPU to allow reverse PRIME to render to the external display.
However, when unplugging the monitor, the power state never leaves D0 (no process is running on the GPU).
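The power-state checks described above can be scripted. The following is only a sketch: the function names `gpu_power_states` and `wait_for_power_state` are made up for illustration, and the optional root argument exists only so the helper can be pointed at a test directory instead of `/sys/class/drm`.

```shell
# Print "<card>: <power_state>" for every DRM card found under a sysfs root.
# The root defaults to /sys/class/drm; the override is a hypothetical convenience.
gpu_power_states() {
    local root="${1:-/sys/class/drm}"
    local f card
    for f in "$root"/card*/device/power_state; do
        [ -r "$f" ] || continue      # skip an unmatched glob
        card="${f#"$root"/}"         # drop the root prefix
        card="${card%%/*}"           # keep only "cardN"
        printf '%s: %s\n' "$card" "$(cat "$f")"
    done
}

# Poll one power_state file until it reports the wanted state.
# Returns 0 on success and 1 after the given number of tries.
wait_for_power_state() {
    local file="$1" want="$2" tries="${3:-30}" delay="${4:-1}"
    local i=0
    while [ "$i" -lt "$tries" ]; do
        [ "$(cat "$file" 2>/dev/null)" = "$want" ] && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, after unplugging the monitor, `wait_for_power_state /sys/class/drm/card1/device/power_state D3cold 60 1` would be expected to time out on an affected system. The card number here is an assumption and differs between machines.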
I found related posts on the NVIDIA forums:
https://forums.developer.nvidia.com/t/nvidia-dgpu-in-hybrid-optimus-laptop-not-powering-down-after-unplugging-external-monitor/318196
https://forums.developer.nvidia.com/t/565-release-feedback-discussion/310777/154
https://forums.developer.nvidia.com/t/565-release-feedback-discussion/310777/41
and most importantly https://forums.developer.nvidia.com/t/bug-linux-driver-fails-to-remove-framebuffer-device-when-hdmi-cable-plugged-out/316645
In the last one, gm151 noticed that the framebuffer created when plugging in the monitor is never cleaned up.
I am facing the same situation and did some more testing.
I can indeed see that the framebuffer at /dev/fb1 (/sys/class/graphics/fb1) is created when the monitor is plugged in and not removed on unplug.
Reloading the nvidia-drm module allows the GPU to go back to sleep:
modprobe -r nvidia-drm && modprobe nvidia-drm
I notice that the ghost framebuffer is removed afterwards; maybe that is what allows RTD3 to kick in.
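To see which fbdev devices exist before and after the hotplug (and after the module reload), one option is to list the entries in `/sys/class/graphics` together with their `name` attribute. This is a rough sketch; `list_framebuffers` is a hypothetical helper name, and the root argument is only there so it can be tried against a test directory.

```shell
# Print "<fbN>: <driver name>" for every fbdev device under a sysfs root.
# Entries such as "fbcon" are skipped because the glob requires a digit after "fb".
list_framebuffers() {
    local root="${1:-/sys/class/graphics}"
    local d name
    for d in "$root"/fb[0-9]*; do
        [ -d "$d" ] || continue      # skip an unmatched glob
        name="unknown"
        [ -r "$d/name" ] && name="$(cat "$d/name")"
        printf '%s: %s\n' "${d##*/}" "$name"
    done
}
```

Running `list_framebuffers` once after boot, once with the monitor plugged in, and once after unplugging should make a leftover fb1 entry easy to spot.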
Dumping the framebuffers using
cat /sys/kernel/debug/dri/12{8,9}/framebuffer
shows that:
On a fresh boot with D3cold and no monitor, the NVIDIA-related DRI framebuffer file is empty, whereas the other one contains a framebuffer allocated by fbcon.
After plugging in a monitor, the NVIDIA DRI framebuffer file also contains a framebuffer allocated by fbcon, with a layer size corresponding to the monitor.
After unplugging the monitor, the framebuffer file does not go back to being empty, and the framebuffer stays allocated by fbcon.
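The three observations above can be condensed into a one-word summary per debugfs file. The sketch below assumes the dump contains a line of the form `allocated by = <owner>`; the exact format may differ between kernel versions, and `fb_dump_owner` is a hypothetical helper name. Since debugfs files can report a size of 0 even when they have content, the helper reads the file instead of relying on its size.

```shell
# Summarize a DRM debugfs framebuffer dump:
#   "empty"   if the file has no content,
#   the owner if an "allocated by = ..." line is present (assumed format),
#   "unknown" if the file has content but no such line.
fb_dump_owner() {
    local file="$1" content owner
    content="$(cat "$file" 2>/dev/null)"
    if [ -z "$content" ]; then
        echo "empty"
        return 0
    fi
    owner="$(printf '%s\n' "$content" \
        | sed -n 's/^[[:space:]]*allocated by = *//p' \
        | head -n 1)"
    echo "${owner:-unknown}"
}
```

Reading debugfs normally requires root, so this would be run as, for example, `fb_dump_owner /sys/kernel/debug/dri/128/framebuffer` from a root shell.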
I tried every combination of kernel parameters and module options to no avail. The ones I tried the most are:
I also tried several Linux kernel versions.
It seems that the code responsible for fbdev moved recently, both in the Linux kernel and in open-gpu-kernel-modules.
I saw changes related to hotplug events, so maybe we are missing a handler to clean up the framebuffers?
Thanks!
To Reproduce
cat /sys/class/drm/card*/device/power_state to check and wait until the GPU is D3cold
ls /dev/fb* -> should show only fb0 (integrated graphics -> internal monitor)
nvidia-smi -> wakes the GPU
cat /sys/class/drm/card*/device/power_state to check and wait until the GPU is D3cold
cat /sys/class/drm/card*/device/power_state to check and wait until the GPU is D0 (after plugging a monitor into the GPU)
cat /sys/class/drm/card*/device/power_state to check and wait until the GPU is D3cold (after unplugging the monitor) <- never happens
Bug Incidence
Always
nvidia-bug-report.log.gz
I have tested 20+ module option combinations. Which options are the most interesting to generate the report with?
More Info
No response