Ask questions about the codebase here #157

s66104444 · 2022-05-13T11:15:46Z

[mtijanic] Please use this issue to ask simple questions about the codebase, such as what a certain feature is, how a given codepath works, what a given acronym means, etc. Please keep your questions as specific as possible.

-- Original user post below this line --

I see a lot of RM in the code related to the RMAPI module, what does this RM and GSP-RM mean exactly?

mtijanic · 2022-05-13T11:26:25Z

Maintainer

Hello and thank you for your interest!

RM stands for "Resource Manager". It is the internal name of the driver that gets compiled into the nvidia.ko module - both the open source one (sometimes referred to as "OpenRM" internally) and the proprietary version. It can also refer to the team working on this code.

Depending on the context, GSP-RM can refer either to the new driver architecture that uses the "GSP" microcontroller, or to the code running on said microcontroller. To avoid ambiguity, the latter is often called "Physical RM" instead, and code running in the kernel is called "Kernel RM".

RMAPI is the interface nvidia.ko (RM) exposes to other kernel modules and to userspace.

mtijanic · 2022-05-13T11:29:11Z

Maintainer

Meta: Do you think it makes sense to group these questions into a single issue, which we could then pin?

s66104444 · 2022-05-13T11:34:43Z

Author

Thanks! Yes, I think it's a good idea to group these questions into a single issue.

mtijanic · 2022-05-13T11:44:39Z

Maintainer

Okay, this is now a pinned generic code-related Q&A issue.

antdking · 2022-05-13T13:18:21Z

Would it be possible to enable GitHub Discussions, which has a cleaner structure for quick questions?

mtijanic · 2022-05-13T13:21:36Z

Maintainer

@antdking Please see #44. I do agree Discussions would be a much cleaner option, and I expect we'll enable them soon enough. I will gladly retire this issue when that happens, but until then this is the best we have for these quick one-off questions.

DemiMarie · 2022-05-13T15:58:26Z

Will this driver support GPU virtualization with untrusted guests on consumer hardware? This would be a huge win for Qubes OS, which is currently forced to rely on software rendering. Support for CUDA or for enterprise GPUs would not be as helpful, as Qubes OS is an end-user operating system focused on the desktop use-case.

1 reply

mtijanic May 14, 2022
Maintainer

Hello @DemiMarie, thanks for the interest. The currently published driver does not support virtualization, neither as a host nor as a guest. I currently don't have any roadmap information to share regarding this.

mtijanic · 2022-05-13T16:35:21Z

Maintainer

@DemiMarie At this point all I can say is that the current codebase does not support virtualization - neither as a host nor a guest. I don't currently have any information about future changes in this regard.

aritger · 2022-05-13T17:29:42Z

Maintainer

We've now enabled Discussions in this GitHub repository. I've moved this issue to Discussions, though maybe we should close this and start a separate discussion for each question...

RealAstolfo · 2022-05-14T02:31:41Z

As many of us have noticed, NVIDIA has defaulted to the age-old C90 standard. Moving forward, would NVIDIA be open to adopting a more modern standard such as C11? Or is there a specific reason why we must stay on C90?

My argument for adopting the newer standard would be the newer features, most prominently seen in C++ via operators, templates, and constexpr. constexpr in particular leads to fewer hardcoded variables in favor of compile-time resolved ones.

What do you think?

3 replies

johnnynunez May 14, 2022

It would be important for the standard to be upgraded to C11.
https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.18-C11-Plan

ElijahPepe May 14, 2022

At least it's not C89.

I'll submit a draft PR for this. I'm sure NVIDIA knows about C11 though; it's probably because they need to be able to develop these kernel modules on any platform.

On second thought, C11 doesn't have many killer features. C99 should be the next logical step from C90, because the difference between C90 and C99 is huge and would be most desirable for those aforementioned compatibility considerations and for modern features.

mtijanic May 14, 2022
Maintainer

Thank you for wanting to bring us into the 21st century! Unfortunately, much of the code published here is shared by various other projects and is compiled for a wide variety of platforms - from Windows to custom NV-internal ISAs and OSes - which means it must support a wide variety of toolchains and their quirks.

Generally, this means that we need to stick with C90. Many (but not all) C99 features are supported, but in some cases they are avoided due to style guidelines. We hope to publish a version of these guidelines in the future.

Mattwmaster58 · 2022-05-14T19:21:27Z

I'm curious what the reason was for retaining whitespace in what look like deleted sections of the source, e.g. https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/kernel-open/nvidia/nv.c#L623. Was this intentional or just a side effect of how it was done?

3 replies

ElijahPepe May 14, 2022

Likely just a side effect of how they chose to get rid of the lines and nothing else.

ptr1337 May 14, 2022

You can see it here:
https://gist.github.com/ptr1337/2e361f8f87abd57b1f6c1ea443f87f46

mtijanic May 14, 2022
Maintainer

This is an issue with our packaging scripts that removed parts of the codebase that are not relevant to this driver. Successive blank lines should have been collapsed into a single one. We will fix the scripts.
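
A minimal sketch of the collapsing step described above, assuming it operates on plain source text. This is not NVIDIA's packaging script; the function name `collapse_blank_lines` is hypothetical, and treating whitespace-only lines as blank is an assumption.

```python
def collapse_blank_lines(text: str) -> str:
    """Collapse each run of consecutive blank lines into a single blank line.

    Assumption: a line containing only whitespace counts as blank and is
    emitted as an empty line.
    """
    out = []
    previous_blank = False
    for line in text.split("\n"):
        blank = line.strip() == ""
        if blank and previous_blank:
            continue  # drop every blank line after the first in a run
        out.append("" if blank else line)
        previous_blank = blank
    return "\n".join(out)


# Four blank lines between two declarations shrink to one.
sample = "int a;\n\n\n\n   \nint b;\n"
print(collapse_blank_lines(sample))
```

Running the sample leaves exactly one empty line between `int a;` and `int b;`.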

cdknight · 2022-05-16T04:12:47Z

Considering that NVIDIA does not currently have plans to open-source the userspace components of the driver, will there at least be any official documentation or libraries to help create a userspace RM client (I think this amounts to opening and controlling /dev/nvidiactl and /dev/nvidia%d)? I mean something like a FOSS NvAPI or NVML, or not even that much: simply documenting how to call the methods defined in, for example, src/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080bus.h (but it could be any of the NVXXXX_CTRL functions). Such documentation may aid (community) userspace driver development and, at a minimum, help with GPU monitoring and control.
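
For orientation only, a small Python sketch that lists the device nodes named in the question and reports which of them exist on the local machine. It assumes the /dev/nvidiactl and /dev/nvidia%d naming quoted above; the helper name `rm_device_nodes` is hypothetical, and the sketch deliberately issues no ioctls, since the control-call interface is exactly the part that is undocumented.

```python
import os


def rm_device_nodes(gpu_count):
    """Return the control node followed by one node per GPU minor number.

    Assumption: nodes follow the /dev/nvidiactl and /dev/nvidia%d pattern
    mentioned in the question.
    """
    return ["/dev/nvidiactl"] + [f"/dev/nvidia{i}" for i in range(gpu_count)]


# Report which of the expected nodes are present; nothing is opened.
for path in rm_device_nodes(2):
    state = "present" if os.path.exists(path) else "missing"
    print(f"{path}: {state}")
```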

8 replies

mtijanic May 16, 2022
Maintainer

There are currently no plans to open source NVML or NVAPI, nor to maintain a third library, sorry.

DemiMarie May 16, 2022

That’s unfortunate. Such a library really ought to be in this repository, so that it can be kept in sync with kernel driver changes.

alexflint Jul 10, 2023

I am also interested in writing a userspace RM client. Is there any documentation or example code that might help me accomplish this? I am aware that there is no ABI stability and I'm willing to target a particular driver version.

tetsuro0086 Dec 17, 2024

> Certainly we can provide these docs and would be interested in what tools the community makes with them. We ask for a bit of patience though while we find the time to write such docs.

@mtijanic
It's been 2.5 years since this discussion started. We are waiting patiently, so please let us know if there is any progress. This is an area that we are interested in, not only as individuals but also as a company and organization, so we would be happy to cooperate if we can.

mtijanic Dec 17, 2024
Maintainer

Hey there! See this post from a few months ago: #530 (comment)

Probably best to continue the discussion on that thread if you want some additional clarification or examples. I don't think we'll be able to staff writing actual docs any time soon, sorry.

edisionnano · 2022-05-16T11:46:51Z

Since the other topic got locked, I'll ask here too. While this repo provides kernel drivers for Turing, Ampere, and upcoming generations, will PMU firmware be published to allow nouveau to reclock Maxwell 2 and Pascal GPUs?

cdknight · 2022-05-16T19:13:13Z

Skimming through the codebase, I see references to things called NVOS and NVOC. Their similar names lead me to believe (perhaps incorrectly) that they are related. What are they?

2 replies

mtijanic May 16, 2022
Maintainer

See here for NVOC.
NVOS is part of the SDK which exposes the interface to RM from (mostly userspace) clients. It defines ioctl numbers and the like.

> similar names

Almost everything starts with "NV" or "RM", so you can safely just treat those as namespace and not a meaningful part of the name :)

cdknight May 19, 2022

Thanks for the information! (Should have looked around more…)

zakerinasab · 2023-03-09T23:21:46Z

Hi,

I'm monitoring the ioctl commands for a hardware-accelerated video encoding session. I see a series of calls to the C9B7 (NVC9B7_VIDEO_ENCODER) class. However, the class header at https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/common/sdk/nvidia/inc/class/clc9b7.h is empty. I found a similar header for C5B7 at https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/video/clc5b7.h and the command codes seem to match, but I wonder if someone can confirm that the codes are the same or, even better, fill in the header for class C9B7.
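
One way to check such a match mechanically is to diff the `#define` lines of two class headers after stripping the class prefix. The sketch below is a hedged illustration: the `cl<hex>.h` naming is inferred only from the two file names above, the helper names are hypothetical, and the macro names in the sample text are placeholders, not real contents of either header.

```python
import re

# Matches "#define NAME VALUE" lines; VALUE is kept as raw text.
DEFINE_RE = re.compile(r"^\s*#define\s+(\w+)\s+(\S.*?)\s*$", re.MULTILINE)


def class_header_name(class_id):
    """Build a header file name such as clc9b7.h (naming assumed from the URLs)."""
    return f"cl{class_id:04x}.h"


def parse_defines(header_text, class_prefix):
    """Map each macro name, with the class prefix stripped, to its value text."""
    defines = {}
    for name, value in DEFINE_RE.findall(header_text):
        if name.startswith(class_prefix):
            defines[name[len(class_prefix):]] = value
    return defines


def differing_defines(a, b):
    """Names whose values differ or that appear in only one of the headers."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# Placeholder header text; EXAMPLE_A and EXAMPLE_B are made-up macro names.
old = "#define NVC5B7_EXAMPLE_A (0x00000100)\n#define NVC5B7_EXAMPLE_B (0x00000104)\n"
new = "#define NVC9B7_EXAMPLE_A (0x00000100)\n#define NVC9B7_EXAMPLE_B (0x00000108)\n"

print(class_header_name(0xC9B7))
print(differing_defines(parse_defines(old, "NVC5B7_"), parse_defines(new, "NVC9B7_")))
```

An empty result from `differing_defines` would only show that the two files agree textually; it would not confirm that the hardware accepts the same methods.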

Thanks.

X547 · 2023-11-12T09:35:44Z

@aritger

What is "FINN, an NVIDIA coding tool"? Is it some kind of IDL compiler for the RM API that serializes API calls into binary streams? I am currently attempting to run the NVIDIA driver as a userland server process in a microkernel-like architecture, so I need to serialize RM API calls into IPC messages. Is rmapi_finn.c a suitable solution for this task? I heard that NVIDIA supports microkernel systems like QNX, so there should be a solution for running the driver in userland.

3 replies

aritger Nov 13, 2023
Maintainer

Yes, FINN is a serializer/deserializer; it is still a work-in-progress. Thus far, we've not published the finn tool itself, just the code generated by finn.

I'm not sure rmapi_finn.c has all the information it would need to serialize all the RMAPIs, but you're certainly welcome to use it as a starting point.

Our QNX support uses a separate code base, and not the resource manager code from which the open-gpu-kernel-modules is derived.

Good luck!
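
To illustrate the general idea of turning a control call into an IPC message, here is a minimal length-prefixed framing sketch. It is not the FINN wire format and not the layout used by rmapi_finn.c; the header fields, their order, and the function names are all assumptions made for illustration.

```python
import struct

# Hypothetical header: hClient, hObject, cmd, payload length (little-endian u32 each).
HEADER = struct.Struct("<IIII")


def pack_control(h_client, h_object, cmd, params):
    """Frame one control call as header + raw parameter bytes."""
    return HEADER.pack(h_client, h_object, cmd, len(params)) + params


def unpack_control(message):
    """Inverse of pack_control; raises ValueError on a truncated message."""
    if len(message) < HEADER.size:
        raise ValueError("message shorter than header")
    h_client, h_object, cmd, length = HEADER.unpack_from(message, 0)
    params = message[HEADER.size:HEADER.size + length]
    if len(params) != length:
        raise ValueError("truncated parameter payload")
    return h_client, h_object, cmd, params


# Handle and command values here are placeholders, not real RM identifiers.
msg = pack_control(0xC1D00008, 0xCAF00001, 0x1, b"\x01\x02")
print(len(msg), unpack_control(msg))
```

A flat framing like this only works for parameter structures without embedded pointers; handling those is the part a real serializer has to solve per call.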

X547 Nov 13, 2023

This is how far I got. It seems running the driver in userland is possible, but the UNIX API may not fit well (no FDs etc.). It is being developed for the Haiku operating system.

~/Tests/GL/nvidia_gsp_user> objects.x86_64-cc13-debug/nvidia_gsp
rm_init_rm()
Scan PCI:
  01.00.00: 10de.1ff2
    BAR[0]: 0xfb000000, 0x1000000
    BAR[1]: 0xd0000000, 0x10000000
    BAR[3]: 0xe0000000, 0x2000000
    BAR[5]: 0xf000, 0x80
rm_init_private_state()
rm_init_adapter()
NVRM: GPU 0000:01:00.0: RmInitAdapter
NVRM: GPU 0000:01:00.0: RmSetupRegisters for 0x10de:0x1ff2
NVRM: GPU 0000:01:00.0: pci config info:
NVRM: GPU 0000:01:00.0:    registers look  like: 0xfb000000 0x1000000
NVRM: GPU 0000:01:00.0:    fb        looks like: 0xd0000000 0x10000000
NVRM: GPU 0000:01:00.0: Successfully mapped framebuffer and registers
NVRM: GPU 0000:01:00.0: final mappings:
NVRM: GPU 0000:01:00.0:     regs: 0xfb000000 0x1000000 0x0xdea75ec000
NVRM: PBI is not supported for GPU 0000:01:00.0
NVRM: GPU 0000:01:00.0: RmInitAdapter succeeded!
nv->rmapi.hClient: 0xc1d00008
nv->rmapi.hDevice: 0xcaf00000
nv->rmapi.hSubDevice: 0xcaf00001
nv->rmapi.hI2C: 0xcaf00002
nv->rmapi.hDisp: 0xcaf00003
[WAIT]
AllocMemoryTest()
  ret: 0
  memoryHandle: 0xcaf00004
  cpuAddress: 0x14ccc2cb000
[WAIT]
rm_shutdown_adapter()
NVRM: GPU 0000:01:00.0: Tearing down registers
rm_free_private_state()
rm_shutdown_rm()
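
The BAR lines in the output above can be pulled into numbers with a short Python sketch. It assumes the two values on each `BAR[n]` line are base address and size, which is consistent with the "registers look like" and "fb looks like" lines but is not stated by the tool itself; `parse_bars` is a hypothetical helper name.

```python
import re

# Matches lines such as "BAR[0]: 0xfb000000, 0x1000000".
BAR_RE = re.compile(r"BAR\[(\d+)\]:\s*(0x[0-9a-fA-F]+),\s*(0x[0-9a-fA-F]+)")


def parse_bars(log_text):
    """Return {bar_index: (base, size)}, assuming the second value is a size."""
    bars = {}
    for index, base, size in BAR_RE.findall(log_text):
        bars[int(index)] = (int(base, 16), int(size, 16))
    return bars


log = """\
    BAR[0]: 0xfb000000, 0x1000000
    BAR[1]: 0xd0000000, 0x10000000
    BAR[3]: 0xe0000000, 0x2000000
    BAR[5]: 0xf000, 0x80
"""

for index, (base, size) in sorted(parse_bars(log).items()):
    print(f"BAR{index}: base={base:#x} size={size} bytes")
```

Under that assumption, BAR0 is 16 MiB of registers and BAR1 is a 256 MiB framebuffer aperture.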

X547 Nov 13, 2023

rmapi_finn.c seems unable to serialize any Alloc() requests.

tangds1234 · 2024-03-25T08:49:24Z

While running TensorFlow inference tasks with Docker and tracing the code, I noticed that 16 channels are always created during task execution:

8 channels: Belonging to one TSG (TSG1), engine type is 1, runlistID is 0.
4 channels: Belonging to one TSG (TSG2), engine type is b, runlistID is 1.
4 channels: Belonging to one TSG (TSG3), engine type is c, runlistID is 2.

Is it possible for a task to create multiple channels that belong to different TSGs but have the same engine type?
For example:

8 channels: Belonging to one TSG (TSG1), engine type is 1, runlistID is 0.
8 channels: Belonging to one TSG (TSG2), engine type is 1, runlistID is 0.

1 reply

tangds1234 Mar 25, 2024

GPU: nvidia A10
driver version:

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX Open Kernel Module for x86_64  535.54.03  Release Build  (xxxxx)  Mon 25 Mar 2024 02:39:25 PM HKT
GCC version:  gcc version 10.2.1 20210110 (Debian 10.2.1-6)
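
When collecting reports like this, the driver version can be extracted from that file with a small Python sketch. The regular expression is an assumption based only on the single "NVRM version:" line shown above, and `driver_version` is a hypothetical helper name.

```python
import re

# First dotted number on the "NVRM version:" line, e.g. 535.54.03.
VERSION_RE = re.compile(r"^NVRM version:.*?\s(\d+\.\d+(?:\.\d+)?)\s", re.MULTILINE)


def driver_version(text):
    """Return the driver version string, or None if the line is not found."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def read_driver_version(path="/proc/driver/nvidia/version"):
    """Read the proc file if it exists; returns None when the driver is not loaded."""
    try:
        with open(path, encoding="utf-8") as handle:
            return driver_version(handle.read())
    except OSError:
        return None


sample = (
    "NVRM version: NVIDIA UNIX Open Kernel Module for x86_64  535.54.03  "
    "Release Build  (xxxxx)  Mon 25 Mar 2024 02:39:25 PM HKT\n"
    "GCC version:  gcc version 10.2.1 20210110 (Debian 10.2.1-6)\n"
)
print(driver_version(sample))  # prints 535.54.03
```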

X547 · 2024-06-27T22:21:41Z

@aritger

After a break from my NVIDIA driver porting experiments, the driver now fails to halt the Falcon, although it worked before. Any ideas what is happening and how to fix it?

The ACPI OS interface is not implemented yet. Both versions 545.29.02 and 550.90.07 fail in the same way. UEFI boot is used.

Log

rm_init_rm()
NVRM gpumgrConstruct_IMPL: gpumgrConstruct
NVRM gsyncmgrConstruct_IMPL:
NVRM pfmreqhndlrConstruct_IMPL:
NVRM rcdbFindRingBufferForType: Ring Buffer not found for type 147
NVRM rcdbFindRingBufferForType: Ring Buffer not found for type 149
NVRM _hypervisorDetection_HVM: CPUID is NOT supported!
NVRM osRmInitRm: init rm
NVRM getCpuCounts: RmInitCpuCounts: physical 0x1 logical 0x1
NVRM rmapiControlCacheInit: using cache mode 2
Scan PCI:
  01.00.00: 10de.1ff2
    BAR[0]: 0xfb000000, 0x1000000
    BAR[1]: 0xd0000000, 0x10000000
    BAR[3]: 0xe0000000, 0x2000000
    BAR[5]: 0xf000, 0x80
rm_init_private_state()
rm_init_adapter()
NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
NVRM: GPU 0000:01:00.0: RmInitAdapter
NVRM: GPU 0000:01:00.0: RmSetupRegisters for 0x10de:0x1ff2
NVRM: GPU 0000:01:00.0: pci config info:
NVRM: GPU 0000:01:00.0:    registers look  like: 0xfb000000 0x1000000
NVRM: GPU 0000:01:00.0:    fb        looks like: 0xd0000000 0x10000000
NVRM: GPU 0000:01:00.0: Successfully mapped framebuffer and registers
NVRM: GPU 0000:01:00.0: final mappings:
NVRM: GPU 0000:01:00.0:     regs: 0xfb000000 0x1000000 0x0xeea35c0000
NVRM RmFetchGspRmImages: Failed to load gsp_log_*.bin, no GSP-RM logs will be printed (non-fatal)
NVRM osInitNvMapping: osInitNvMapping:
NVRM gpumgrCreateDevice: gpumgrCreateDevice: deviceInst 0x0 mask 0x1
NVRM gpumgrGetGpuHalFactor: ChipId0[0x167000a1] ChipId1[0x167a1000] SocChipId0[0x0] isFwClient[1] RmVariant[2] tegraType[0]
NVRM halmgrGetHalForGpu_IMPL: Matching PMC_BOOT_42 = 0x167a1000 to HAL_IMPL_TU117
NVRM gpuInitRegistryOverrides_KERNEL: SRIOV status[0].
NVRM gpuInitRegistryOverrides_KERNEL: Split VAS mgmt between Server/Client RM 1
NVRM _hypervisorDetection_HVM: CPUID is NOT supported!
NVRM gpuBuildClassDB_IMPL: num class descriptors: 0x2f
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelBif:0 state change: Undefined -> Construct, took 16us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine OBJTMR:0 state change: Undefined -> Construct, took 5us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 1 allocations, 24 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelMc:0 state change: Undefined -> Construct, took 0us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine SwIntr:0 state change: Undefined -> Construct, took 0us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
nv_alloc_pages(page_count: 1, page_size: 4096, cache_type: 1)
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelMemorySystem:0 state change: Undefined -> Construct, took 38us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 1 allocations, 376 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine MemoryManager:0 state change: Undefined -> Construct, took 12us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 8 allocations, 2112 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelDisplay:0 state change: Undefined -> Construct, took 14us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 9 allocations, 1608 bytes
NVRM kbusConstructHal_GM107: Entered
NVRM kbusInitRegistryOverrides: Using aperture 2 for BAR2 PTEs
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelBus:0 state change: Undefined -> Construct, took 28us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelGmmu:0 state change: Undefined -> Construct, took 16us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 3 allocations, 1760 bytes
NVRM kflcnConfigureEngine_IMPL: for physEngDesc 0x28c40800
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelSec2:0 state change: Undefined -> Construct, took 9us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM kflcnConfigureEngine_IMPL: for physEngDesc 0xda3de400
nv_alloc_pages(page_count: 129, page_size: 4096, cache_type: 0)
NVRM _gspMsgQueueInit: Created command queue.
nv_alloc_pages(page_count: 16, page_size: 4096, cache_type: 0)
nv_alloc_pages(page_count: 64, page_size: 4096, cache_type: 0)
nv_alloc_pages(page_count: 64, page_size: 4096, cache_type: 0)
nv_alloc_pages(page_count: 1, page_size: 4096, cache_type: 0)
nv_alloc_pages(page_count: 1, page_size: 4096, cache_type: 1)
nv_alloc_pages(page_count: 1, page_size: 4096, cache_type: 0)
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelGsp:0 state change: Undefined -> Construct, took 4927us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 17 allocations, 1254208 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine VirtMemAllocator:0 state change: Undefined -> Construct, took 10us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 2 allocations, 32 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelMIGManager:0 state change: Undefined -> Construct, took 6us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 1 allocations, 48 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelGraphicsManager:0 state change: Undefined -> Construct, took 2us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelGraphics:0 state change: Undefined -> Construct, took 13us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 3 allocations, 3360 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelPerf:0 state change: Undefined -> Construct, took 2us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM _krcInitRegistryOverrides: Breakpoint on RC Error is enabled
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelRc:0 state change: Undefined -> Construct, took 9us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine Intr:0 state change: Undefined -> Construct, took 1us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelPmu:0 state change: Undefined -> Construct, took 9us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 4 allocations, 28800 bytes
NVRM kceConstructEngine_IMPL: KernelCE: thisPublicID = 0
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelCE:0 state change: Undefined -> Construct, took 7us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM kceConstructEngine_IMPL: KernelCE: thisPublicID = 1
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelCE:1 state change: Undefined -> Construct, took 8us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM kceConstructEngine_IMPL: KernelCE: thisPublicID = 2
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelCE:2 state change: Undefined -> Construct, took 10us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM kceConstructEngine_IMPL: KernelCE: thisPublicID = 3
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelCE:3 state change: Undefined -> Construct, took 8us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM kceConstructEngine_IMPL: KernelCE: thisPublicID = 4
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelCE:4 state change: Undefined -> Construct, took 9us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM kceConstructEngine_IMPL: KernelCE: thisPublicID = 5
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelCE:5 state change: Undefined -> Construct, took 8us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM kceConstructEngine_IMPL: KernelCE: thisPublicID = 6
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelCE:6 state change: Undefined -> Construct, took 3us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM kceConstructEngine_IMPL: KernelCE: thisPublicID = 7
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelCE:7 state change: Undefined -> Construct, took 7us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM kceConstructEngine_IMPL: KernelCE: thisPublicID = 8
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelCE:8 state change: Undefined -> Construct, took 9us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM memdescOverrideInstLocList: using video memory for USERD
NVRM kschedmgrConstructPolicy_IMPL: GPU at 0000:01:00.0 has software scheduler DISABLED with policy NONE.
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelFifo:0 state change: Undefined -> Construct, took 21us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 1 allocations, 80 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine OBJUVM:0 state change: Undefined -> Construct, took 0us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine NvDebugDump:0 state change: Undefined -> Construct, took 1us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM knvlinkCoreIsDriverSupported_IMPL: NVLink core lib isn't initialized yet!
NVRM engstateLogStateTransitionPost_IMPL: Engine KernelNvlink:0 state change: Undefined -> Construct, took 9us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine OBJGPUMON:0 state change: Undefined -> Construct, took 0us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM engstateLogStateTransitionPost_IMPL: Engine OBJSWENG:0 state change: Undefined -> Construct, took 2us
NVRM engstateLogStateTransitionPost_IMPL:     Memory usage change: 0 allocations, 0 bytes
NVRM gpumgrSetGpuNvlinkBwModeFromRegistry_IMPL: nvlinkBwMode=0
NVRM osInitNvMapping: device instance          : 0
NVRM osInitNvMapping: NV regs using linear address  : 0xEEA35C0000
NVRM osInitNvMapping: NV fb using linear address  : 0x0
NVRM clUpdatePcieConfig_IMPL: GPU Domain 0 Bus 1 Device 0 Func 0
NVRM objClLoadPcieVirtualP2PApproval: Skipping non-pass-through GPU0
NVRM nvCheckOkFailedNoLog: Check failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from objClSetPortPcieEnhancedCapsOffsets(pCl, pPort) @ chipset_pcie.c:1622
NVRM nvCheckOkFailedNoLog: Check failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from objClSetPortPcieEnhancedCapsOffsets(pCl, pPort) @ chipset_pcie.c:1622
NVRM clUpdatePcieConfig_IMPL: Chipset 1042 Domain 0 Bus 0 Device 1 Func 1 PCIE PTR 58
NVRM clUpdatePcieConfig_IMPL: Chipset 1042 Root Port Domain 0 Bus 0 Device 1 Func 1 PCIE PTR 58
NVRM clUpdatePcieConfig_IMPL: Chipset 1042 Board Upstream Port Domain 0 Bus 0 Device 1 Func 1 PCIE PTR 58
NVRM clUpdatePcieConfig_IMPL: Chipset 1042 Board Downstream Port Domain 0 Bus 0 Device 0 Func 0 PCIE PTR 0
NVRM clUpdatePcieConfig_IMPL: FHB Domain 0 Bus 0 Device 0 Func 0 VendorID 1022 DeviceID 1630
NVRM _objClAdjustTcVcMap: NVPCIE: Can not read VC resource control 0 on port 0000:00:01.1 (bug 1048498).
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 9 Subfunction = 0
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 0 Subfunction = 0
NVRM _acpiDsmSupportedFuncCacheInit: NBSI DSM function not present in ASL.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 1 Subfunction = 0
NVRM _acpiDsmSupportedFuncCacheInit: NVHG DSM function not present in ASL.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 2 Subfunction = 0
NVRM _acpiDsmSupportedFuncCacheInit: MXM DSM function not present in ASL.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 3 Subfunction = 0
NVRM _acpiDsmSupportedFuncCacheInit: NBCI DSM function not present in ASL.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 4 Subfunction = 0
NVRM _acpiDsmSupportedFuncCacheInit: NVOP DSM function not present in ASL.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 5 Subfunction = 0
NVRM _acpiDsmSupportedFuncCacheInit: PFCG DSM function not present in ASL.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 6 Subfunction = 0
NVRM _acpiDsmSupportedFuncCacheInit: GPS_2X DSM function not present in ASL.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 7 Subfunction = 0
NVRM _acpiDsmSupportedFuncCacheInit: JT DSM function not present in ASL.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 8 Subfunction = 0
NVRM _acpiDsmSupportedFuncCacheInit: PEX DSM function not present in ASL.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 9 Subfunction = 0
NVRM _acpiDsmSupportedFuncCacheInit: NVPCF_2X DSM function not present in ASL.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = a Subfunction = 0
NVRM _acpiDsmSupportedFuncCacheInit: GPS DSM function not present in ASL.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = b Subfunction = 0
NVRM _acpiDsmSupportedFuncCacheInit: NVPCF DSM function not present in ASL.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = d Subfunction = 200
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 4 Subfunction = 5
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 3 Subfunction = 5
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 2 Subfunction = 5
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 1 Subfunction = 5
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 0 Subfunction = 3
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 7 Subfunction = 5
NVRM _acpiGenFuncCacheInit: DSM Test generic subfunction 0x200 is not supported. Indicates possible table corruption.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = d Subfunction = 201
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 4 Subfunction = 6
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 3 Subfunction = 6
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 2 Subfunction = 6
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 1 Subfunction = 6
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 0 Subfunction = 4
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 7 Subfunction = 6
NVRM _acpiGenFuncCacheInit: DSM Test generic subfunction 0x201 is not supported. Indicates possible table corruption.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = d Subfunction = 202
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 4 Subfunction = 10
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 3 Subfunction = 10
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 1 Subfunction = 10
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 0 Subfunction = 6
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 6 Subfunction = 10
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = a Subfunction = 10
NVRM _acpiGenFuncCacheInit: DSM Test generic subfunction 0x202 is not supported. Indicates possible table corruption.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = d Subfunction = 203
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 4 Subfunction = 11
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 3 Subfunction = 11
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 1 Subfunction = 11
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 0 Subfunction = 7
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 6 Subfunction = 11
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = a Subfunction = 11
NVRM _acpiGenFuncCacheInit: DSM Test generic subfunction 0x203 is not supported. Indicates possible table corruption.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = d Subfunction = 204
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 3 Subfunction = 12
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 2 Subfunction = 12
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 1 Subfunction = 12
NVRM _acpiGenFuncCacheInit: DSM Test generic subfunction 0x204 is not supported. Indicates possible table corruption.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = d Subfunction = 205
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 3 Subfunction = 13
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 2 Subfunction = 19
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 1 Subfunction = 13
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 0 Subfunction = 5
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 6 Subfunction = 13
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = a Subfunction = 13
NVRM _acpiGenFuncCacheInit: DSM Test generic subfunction 0x205 is not supported. Indicates possible table corruption.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = d Subfunction = 206
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 3 Subfunction = 14
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 1 Subfunction = 14
NVRM _acpiGenFuncCacheInit: DSM Test generic subfunction 0x206 is not supported. Indicates possible table corruption.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = d Subfunction = 207
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 3 Subfunction = 15
NVRM _acpiGenFuncCacheInit: DSM Test generic subfunction 0x207 is not supported. Indicates possible table corruption.
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 3 Subfunction = 205
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 1 Subfunction = 205
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 2 Subfunction = 205
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 6 Subfunction = 205
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = a Subfunction = 205
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 3 Subfunction = 1
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = d Subfunction = 201
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 4 Subfunction = 6
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 3 Subfunction = 6
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 2 Subfunction = 6
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 1 Subfunction = 6
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 0 Subfunction = 4
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 7 Subfunction = 6
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 7 Subfunction = 1
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 7 Subfunction = 1
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 4 Subfunction = 1a
NVRM remapDsmFunctionAndSubFunction: ACPI DSM remapping function = 4 Subfunction = 1a
NVRM tmrSetCurrentTime_GV100: osGetCurrentTime returns 0x667c95b3 seconds, 0xd873e useconds
nv_alloc_pages(page_count: 10, page_size: 4096, cache_type: 1)
nv_alloc_pages(page_count: 1, page_size: 4096, cache_type: 1)
NVRM kgspInitRm_IMPL: parsed VBIOS version 90.17.94.00.2E
nv_alloc_pages(page_count: 1, page_size: 4096, cache_type: 0)
nv_alloc_pages(page_count: 1, page_size: 4096, cache_type: 0)
nv_alloc_pages(page_count: 6005, page_size: 4096, cache_type: 0)
NVRM _kgspCalculateFwHeapSize: GSP FW heap 105MB of 4GB
NVRM kgspInitRm_IMPL: skipping allocating Scrubber ucode as pre-scrubbed memory (0x10000000 bytes) is sufficient (0x8500000 bytes needed)
NVRM threadStateYieldCpuIfNecessary: Yielding
NVRM _threadNodeCheckTimeout: _threadNodeCheckTimeout: currentTime: 11ad2b8ae0 >= 11ad2b8ae0
NVRM _threadNodeCheckTimeout: _threadNodeCheckTimeout: Timeout was set to: 4000 msecs!
NVRM kflcnWaitForHalt_TU102: Timeout waiting for Falcon to halt
NVRM kflcnWaitForHalt_TU102: bp @ src/kernel/gpu/falcon/arch/turing/kernel_falcon_tu102.c:358
[!] os_dbg_breakpoint
NVRM s_executeFwsec_TU102: failed to execute FWSEC cmd 0x15: status 0x65
NVRM s_executeFwsec_TU102: (note: VBIOS version 90.17.94.00.2E)
NVRM nvAssertOkFailedNoLog: Assertion failed: Call timed out [NV_ERR_TIMEOUT] (0x00000065) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_tu102.c:376
[!] os_dbg_breakpoint
NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0x65
NVRM RmInitAdapter: Cannot initialize GSP firmware RM
NVRM kbusDestructVirtualBar2_VBAR2: MapCount: 0 Bar2 Hits: 0 Evictions: 0
NVRM: GPU 0000:01:00.0: Tearing down registers
NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x65:1644)
rm_free_private_state()
rm_shutdown_rm()
NVRM RmShutdownRm: shutdown rm
NVRM gsyncmgrDestruct_IMPL:
NVRM gpumgrDestruct_IMPL: gpumgrDestruct
[NvPort] ******** Aggregate Memory Tracking ********
  ACTIVE: 0 allocations, 0 bytes allocated (0 useful, 0 meta)
  TOTAL:  744 allocations, 3694470 bytes allocated (3673638 useful, 20832 meta)
  PEAK:   363 allocations, 3227446 bytes allocated (3217282 useful, 10164 meta)
[NvPort] ******** Global Paged Memory Allocator Tracking ********
  ACTIVE: 0 allocations, 0 bytes allocated (0 useful, 0 meta)
  TOTAL:  0 allocations, 0 bytes allocated (0 useful, 0 meta)
  PEAK:   0 allocations, 0 bytes allocated (0 useful, 0 meta)
[NvPort] ******** Global Non-Paged Memory Allocator Tracking ********
  ACTIVE: 0 allocations, 0 bytes allocated (0 useful, 0 meta)
  TOTAL:  731 allocations, 3644446 bytes allocated (3623978 useful, 20468 meta)
  PEAK:   352 allocations, 3177622 bytes allocated (3167766 useful, 9856 meta)

2 replies

aritger Jun 27, 2024
Maintainer

I'm sorry, I don't have any ideas offhand. Does the failure come while trying to initialize GSP, or when trying to shut down? Assuming this is during initialization, the best I can think to suggest is to step-by-step compare the GSP bootstrap sequence on Linux with what happens in your Haiku port.

Out of curiosity: what is all different in your port? Have you just had to provide an alternate implementation of everything under kernel-open/nvidia/..., or have you had to make more substantial changes to the core of the code base? Knowing what is different might help identify where to look for problems that could explain the failure you are seeing.

X547 Jun 27, 2024

No changes were made to the platform-independent part of the Nvidia driver except Haiku build fixes and logging. The resulting object file nv-kernel.o is linked into a Haiku-specific driver. The Haiku-specific code implements the interfaces defined in nv.h and os-interface.h and fills the nv_state_t structure.

viktor-prutyanov · 2024-07-01T20:07:31Z

viktor-prutyanov
Jul 1, 2024

Hi! Could anyone please tell what the abbreviations QMD, PCAS and SKED mean?

4 replies

mtijanic Jul 1, 2024
Maintainer

QMD is "Queue MetaData", but it's a very confusing name. It's a hardware defined structure that contains information to execute some compute work. We also sometimes just call it a "task". SKED is a HW engine that schedules QMDs (it's not an acronym).

PCAS is "Posted Compare And Swap" (I had to look it up). It's basically an equivalent of while (!__sync_bool_compare_and_swap(...)) { /*spin*/} operation, used to advance some pointers when allocating space for a QMD.
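As a rough software analogy of the spin-on-compare-and-swap pattern described above, here is a minimal sketch that reserves space by atomically advancing a cursor. It does not model how the hardware implements PCAS; `reserve_space` and the cursor are hypothetical names for illustration, and the code assumes the GCC/Clang `__sync`/`__atomic` builtins.

```c
#include <stdint.h>

/* Reserve `size` bytes by atomically advancing *cursor.
 * Returns the offset at which the reserved region starts.
 * Hypothetical helper: a software analogy, not the hardware PCAS. */
static uint32_t reserve_space(uint32_t *cursor, uint32_t size)
{
    uint32_t old;
    do {
        /* Snapshot the current cursor value. */
        old = __atomic_load_n(cursor, __ATOMIC_RELAXED);
        /* Publish old + size only if nobody moved the cursor since the
         * snapshot; otherwise spin and try again. */
    } while (!__sync_bool_compare_and_swap(cursor, old, old + size));
    return old;
}
```

Each caller that wins the swap owns the range `[old, old + size)`, so concurrent producers never receive overlapping regions.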

X547 Jul 1, 2024

Also what is the meaning of "WAR" found in various comments?

Example:

open-gpu-kernel-modules/src/nvidia/src/kernel/core/hal_mgr.c

Line 115 in e45d91d

// WAR: The majorrev of t234 shows 0xa on fmodel instead of 0x4

gauravjuvekar Jul 1, 2024
Maintainer

WAR = Workaround. Usually software workarounds for certain silicon bugs that won't be fixed in hardware, but we sometimes use the term in case of a temporary hacky fix that we plan to improve in future too.

aritger Jul 1, 2024
Maintainer

"WAR" is commonly used within NVIDIA as an abbreviation for "Work ARound".

lklake · 2024-07-18T09:02:39Z

lklake
Jul 18, 2024

@mtijanic Hi, I noticed that you mentioned a few years ago that you could provide sample input and output files of NVOC (link). I'm very curious about this, but I can't find it in any public place. Is it convenient for you to provide it at this moment? Thank you so much!

1 reply

mtijanic Aug 19, 2024
Maintainer

(sorry, was out of office)

We actually shipped a few NVOC files like RsClient or CeUtilsApi that roughly show the syntax and stuff. You can look at the g_xxx_nvoc.{h,c} in the corresponding source tree to see what the generated output looks like.

There's a bit of extra syntax sugar around to let us define different behavior per chip. For example,

    NV_STATUS kgspAllocBootArgs(OBJGPU *pGpu, KernelGsp *pKernelGsp) = hal
    {
        PF_KERNEL_ONLY : hal
        {
            dTURING...dADA                                  : _TU102;
            dHOPPER...                                      : _GH100;
            AMODEL | TEGRA | ...VOLTA                       : { NV_ASSERT_OR_RETURN_PRECOMP(0, NV_ERR_NOT_SUPPORTED); }
        }
        VF | PF_MONOLITHIC | UCODE                          : { NV_ASSERT_OR_RETURN_PRECOMP(0, NV_ERR_NOT_SUPPORTED); }
    };

you can search for kgspAllocBootArgs to see what it does, but tl;dr is that it checks which actual chip it is running on and then wires up a function pointer to either kgspAllocBootArgs_TU102 or kgspAllocBootArgs_GH100. Then when you call kgspAllocBootArgs_HAL() it dispatches to the correct one. All this info is already there in the g_kernel_gsp_nvoc.{h,c} files (especially the generated comments), so it's not hiding any juicy secret info or anything. We're making slow but steady progress in making these generated files more human readable too.
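To make the dispatch idea concrete, here is a hand-written sketch of the same pattern in plain C: pick an implementation once based on the chip, store it in a function pointer, and call through it. This is not the NVOC-generated code; every name below (`ToyKernelGsp`, `toy_init_hal`, the chip enum, and so on) is hypothetical.

```c
/* Hypothetical chip families, ordered oldest to newest. */
typedef enum { CHIP_VOLTA, CHIP_TURING, CHIP_AMPERE, CHIP_ADA, CHIP_HOPPER } ChipFamily;

/* Tag identifying which implementation ran (illustration only). */
typedef enum { IMPL_NONE, IMPL_TU102, IMPL_GH100 } ToyImpl;

typedef ToyImpl (*AllocBootArgsFn)(void);

typedef struct {
    ChipFamily chip;
    AllocBootArgsFn allocBootArgs;   /* wired once at init time */
} ToyKernelGsp;

static ToyImpl allocBootArgs_TU102(void)       { return IMPL_TU102; }
static ToyImpl allocBootArgs_GH100(void)       { return IMPL_GH100; }
static ToyImpl allocBootArgs_unsupported(void) { return IMPL_NONE; }

/* Select the per-chip implementation, mirroring the ranges in the
 * snippet above: Turing..Ada, Hopper and later, everything else. */
static void toy_init_hal(ToyKernelGsp *gsp, ChipFamily chip)
{
    gsp->chip = chip;
    if (chip >= CHIP_HOPPER)
        gsp->allocBootArgs = allocBootArgs_GH100;
    else if (chip >= CHIP_TURING)
        gsp->allocBootArgs = allocBootArgs_TU102;
    else
        gsp->allocBootArgs = allocBootArgs_unsupported;
}

/* Callers use the _HAL entry point and never name a chip directly. */
static ToyImpl toy_allocBootArgs_HAL(const ToyKernelGsp *gsp)
{
    return gsp->allocBootArgs();
}
```

The benefit of this shape is that chip selection happens in one place, so call sites stay identical across GPU generations.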

CO18326 · 2024-08-18T08:06:13Z

CO18326
Aug 18, 2024

Can I have detailed documentation of the source code, ideally module-by-module and file-by-file?

1 reply

mtijanic Aug 19, 2024
Maintainer

Me too!

(Sorry, we don't currently have any such doc that is suitable for public use. Maybe in the future someone can sit down and write something, but so far it has been pretty low on the priorities list)

X547 · 2024-08-19T09:57:14Z

X547
Aug 19, 2024

Does the NVRM API have any functionality for monitoring engine load ratio, temperature and fan speed? This is the kind of information shown, for example, in the GPU load tab of Windows Task Manager.

4 replies

mtijanic Aug 19, 2024
Maintainer

RM generally exposes it through this memory structure that is then mapped to userspace: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/common/sdk/nvidia/inc/class/cl00de.h#L244-L298

However, you're advised to not use that directly as it changes every driver release and your app would need to be tightly coupled to the driver. Instead, use NVML or NVAPI (or the X driver XNVCTRL). These are userspace libraries that are shipped with the driver and wrap the RMAPI in a way that is stable across versions.

X547 Aug 19, 2024

Proprietary components are not an option.

X547 Aug 19, 2024

Is it possible to get per-engine load?

mtijanic Aug 19, 2024
Maintainer

The load of each of the various video engines is available individually: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/common/sdk/nvidia/inc/class/cl00de.h#L136-L146

Keep in mind that this interface is pretty new and is evolving rather fast still, so keep an eye out on changes with every major version.

You can also get the utilization info with NV2080_CTRL_CMD_PERF_GET_GPUMON_PERFMON_UTIL_SAMPLES_V2 but then you are actually synchronously fetching the data from GSP and blocking GSP from doing other work, so I don't recommend doing this for monitoring, or you'll get stutter.

SeungsuBaek · 2024-08-26T02:13:49Z

SeungsuBaek
Aug 26, 2024

What is the meaning of LTP? (with uTLB)

1 reply

gauravjuvekar Aug 26, 2024
Maintainer

L1, TEX, PE (Primitive Engine).

JL2210 · 2024-08-30T04:04:56Z

JL2210
Aug 30, 2024

A couple questions regarding module options:

1. It appears to me that NVReg_EnableResizableBar is disabled by default. Is this intentional? If I remember correctly there was a big buzz about enabling resizable BAR around the beginning of 2023 (along with a required firmware upgrade). But if I'm understanding right it requires an option that I can't find documented as being required anywhere but here.
2. Does RmSetPCIERelaxedOrdering do anything? I can't find it doing anything anywhere that it's referenced.
- Also, the docs seem to suggest EnablePCIERelaxedOrderingMode force enables relaxed ordering when it appears that it only sets it to enable normally:
  
  open-gpu-kernel-modules/src/nvidia/arch/nvalloc/unix/src/osinit.c
  
  Line 256 in ed4be64
  
  NV_REG_STR_RM_SET_PCIE_TLP_RELAXED_ORDERING_ENABLE);

I guess the million dollar question is whether these actually do anything or (for example) they're actually enabled by default on modern gpus (maybe inside the GSP?)

1 reply

aaronp24 Sep 24, 2024
Maintainer

1. My understanding is that there are two versions of this resizable BAR feature: the one that was in the news a while back was a motherboard firmware feature that would allocate a larger BAR1 at boot time. That requires both a GPU firmware update to set the maximum BAR size larger than 256 MB, and a system firmware update to actually use the larger size. The NVReg_EnableResizableBar option is useful in cases where the system firmware doesn't support resizable BAR and instead assigns a 256 MB BAR1. When the option is enabled, nvidia.ko will call into the core kernel to reallocate a larger BAR1 somewhere else, after the system has booted. This would normally break the framebuffer console if it happens on the primary GPU, but if you load nvidia-drm with modeset=1 fbdev=1 then it will allocate a new framebuffer console in the relocated BAR1.
2. The registry gets uploaded to GSP via the NV_RM_RPC_SET_REGISTRY call. I think this particular key is used there.

fengyuanyu1 · 2024-09-26T07:58:55Z

fengyuanyu1
Sep 26, 2024

How does the GPU fetch the push buffer? By DMA, or by some other mechanism?
As far as I know, the CPU side maintains a push buffer whose entries are pointers to the GPU commands.
When I ring the doorbell register, does the GPU launch a DMA over PCIe to read the GPU commands?

3 replies

aaronp24 Sep 26, 2024
Maintainer

Yes, that's correct.

fengyuanyu1 Oct 10, 2024

Hello, @aaronp24
I have another question about the communication mechanisms between the CPU and the GPU. How does the CPU know the GPU's status? I mean, how can the CPU know that the GPU has finished its commands?

aaronp24 Oct 14, 2024
Maintainer

It depends on what the CPU is looking for. If it just wants to see if there's room for more data in the push buffer or the GPFIFO, it can read the Get and GPGet fields of the KeplerBControlGPFifo structure. The other option is to use a host semaphore release to track progress. For example, see the InsertProgressTracker function in nvidia-push.c. Note the comment at the top about whether or not the semaphore triggers a wait for idle (WFI): if that's disabled, the semaphore only indicates that the host method processor has gotten past that particular method. If you enable WFI then it indicates that the GPU has finished all of its prior methods.

There are other types of progress tracking indicators available as well. See, for example, the NV*97_SET_REPORT_SEMAPHORE_* methods.
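The two checks described above can be sketched generically as follows. This is only an illustration under simple assumptions: a ring with `put`/`get` indices that are both smaller than `size`, and a 32-bit progress value that the consumer writes in increasing order. The function names are hypothetical and are not taken from nvidia-push.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Free entries in a ring of `size` slots, given the producer index `put`
 * and the consumer index `get` (both assumed to be < size). One slot is
 * kept empty so that put == get always means "empty". */
static uint32_t ring_free_entries(uint32_t put, uint32_t get, uint32_t size)
{
    uint32_t used = (put + size - get) % size;
    return size - 1 - used;
}

/* True once the progress value written by the consumer has reached
 * `target`. The signed difference keeps the comparison correct when the
 * 32-bit counter wraps around. */
static bool progress_reached(uint32_t current, uint32_t target)
{
    return (int32_t)(current - target) >= 0;
}
```

The first helper answers "is there room to write more?", the second answers "has the work up to this point been passed?"; what "passed" means (fetched versus fully completed) depends on the WFI setting discussed above.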

CarloRamponi · 2024-09-26T14:42:54Z

CarloRamponi
Sep 26, 2024

Can the driver intercept a kernel launch or other high-level operations such as context creation or module loading?
Perhaps also accessing user-mode structures such as CUContext, CUModule, or CUFunction?

2 replies

mtijanic Sep 27, 2024
Maintainer

Something like cuCtxCreate() involves dozens of separate calls into the kernel driver. You can certainly catch any one of them and poke around. You can also try to do this in userspace by overriding ioctl(), or by using strace, or something like bpftrace.

However, actual kernel launch and stuff is mostly handled directly from userspace to GPU via shared memory, and the kernel is not involved. To instrument that, I believe envytools has something based on userfaultfd that will log all the writes to this memory that the UMD makes.
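For the "override ioctl() in userspace" option, a common approach is an LD_PRELOAD shim. The sketch below assumes Linux with glibc (it uses glibc's `ioctl` prototype and `RTLD_NEXT`); it only logs each call and forwards it unchanged. The counter `g_ioctl_calls` is a hypothetical addition for illustration.

```c
#define _GNU_SOURCE   /* needed for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

typedef int (*ioctl_fn)(int, unsigned long, ...);

/* Number of intercepted calls (illustration only). */
static unsigned long g_ioctl_calls;

/* Same prototype as glibc's ioctl(). When this library is preloaded,
 * the application's ioctl() calls resolve to this definition first. */
int ioctl(int fd, unsigned long request, ...)
{
    static ioctl_fn real_ioctl;
    va_list ap;
    void *arg;

    /* ioctl() takes at most one extra, pointer-sized argument. */
    va_start(ap, request);
    arg = va_arg(ap, void *);
    va_end(ap);

    /* Look up the next definition in search order (normally libc's). */
    if (!real_ioctl)
        real_ioctl = (ioctl_fn)dlsym(RTLD_NEXT, "ioctl");
    if (!real_ioctl) {
        errno = ENOSYS;
        return -1;
    }

    g_ioctl_calls++;
    fprintf(stderr, "ioctl(fd=%d, request=0x%lx, arg=%p)\n", fd, request, arg);
    return real_ioctl(fd, request, arg);
}
```

Built as a shared object (for example `gcc -shared -fPIC -o libioctl_log.so ioctl_log.c -ldl`, file names being placeholders) and loaded with `LD_PRELOAD=./libioctl_log.so ./app`, this prints one line per ioctl the application issues. It will not see work submitted through shared memory, for the reason given above.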

mtijanic Sep 27, 2024
Maintainer

Here's a list of all the syscalls a CUDA app runs as part of cuInit() + cuCtxCreate(): https://gist.github.com/mtijanic/aabdfd00d9c73491c74638da826ed6d4 (gathered with bpftrace)

Garrybest · 2024-10-08T09:25:41Z

Garrybest
Oct 8, 2024

Can we schedule channels by NVC06F_CTRL_CMD_GPFIFO_SCHEDULE?

Env: Tesla T4 with open source driver 550.90.07.

Here is what I found:

Launched a CUDA kernel.
Disabled channels by setting 'bEnable=0' in NVC06F_CTRL_CMD_GPFIFO_SCHEDULE.
Now GPU utilization is decreasing to zero, I guess CUDA kernel is now waiting to be scheduled.
Try to enable channels by setting 'bEnable=1' in NVC06F_CTRL_CMD_GPFIFO_SCHEDULE, but nothing happened.

It seems that I can disable a channel but could not enable it again to let CUDA kernels go on. Am I wrong to use this command?

1 reply

Garrybest Oct 17, 2024

Hi @mtijanic @aaronp24 @gauravjuvekar, I'm trying to enable and disable the channel every 2s, but I found that sometimes the CUDA kernel gets stuck and the GPU utilization decreases to 0. Do you have any ideas about this?

SeungsuBaek · 2024-11-26T10:00:58Z

SeungsuBaek
Nov 26, 2024

Why does this code exist, and when will it be resolved? Earlier versions don't have that code, is it okay to use Access counter?

0 replies

wyfs4321 · 2024-12-02T13:24:33Z

wyfs4321
Dec 2, 2024

I'm new to open-gpu-kernel-modules, and while reading the code I had trouble understanding "pRmApi->Control()", e.g.,
status = pRmApi->Control(pRmApi, pCtx->hClient, pCtx->hChannel, NVC56F_CTRL_CMD_GET_KMB, &getKmbParams, sizeof(getKmbParams));
I wonder how 'Control' implements the functionality of "getKmbParams". Can I think of 'Control' as an interface through which a function related to "getKmbParams" is invoked?

1 reply

mtijanic Dec 3, 2024
Maintainer

Hey there. Yes, your understanding is right. It's an object oriented design, and "controls" are just a method on a given object (which is identified with a {hClient,hObject} pair). The params are passed as basically (uint32_t cmd, void* params) and then the params pointer is cast to the appropriate structure depending on the cmd value. Here cmd is NVC56F_CTRL_CMD_GET_KMB and the param structure is NVC56F_CTRL_CMD_GET_KMB_PARAMS (the two usually follow this naming scheme, but there's exceptions).

Now, you can think of pRmApi->Control() as just a bunch of routing magic that resolves the objects in question and then calls the correct handler function. If you can tolerate black boxes, you can completely ignore all that and just jump straight into the handler function, which is: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/565.57.01/src/nvidia/src/kernel/gpu/fifo/kernel_channel.c#L4648-L4654

The easiest way to find this handler function is to search all the g_*_nvoc.c files for NVC56F_CTRL_CMD_GET_KMB_PARAMS and find the export block that looks like: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/565.57.01/src/nvidia/generated/g_kernel_channel_nvoc.c#L570-L584

This will tell you what the function name is. If there is no such function found in the codebase, then that means the function is implemented on the GSP instead, and the pRmApi->Control() will invoke a blocking RPC to execute the function.

Technically, any control where the flags field in the export has this bit set

#define RMCTRL_FLAGS_ROUTE_TO_PHYSICAL                        0x000000040

will be RPC'd, even if the implementation is present, but that's very rare.

Hope this helps.
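As a toy model of the routing described above (not the real RM code), the following sketch shows a `(cmd, void *params)` entry point that looks up an export table and casts the params pointer inside the matching handler. All `Toy*`/`TOY_*` names, command values and return codes are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command IDs and parameter structs, not real RM values. */
#define TOY_CTRL_CMD_GET_VALUE 0x0101u
#define TOY_CTRL_CMD_SET_VALUE 0x0102u

typedef struct { uint32_t value; } TOY_CTRL_CMD_GET_VALUE_PARAMS;
typedef struct { uint32_t value; } TOY_CTRL_CMD_SET_VALUE_PARAMS;

typedef struct { uint32_t stored; } ToyObject;

typedef int (*ToyCtrlHandler)(ToyObject *obj, void *params);

/* Each handler casts the opaque pointer to the struct for its command. */
static int toyCtrlGetValue(ToyObject *obj, void *params)
{
    TOY_CTRL_CMD_GET_VALUE_PARAMS *p = params;
    p->value = obj->stored;
    return 0;
}

static int toyCtrlSetValue(ToyObject *obj, void *params)
{
    const TOY_CTRL_CMD_SET_VALUE_PARAMS *p = params;
    obj->stored = p->value;
    return 0;
}

/* Export table: command ID, expected params size, handler. */
typedef struct {
    uint32_t cmd;
    size_t paramsSize;
    ToyCtrlHandler handler;
} ToyExport;

static const ToyExport g_exports[] = {
    { TOY_CTRL_CMD_GET_VALUE, sizeof(TOY_CTRL_CMD_GET_VALUE_PARAMS), toyCtrlGetValue },
    { TOY_CTRL_CMD_SET_VALUE, sizeof(TOY_CTRL_CMD_SET_VALUE_PARAMS), toyCtrlSetValue },
};

/* The "routing magic": find the export for cmd, validate the size,
 * then call the handler. Returns -1 for an unknown command and -2 for
 * a params size mismatch. */
static int toyControl(ToyObject *obj, uint32_t cmd, void *params, size_t paramsSize)
{
    for (size_t i = 0; i < sizeof(g_exports) / sizeof(g_exports[0]); i++) {
        if (g_exports[i].cmd != cmd)
            continue;
        if (g_exports[i].paramsSize != paramsSize)
            return -2;
        return g_exports[i].handler(obj, params);
    }
    return -1;
}
```

In this model, a control that is "implemented on the GSP" would correspond to a table entry whose handler forwards the same `(cmd, params)` pair over an RPC instead of doing the work locally.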
