No devices detected on nvidia nano #13

Open
ericvh opened this issue Jul 28, 2024 · 1 comment


ericvh commented Jul 28, 2024

NVIDIA Jetson Nanos running JetPack 4 (v4.6.5-b29, which I believe is the latest release that runs on the Nano) with nvidia-container-runtime (via Docker) aren't reporting any devices, including the embedded GPU.

The NVIDIA Jetson NX appears to work fine, as do other NVIDIA platforms.


ericvh commented Jul 28, 2024

Here's a dump from the Nano (some devices come through, namely snd, but no GPU or anything else):

```
erivan01@FV7GG9FTHL ~ % kubectl describe node atg-nano01
Name:               atg-nano01
Roles:              <none>
Labels:             beta.kubernetes.io/arch=arm64
                    beta.kubernetes.io/instance-type=k3s
                    beta.kubernetes.io/os=linux
                    feature.node.kubernetes.io/cpu-cpuid.AES=true
                    feature.node.kubernetes.io/cpu-cpuid.ASIMD=true
                    feature.node.kubernetes.io/cpu-cpuid.CRC32=true
                    feature.node.kubernetes.io/cpu-cpuid.EVTSTRM=true
                    feature.node.kubernetes.io/cpu-cpuid.FP=true
                    feature.node.kubernetes.io/cpu-cpuid.PMULL=true
                    feature.node.kubernetes.io/cpu-cpuid.SHA1=true
                    feature.node.kubernetes.io/cpu-cpuid.SHA2=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=false
                    feature.node.kubernetes.io/cpu-model.family=0
                    feature.node.kubernetes.io/cpu-model.id=0
                    feature.node.kubernetes.io/cpu-model.vendor_id=VendorUnknown
                    feature.node.kubernetes.io/kernel-config.NO_HZ=true
                    feature.node.kubernetes.io/kernel-config.NO_HZ_IDLE=true
                    feature.node.kubernetes.io/kernel-config.PREEMPT=true
                    feature.node.kubernetes.io/kernel-version.full=4.9.337-tegra
                    feature.node.kubernetes.io/kernel-version.major=4
                    feature.node.kubernetes.io/kernel-version.minor=9
                    feature.node.kubernetes.io/kernel-version.revision=337
                    feature.node.kubernetes.io/pci-10ec.present=true
                    feature.node.kubernetes.io/storage-nonrotationaldisk=true
                    feature.node.kubernetes.io/system-os_release.ID=ubuntu
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=18.04
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=18
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=04
                    kubernetes.io/arch=arm64
                    kubernetes.io/hostname=atg-nano01
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=k3s
                    nvidia.com/gpu.present=false
                    nvidia.com/type=nano
                    smarter.device-manager=enabled
                    vendor=nvidia
Annotations:        alpha.kubernetes.io/provided-node-ip: 192.168.2.179
                    flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"6a:d5:8b:3f:a9:15"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.2.179
                    k3s.io/hostname: atg-nano01
                    k3s.io/internal-ip: 192.168.2.179
                    k3s.io/node-args: ["agent","--docker","--node-label","vendor=nvidia","--node-label","nvidia.com/type=nano"]
                    k3s.io/node-config-hash: V3CA2BFCYDI52P4BV54AWCCCCIKESSUVDBA2G33JR7RD3XQZVMIA====
                    k3s.io/node-env:
                      {"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/85a4e040ca2cf1dcacd91bfa2c900a49607432d0ffeefc243281d0c79322108f","K3S_TOKEN":"********","K3S_U...
                    management.cattle.io/pod-limits: {"cpu":"400m","memory":"115Mi"}
                    management.cattle.io/pod-requests: {"cpu":"20m","memory":"39Mi","pods":"3"}
                    nfd.node.kubernetes.io/feature-labels:
                      cpu-cpuid.AES,cpu-cpuid.ASIMD,cpu-cpuid.CRC32,cpu-cpuid.EVTSTRM,cpu-cpuid.FP,cpu-cpuid.PMULL,cpu-cpuid.SHA1,cpu-cpuid.SHA2,cpu-hardware_mu...
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 27 Jul 2024 18:29:38 -0500
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  atg-nano01
  AcquireTime:     <unset>
  RenewTime:       Sun, 28 Jul 2024 09:23:43 -0500
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Sun, 28 Jul 2024 09:23:17 -0500   Sat, 27 Jul 2024 18:39:01 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sun, 28 Jul 2024 09:23:17 -0500   Sat, 27 Jul 2024 18:39:01 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Sun, 28 Jul 2024 09:23:17 -0500   Sat, 27 Jul 2024 18:39:01 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sun, 28 Jul 2024 09:23:17 -0500   Sat, 27 Jul 2024 18:39:01 -0500   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.2.179
  Hostname:    atg-nano01
Capacity:
  cpu:                  4
  ephemeral-storage:    30586600Ki
  hugepages-2Mi:        0
  memory:               4057740Ki
  pods:                 110
  smarter-devices/snd:  20
Allocatable:
  cpu:                  4
  ephemeral-storage:    29754644457
  hugepages-2Mi:        0
  memory:               4057740Ki
  pods:                 110
  smarter-devices/snd:  20
System Info:
  Machine ID:                 a3d9197b765643568af09eb2bd3e5ce7
  System UUID:                a3d9197b765643568af09eb2bd3e5ce7
  Boot ID:                    bd94ed70-117b-41ae-a28a-4e5584ca4b81
  Kernel Version:             4.9.337-tegra
  OS Image:                   Ubuntu 18.04.6 LTS
  Operating System:           linux
  Architecture:               arm64
  Container Runtime Version:  docker://20.10.21
  Kubelet Version:            v1.29.6+k3s2
  Kube-Proxy Version:         v1.29.6+k3s2
PodCIDR:                      10.42.7.0/24
PodCIDRs:                     10.42.7.0/24
ProviderID:                   k3s://atg-nano01
Non-terminated Pods:          (3 in total)
  Namespace     Name                          CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------     ----                          ------------  ----------  ---------------  -------------  ---
  default       smarter-device-manager-72l5h  10m (0%)      200m (5%)   15Mi (0%)        15Mi (0%)      16m
  kube-system   svclb-traefik-619a0f16-rxlqh  0 (0%)        0 (0%)      0 (0%)           0 (0%)         14h
  lens-metrics  node-exporter-p9jf7           10m (0%)      200m (5%)   24Mi (0%)        100Mi (2%)     13h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource             Requests   Limits
  --------             --------   ------
  cpu                  20m (0%)   400m (10%)
  memory               39Mi (0%)  115Mi (2%)
  ephemeral-storage    0 (0%)     0 (0%)
  hugepages-2Mi        0 (0%)     0 (0%)
  smarter-devices/snd  0          0
Events:                <none>
```
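
To compare nodes without reading the whole dump, something like the following sketch could pull out just the advertised device resources. It assumes the node object comes from `kubectl get node <name> -o json` and that the resources of interest share the `smarter-devices/` prefix seen under Capacity above; the helper names `extended_resources` and `fetch_node` are made up for illustration.

```python
import json
import subprocess


def extended_resources(node: dict, prefix: str = "smarter-devices/") -> dict:
    """Return the capacity entries whose resource name starts with `prefix`."""
    capacity = node.get("status", {}).get("capacity", {})
    return {name: qty for name, qty in capacity.items() if name.startswith(prefix)}


def fetch_node(name: str) -> dict:
    """Fetch a node object as JSON via kubectl (requires cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "node", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


# Hypothetical usage against the node from this issue:
#   print(extended_resources(fetch_node("atg-nano01")))
```

Running the same check against a working Jetson NX node and diffing the two results should show which device resources are missing on the Nano.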
