My GPU is gone

I have an Optimus laptop, and after the update to KDE 6, optimus-manager stopped working. I needed a second display, and all my display outputs are wired to the Nvidia GPU, so I needed to switch. I tried many different X11 configs, then envycontrol, then more X11 configs, but I couldn’t get it working right: it would drive only the internal display or the external one, never both. After a few hours I gave up and tried optimus-manager again. This time I checked the error log and saw it was failing to load the nvidia module. I tried loading it manually but got a “No such device” error, which is where the title of the post comes in. My GPU has disappeared from Linux: it won’t show up in lspci, lshw, nvidia-smi, or anything else it should. The only references to it I can find in dmesg are:


```
[    0.216410] pci 0000:01:00.0: [10de:1ba1] type 00 class 0x030000
[    0.216419] pci 0000:01:00.0: reg 0x10: [mem 0xde000000-0xdeffffff]
[    0.216427] pci 0000:01:00.0: reg 0x14: [mem 0xc0000000-0xcfffffff 64bit pref]
[    0.216435] pci 0000:01:00.0: reg 0x1c: [mem 0xd0000000-0xd1ffffff 64bit pref]
[    0.216440] pci 0000:01:00.0: reg 0x24: [io  0xe000-0xe07f]
[    0.216445] pci 0000:01:00.0: reg 0x30: [mem 0xdf000000-0xdf07ffff pref]
[    0.216460] pci 0000:01:00.0: Enabling HDA controller
[    0.257300] pci 0000:01:00.0: vgaarb: bridge control possible
[    0.257300] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.270521] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
```

and then nothing; it doesn’t even seem to try to load the nvidia module. I tried booting into Windows and it shows up there fine, so the GPU didn’t randomly die.
As far as I can tell I’ve rolled back everything in my histfile up to the point where it stopped working. The only thing I can think of is that I upgraded my kernel from 6.6.10 to 6.7.9; could that have caused it? I also tried adding pcie_port_pm=off to the kernel parameters, as suggested on the Arch Wiki, but still nothing. I’m at a loss here; does anyone have any ideas?

EDIT: I’m using the nvidia-dkms package
EDIT2: one kernel downgrade later and it’s still not appearing, so that’s not it.
EDIT3: fixed, see comments

RedWeasel,

What is the output of ‘lspci -k’?

nimmo,
@nimmo@lem.nimmog.uk

I have the same issue on my desktop. I’d assumed it was something I’d done (it usually is), but I had to admit defeat and boot into a backup OS so I could get on with the tasks I need to get done. I’m assuming it’s a problem with the nvidia-dkms package that will be resolved in time, as people have reported similar issues in the past.

thomasdouwes,
@thomasdouwes@sopuli.xyz

hmm, maybe downgrading nvidia-dkms might work? I’ll try that

nimmo,
@nimmo@lem.nimmog.uk

I considered trying that, then read that some kernel versions aren’t compatible with certain versions of the DKMS package, so I decided to give up, go to bed and deal with the issue later. (This was Saturday night, and sadly I haven’t had time to start investigating again yet.)

taaz,

I think I had this occur to me once and it was something really dumb but I can’t remember what.

@thomasdouwes just for the sake of trying everything, you could rebuild the DKMS modules and the initramfs, then reboot:


```shell
sudo dkms autoinstall -F -k "$(uname -r)" # rebuild DKMS modules for the running kernel; uname -r prints its version
sudo mkinitcpio -P # regenerate the initramfs for all presets
```

Edit: an exhaustive list of what I would try:

  • check that the drivers and modprobe blacklists make sense (this one is broad and requires digging into the Arch Wiki, but the Optimus laptop I had required blacklisting some drivers from early loading, afaik)
  • fiddle with rescans and power states in the /sys/bus/pci folders for the GPU
  • check that my mkinitcpio config makes sense; also look for a .pacnew file (/etc/mkinitcpio.conf.pacnew) and see whether its changes might affect the system
  • downgrade the kernel (already tried)
  • downgrade the DKMS packages
  • update the BIOS and firmware from Windows
  • cold boot the laptop (shut down, remove AC and battery, leave it off for a few seconds)
  • on Windows, look in ROG Armoury/MSI Center for any toggles that could affect the GPUs (iGPU/dGPU), such as power states, optimizations, etc.

thomasdouwes,
@thomasdouwes@sopuli.xyz

I don’t seem to have an -F option on my dkms? When I ran it without, it didn’t rebuild all the DKMS modules for some reason, just bbswitch and evdi.

thomasdouwes,
@thomasdouwes@sopuli.xyz

dkms status doesn’t even list half of my DKMS modules for some reason

taaz,

Ah, the -F might be wrong then, actually. I was playing with custom kernels recently and my dkms is a mess; I wouldn’t worry about that option.

thomasdouwes,
@thomasdouwes@sopuli.xyz

Looks like you were right about the udev rules earlier. I ran a pacman command to find all untracked files in /usr and found /usr/lib/udev/rules.d/50-remove-nvidia.rules. Contents:


```
# Automatically generated by EnvyControl

# Remove NVIDIA USB xHCI Host Controller devices, if present
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c0330", ATTR{power/control}="auto", ATTR{remove}="1"

# Remove NVIDIA USB Type-C UCSI devices, if present
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c8000", ATTR{power/control}="auto", ATTR{remove}="1"

# Remove NVIDIA Audio devices, if present
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x040300", ATTR{power/control}="auto", ATTR{remove}="1"

# Remove NVIDIA VGA/3D controller devices
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x03[0-9]*", ATTR{power/control}="auto", ATTR{remove}="1"
```

Looks like EnvyControl left some extra files behind after uninstalling.
Personally, I think it’s pretty weird that it put these files in /usr/lib; if they were in /etc I would have found them quickly.
The GPU is back on the bus now and I can run optimus-manager to get my extra screen. Thank you for the help troubleshooting this issue.
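The thread doesn’t say which pacman command was used to find the leftover file. One common approach, sketched here as an assumption, is to compare every regular file under /usr with the file list pacman reports for installed packages (`pacman -Qlq`); anything present only on disk is untracked. Each rule above matches an NVIDIA PCI function on `add` and writes 1 to its `remove` attribute, which detaches it from the bus, so a leftover like this plausibly explains why the card vanished from lspci and /sys. The function name `list_untracked` is hypothetical.

```shell
#!/usr/bin/env bash
# list_untracked DIR TRACKED_LIST
# Hypothetical helper: print regular files under DIR that do not appear in
# TRACKED_LIST (one absolute path per line, e.g. the output of `pacman -Qlq`).
list_untracked() {
  local dir=$1 tracked=$2
  # comm needs both inputs sorted with the same collation, hence LC_ALL=C.
  # -23 keeps only lines unique to the first input: on disk but not tracked.
  LC_ALL=C comm -23 \
    <(find "$dir" -type f 2>/dev/null | LC_ALL=C sort -u) \
    <(LC_ALL=C sort -u "$tracked")
}
```

On an Arch install, something like `list_untracked /usr <(pacman -Qlq)` should list files such as the leftover rule, along with some noise from caches and generated files. `pacman -Qo FILE` can then confirm that a particular file is not owned by any package.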

lemmyreader,

Kernel downgrade does not help, but the card is visible with live boot. Good find!

taaz,

What model and version of laptop is this? Also, is the integrated GPU Intel or AMD?

thomasdouwes,
@thomasdouwes@sopuli.xyz

It’s an MSI GE72MVR 7RG; what do you mean by version?
And it’s an integrated Intel GPU.

taaz,

by version I meant the whole model name

It’s been some time since I’ve had an Optimus laptop and I’ve forgotten which drivers have to be installed and how (afaik mesa should not be installed, or should be blacklisted, for the nvidia card? Something along these lines. Also, kernel mode setting should be turned off?)

I can’t really help here, but something is causing your card either not to initialize at all or not to initialize properly. The device missing from the /sys/bus/pci directory is really weird, though.

thomasdouwes,
@thomasdouwes@sopuli.xyz

Yeah, they are a bit of a pain, but the card disappearing completely is a new one to me. It’s hard to do any troubleshooting when you can’t even access the card.

taaz,

What does dmesg show if you try to load nvidia manually (modprobe nvidia)?

thomasdouwes,
@thomasdouwes@sopuli.xyz

```
[ 1501.764754] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[ 1501.764761] NVRM: No NVIDIA GPU found.
[ 1501.765791] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
```

taaz,

Well, that is not good indeed. If you dual boot Windows, it might be worth checking whether Windows did its “hibernation” shutdown and left the GPU in some weird state (I don’t remember how to force Windows to do a full shutdown instead of the hibernation/hybrid one).

snoo.habedieeh.re/r/archlinux/…/jjsng0x/?context=…

Edit: disabling any kind of fast boot in the BIOS might also be worth trying.

thomasdouwes,
@thomasdouwes@sopuli.xyz

It disappeared without me booting into Windows; I only booted Windows to test after it was gone. But I did just try forcing a full shutdown on Windows and disabled fast boot, and it’s still not appearing.

taaz,

Could you show the output of ls -lah /sys/bus/pci/devices/0000:01:00.0/

thomasdouwes,
@thomasdouwes@sopuli.xyz

ls: cannot access ‘/sys/bus/pci/devices/0000:01:00.0’: No such file or directory
I also tried booting an archiso and the GPU appears there, so there must be something wrong with my install.
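The `ls` check above looks at one function. Every PCI function the kernel currently knows about has a directory under /sys/bus/pci/devices with `vendor`, `class` and `power/control` files, so a small sketch can list every NVIDIA (vendor 0x10de) function at once. `list_nvidia_pci` is a hypothetical name, and its optional argument exists only so it can be pointed at a test directory instead of the live sysfs tree.

```shell
#!/usr/bin/env bash
# list_nvidia_pci [PCI_DEVICES_DIR]
# Hypothetical helper: print address, class and power/control for every PCI
# function whose vendor ID is 0x10de (NVIDIA). Defaults to the live sysfs tree.
list_nvidia_pci() {
  local root=${1:-/sys/bus/pci/devices}
  local dev vendor class power
  for dev in "$root"/*; do
    [[ -r $dev/vendor ]] || continue   # also skips an unexpanded glob
    vendor=$(<"$dev/vendor")
    [[ $vendor == 0x10de ]] || continue
    class=$(cat "$dev/class" 2>/dev/null)
    power=$(cat "$dev/power/control" 2>/dev/null || echo unknown)
    printf '%s class=%s power/control=%s\n' "${dev##*/}" "$class" "$power"
  done
  return 0
}
```

If the GPU has been removed from the bus, this should print nothing for 0000:01:00.0, consistent with the “No such file or directory” result above; once the device is back, both the VGA function (class 0x030000) and the HDMI audio function (class 0x040300) would be expected to appear.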

taaz,

Good find, I would look into udev and modprobe (blacklists and stuff like that)

thomasdouwes,
@thomasdouwes@sopuli.xyz

I had a look at /etc/udev, /etc/modprobe.d and /etc/modules-load.d, and don’t see anything related to nvidia. Are there any more udev or blacklist folders to look at?

taaz,

Don’t remember other dirs, maybe some /usr/lib/… or /usr/share/… ?
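The directories udev, modprobe and systemd-modules-load read are listed in the udev(7), modprobe.d(5) and modules-load.d(5) man pages, and several live under /usr/lib or /usr/local/lib rather than /etc. A sketch of a one-shot search across all of them, assuming that directory list (it can vary slightly between systemd and kmod versions); `find_nvidia_config` and the search pattern (the word nvidia, or the PCI vendor ID 10de) are hypothetical choices.

```shell
#!/usr/bin/env bash
# Directories udev, modprobe and systemd-modules-load read (see udev(7),
# modprobe.d(5), modules-load.d(5)); the list may vary slightly by version.
CONFIG_DIRS=(
  /etc/udev/rules.d /run/udev/rules.d /usr/local/lib/udev/rules.d /usr/lib/udev/rules.d
  /etc/modprobe.d /run/modprobe.d /usr/local/lib/modprobe.d /usr/lib/modprobe.d
  /etc/modules-load.d /run/modules-load.d /usr/local/lib/modules-load.d /usr/lib/modules-load.d
)

# find_nvidia_config [DIR...]
# Hypothetical helper: print files that mention "nvidia" (any case) or the
# NVIDIA PCI vendor ID 10de. Searches CONFIG_DIRS when no DIR is given.
find_nvidia_config() {
  local -a dirs=("$@")
  local d
  (( ${#dirs[@]} )) || dirs=("${CONFIG_DIRS[@]}")
  for d in "${dirs[@]}"; do
    [[ -d $d ]] || continue
    # -r recurse, -l print file names only, -i ignore case, -E extended regex
    grep -rliE 'nvidia|10de' "$d" 2>/dev/null
  done
  return 0
}
```

Run with no arguments, it searches every directory in the list. A hit in a rules file that also sets ATTR{remove}="1" deserves a close look, since that assignment detaches the matching PCI device from the bus.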

Could you try echo 1 > /sys/bus/pci/rescan (as root) and watch dmesg?

thomasdouwes,
@thomasdouwes@sopuli.xyz

Interesting, that did show the nvidia card in dmesg, but it’s still not in lspci.


```
[ 1110.598286] pci 0000:01:00.0: [10de:1ba1] type 00 class 0x030000
[ 1110.598301] pci 0000:01:00.0: reg 0x10: [mem 0xde000000-0xdeffffff]
[ 1110.598310] pci 0000:01:00.0: reg 0x14: [mem 0xc0000000-0xcfffffff 64bit pref]
[ 1110.598318] pci 0000:01:00.0: reg 0x1c: [mem 0xd0000000-0xd1ffffff 64bit pref]
[ 1110.598324] pci 0000:01:00.0: reg 0x24: [io  0xe000-0xe07f]
[ 1110.598330] pci 0000:01:00.0: reg 0x30: [mem 0xdf000000-0xdf07ffff pref]
[ 1110.599069] pci 0000:01:00.0: vgaarb: bridge control possible
[ 1110.599073] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 1110.599078] i915 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=io+mem
[ 1110.599125] pci 0000:01:00.1: [10de:10f0] type 00 class 0x040300
[ 1110.599135] pci 0000:01:00.1: reg 0x10: [mem 0xdf080000-0xdf083fff]
[ 1110.599327] pci 0000:01:00.0: BAR 1: assigned [mem 0xc0000000-0xcfffffff 64bit pref]
[ 1110.599335] pci 0000:01:00.0: BAR 3: assigned [mem 0xd0000000-0xd1ffffff 64bit pref]
[ 1110.599341] pci 0000:01:00.0: BAR 0: assigned [mem 0xde000000-0xdeffffff]
[ 1110.599344] pci 0000:01:00.0: BAR 6: assigned [mem 0xdf000000-0xdf07ffff pref]
[ 1110.599347] pci 0000:01:00.1: BAR 0: assigned [mem 0xdf080000-0xdf083fff]
[ 1110.599349] pci 0000:01:00.0: BAR 5: assigned [io  0xe000-0xe07f]
[ 1110.599384] pci 0000:01:00.1: extending delay after power-on from D3hot to 20 msec
[ 1110.599418] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
[ 1110.599509] snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
[ 1110.599624] snd_hda_intel 0000:01:00.1: Disabling MSI
[ 1110.599630] snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
[ 1110.603829] i915 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=io+mem:owns=io+mem
[ 1110.628268] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card2/input18
[ 1110.628341] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card2/input19
[ 1110.628403] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card2/input20
[ 1110.628464] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card2/input21
```