Proxmox AMD GPU Passthrough

I’ve been running a Proxmox home server for quite some time now without many problems. Recently I got an AMD Navi 10 RX 5700 XT and tried to pass it through to a Windows VM. I mainly followed the official Proxmox guide, but needed a few other tutorials to get it running. For now, it works exactly once after I reboot the host: the VM starts without problems, but after a restart it no longer starts and shows this error:

swtpm_setup: Not overwriting existing state file.
kvm: …/hw/pci/pci.c:1637: pci_irq_handler: Assertion '0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
stopping swtpm instance (pid 98348) due to QEMU startup error
TASK ERROR: start failed: QEMU exited with code -1

I tried fixing it using this, but it didn’t change much. https://lemmy.world/pictrs/image/0a1d3484-d913-4a4a-a310-6e3b8fa6378a.png

EDIT: link was not shown

InEnduringGrowStrong,

Maybe this?
github.com/gnif/vendor-reset
Although I’ve been passing through a vega64 without needing this.

server_paul,

Yeah, I tried that. The link just wasn’t shown in the original post. It didn’t really fix it.

InEnduringGrowStrong,

Try journalctl to get more details from when it fails?
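
A minimal sketch of how the relevant lines could be pulled out of the journal, assuming the interesting messages come from vfio-pci, the PCIe port driver, QEMU, or swtpm. The helper name filter_passthrough_log is made up for illustration; journalctl's -b (current boot) and --since options are standard.

```shell
#!/usr/bin/env bash
# Keep only journal lines that mention VFIO, the PCIe port driver, QEMU,
# or swtpm. Reads log text on stdin and writes matching lines to stdout.
# filter_passthrough_log is a hypothetical helper name, not a Proxmox tool.
filter_passthrough_log() {
    grep -E 'vfio-pci|pcieport|QEMU|swtpm'
}

# Example (on the Proxmox host, as root):
#   journalctl -b --since "10 minutes ago" | filter_passthrough_log
```

Filtering like this drops the bridge and tap interface chatter that surrounds every VM start, which makes the reset and power-state messages easier to spot.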

server_paul,

This is the journalctl output from the point where I stopped and restarted the VM. The main error seems to occur at 16:41:43:

`Dec 19 16:40:45 pve pvedaemon[1590]: end task UPID:pve:00030675:000E7952:6581B96F:vncshell::root@pam: OK
Dec 19 16:40:47 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting
Dec 19 16:41:03 pve pvedaemon[1590]: starting task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:
Dec 19 16:41:03 pve pvedaemon[198894]: start VM 195: UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:
Dec 19 16:41:06 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
Dec 19 16:41:40 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D0 to D3hot, device inaccessible
Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D0 to D3hot, device inaccessible
Dec 19 16:41:41 pve systemd[1]: 195.scope: Deactivated successfully.
Dec 19 16:41:41 pve systemd[1]: 195.scope: Consumed 54min 2.778s CPU time.
Dec 19 16:41:41 pve systemd[1]: Started 195.scope.
Dec 19 16:41:41 pve kernel: tap195i0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered disabled state
Dec 19 16:41:41 pve kernel: fwpr195p0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwpr195p0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered forwarding state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered disabled state
Dec 19 16:41:41 pve kernel: fwln195i0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwln195i0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered forwarding state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:41:41 pve kernel: tap195i0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered forwarding state
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:44 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s
Dec 19 16:41:44 pve pvedaemon[1592]: VM 195 qmp command failed - VM 195 not running
Dec 19 16:41:45 pve kernel: pcieport 0000:02:00.0: retraining failed
Dec 19 16:41:46 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s
Dec 19 16:41:47 pve kernel: pcieport 0000:02:00.0: retraining failed
Dec 19 16:41:47 pve kernel: vfio-pci 0000:03:00.0: not ready 1023ms after bus reset; waiting
Dec 19 16:41:48 pve kernel: vfio-pci 0000:03:00.0: not ready 2047ms after bus reset; waiting
Dec 19 16:41:50 pve kernel: vfio-pci 0000:03:00.0: not ready 4095ms after bus reset; waiting
Dec 19 16:41:54 pve kernel: vfio-pci 0000:03:00.0: not ready 8191ms after bus reset; waiting
Dec 19 16:42:03 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting
Dec 19 16:42:21 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:42:56 pve kernel: tap195i0 (unregistering): left allmulticast mode
Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:42:56 pve pvedaemon[199553]: stopping swtpm instance (pid 199561) due to QEMU startup error
Dec 19 16:42:56 pve pvedaemon[198894]: start failed: QEMU exited with code 1
Dec 19 16:42:56 pve pvedaemon[1590]: end task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam: start failed: QEMU exit>
Dec 19 16:42:56 pve systemd[1]: 195.scope: Deactivated successfully.
Dec 19 16:42:56 pve systemd[1]: 195.scope: Consumed 1.736s CPU time.`

server_paul,

dmesg also reported: vendor_reset: module verification failed: signature and/or required key missing - tainting kernel

However, according to https://github.com/gnif/vendor-reset/issues/46#issuecomment-983087796 this error is not that important…
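
The taint message only says the module is unsigned; it does not show whether the module is loaded or which reset methods the kernel will try for the card. A rough sketch of how both could be checked, assuming a kernel recent enough to expose the reset_method attribute in sysfs. The helper name show_reset_status and its optional path arguments are made up for illustration, and the thread does not say that this check is what resolved the problem.

```shell
#!/usr/bin/env bash
# Report whether the vendor_reset module is loaded and which reset methods
# the kernel lists for one PCI device. The two optional path arguments exist
# only so the function can be pointed at test fixtures; the defaults are the
# real locations. show_reset_status is a hypothetical helper name.
show_reset_status() {
    local dev="$1"                       # e.g. 0000:03:00.0
    local sysfs="${2:-/sys}"
    local modules="${3:-/proc/modules}"

    if grep -q '^vendor_reset ' "$modules"; then
        echo "vendor_reset: loaded"
    else
        echo "vendor_reset: not loaded"
    fi

    local method_file="$sysfs/bus/pci/devices/$dev/reset_method"
    if [ -r "$method_file" ]; then
        echo "reset_method: $(cat "$method_file")"
    else
        echo "reset_method: not available"
    fi
}

# Example, using the GPU address from the log:
#   show_reset_status 0000:03:00.0
```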

server_paul,

To everyone else encountering this error, I finally fixed it this way: This forum entry sent me here, which then helped me resolve the issue. Huge thanks to you, InEnduringGrowStrong, for pushing me in the right direction.

InEnduringGrowStrong,

Ah nice you got it working.
Once it works it’s great.
I’ve been running mine for a while now, but have purposefully avoided kernel upgrades so far.

server_paul,

Haha, I already started worrying about that :) But you’re right, it’s great.

InEnduringGrowStrong,

Formatted with a code block so it’s more readable:


Dec 19 16:40:45 pve pvedaemon[1590]: end task UPID:pve:00030675:000E7952:6581B96F:vncshell::root@pam: OK
Dec 19 16:40:47 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting
Dec 19 16:41:03 pve pvedaemon[1590]: starting task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:
Dec 19 16:41:03 pve pvedaemon[198894]: start VM 195: UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:
Dec 19 16:41:06 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
Dec 19 16:41:40 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D0 to D3hot, device inaccessible
Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D0 to D3hot, device inaccessible
Dec 19 16:41:41 pve systemd[1]: 195.scope: Deactivated successfully.
Dec 19 16:41:41 pve systemd[1]: 195.scope: Consumed 54min 2.778s CPU time.
Dec 19 16:41:41 pve systemd[1]: Started 195.scope.
Dec 19 16:41:41 pve kernel: tap195i0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered disabled state
Dec 19 16:41:41 pve kernel: fwpr195p0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwpr195p0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered forwarding state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered disabled state
Dec 19 16:41:41 pve kernel: fwln195i0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwln195i0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered forwarding state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:41:41 pve kernel: tap195i0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered forwarding state
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:44 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s
Dec 19 16:41:44 pve pvedaemon[1592]: VM 195 qmp command failed - VM 195 not running
Dec 19 16:41:45 pve kernel: pcieport 0000:02:00.0: retraining failed
Dec 19 16:41:46 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s
Dec 19 16:41:47 pve kernel: pcieport 0000:02:00.0: retraining failed
Dec 19 16:41:47 pve kernel: vfio-pci 0000:03:00.0: not ready 1023ms after bus reset; waiting
Dec 19 16:41:48 pve kernel: vfio-pci 0000:03:00.0: not ready 2047ms after bus reset; waiting
Dec 19 16:41:50 pve kernel: vfio-pci 0000:03:00.0: not ready 4095ms after bus reset; waiting
Dec 19 16:41:54 pve kernel: vfio-pci 0000:03:00.0: not ready 8191ms after bus reset; waiting
Dec 19 16:42:03 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting
Dec 19 16:42:21 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:42:56 pve kernel: tap195i0 (unregistering): left allmulticast mode
Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:42:56 pve pvedaemon[199553]: stopping swtpm instance (pid 199561) due to QEMU startup error
Dec 19 16:42:56 pve pvedaemon[198894]: start failed: QEMU exited with code 1
Dec 19 16:42:56 pve pvedaemon[1590]: end task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam: start failed: QEMU exit>
Dec 19 16:42:56 pve systemd[1]: 195.scope: Deactivated successfully.
Dec 19 16:42:56 pve systemd[1]: 195.scope: Consumed 1.736s CPU time.

InEnduringGrowStrong,

It does seem a lot like the reset bug, but then you already tried that. :/ Kernel modules aren’t always easy to install, and if your kernel is missing the required config flags, the module might just do nothing.


grep -E '(CONFIG_FTRACE|CONFIG_KPROBES|CONFIG_PCI_QUIRKS|CONFIG_KALLSYMS|CONFIG_KALLSYMS_ALL|CONFIG_FUNCTION_TRACER)\b' /boot/config-`uname -r`

It should show all 6 flags set to =y.
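
A small sketch of the same check as a function that lists only the options that are missing, assuming the kernel config is available as a plain-text file under /boot. The helper name missing_vendor_reset_flags is made up for illustration; the six option names are the ones from the grep command.

```shell
#!/usr/bin/env bash
# Print each required kernel option that is not set to =y in the given
# config file. Prints nothing and returns 0 when all six are built in;
# returns 1 if at least one is missing.
# missing_vendor_reset_flags is a hypothetical helper name.
missing_vendor_reset_flags() {
    local config="${1:-/boot/config-$(uname -r)}"
    local flag
    local missing=0
    for flag in CONFIG_FTRACE CONFIG_KPROBES CONFIG_PCI_QUIRKS \
                CONFIG_KALLSYMS CONFIG_KALLSYMS_ALL CONFIG_FUNCTION_TRACER; do
        # -x matches the whole line, so CONFIG_KALLSYMS=y cannot be
        # satisfied by CONFIG_KALLSYMS_ALL=y.
        if ! grep -qx "${flag}=y" "$config"; then
            echo "$flag"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   missing_vendor_reset_flags && echo "all six flags are set"
```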

Or maybe some variation of manual reset…
…proxmox.com/…/issues-with-intel-arc-a770m-gpu-pa…
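
One generic form of manual reset is to remove the device through sysfs and rescan the bus. The sketch below assumes the standard remove and rescan attributes under /sys/bus/pci; the helper name pci_remove_and_rescan and the SYSFS_ROOT variable are made up for illustration. It is not necessarily what the linked thread describes, and with a card that the kernel already reports as inaccessible it may not help.

```shell
#!/usr/bin/env bash
# Detach the given PCI functions from the kernel, then ask it to rescan the
# bus so they are rediscovered. Must run as root with the VM stopped.
# The optional SYSFS_ROOT variable exists only to allow dry runs against a
# scratch directory; pci_remove_and_rescan is a hypothetical helper name.
pci_remove_and_rescan() {
    local sysfs="${SYSFS_ROOT:-/sys}"
    local dev
    for dev in "$@"; do
        if [ -e "$sysfs/bus/pci/devices/$dev/remove" ]; then
            echo 1 > "$sysfs/bus/pci/devices/$dev/remove"
        else
            echo "skipping $dev: not present" >&2
        fi
    done
    sleep 1   # give the kernel a moment before rescanning
    echo 1 > "$sysfs/bus/pci/rescan"
}

# Example, using the GPU and its audio function from the log:
#   pci_remove_and_rescan 0000:03:00.1 0000:03:00.0
```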

server_paul,

Just fyi, the 6 y-flags were shown

server_paul,

It was intended to be a code block, but somehow it ended up as a wall of text without newlines.
