Message-ID: <b0c22deb-c0fa-3343-33cf-fd9a77d7db99@absolutedigital.net>
Date: Tue, 3 Feb 2026 17:27:00 -0500 (EST)
From: Cal Peake <cp@...olutedigital.net>
To: Kernel Mailing List <linux-kernel@...r.kernel.org>
cc: Mario Limonciello <superm1@...nel.org>,
        Kent Russell <kent.russell@....com>,
        Alex Deucher <alexander.deucher@....com>
Subject: amdgpu driver rebinding broken by "drm/amd: Clean up kfd node on
 surprise disconnect"

Hi,

The recent commit 28695ca09d32 ("drm/amd: Clean up kfd node on surprise 
disconnect") has broken something with my VMs that use PCI passthrough.
Before launching the VM, I unbind a secondary Radeon GPU from the amdgpu 
driver and then bind it to the vfio-pci driver.
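
For reference, a rebinding of this kind can be sketched with the standard 
PCI sysfs interface (driver/unbind, driver_override, drivers_probe). This 
is only a minimal sketch under that assumption, not necessarily the exact 
procedure used here; the rebind() helper and its sysfs parameter are 
hypothetical names introduced for illustration.

```python
import os


def rebind(bdf, new_driver="vfio-pci", sysfs="/sys"):
    """Unbind PCI function `bdf` from its current driver, then bind it to `new_driver`.

    Assumes the standard PCI sysfs layout; `sysfs` is a parameter only so the
    sketch can be exercised against a scratch directory.
    """
    dev = os.path.join(sysfs, "bus/pci/devices", bdf)
    unbind = os.path.join(dev, "driver", "unbind")

    # Detach from whatever driver currently owns the device (amdgpu in this
    # report). The "driver" link is absent when nothing is bound.
    if os.path.exists(unbind):
        with open(unbind, "w") as f:
            f.write(bdf)

    # Restrict the next probe of this device to the chosen driver.
    with open(os.path.join(dev, "driver_override"), "w") as f:
        f.write(new_driver)

    # Ask the PCI core to probe the device again, which binds new_driver.
    with open(os.path.join(sysfs, "bus/pci/drivers_probe"), "w") as f:
        f.write(bdf)
```

Run as root, this would be called once per function of the card, e.g. 
rebind("0000:14:00.0") and likewise for 0000:14:00.1 through 0000:14:00.3.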

Previously, everything would just work and I'd get the following kernel 
output:

  amdgpu 0000:14:00.0: amdgpu: amdgpu: finishing device.
  [drm] amdgpu: ttm finalized
  vfio-pci 0000:14:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=io+mem:owns=none
  vfio-pci 0000:14:00.1: enabling device (0000 -> 0002)
  vfio-pci 0000:14:00.3: enabling device (0000 -> 0002)

However, now when doing the rebinding, the host display (driven by a 
different Radeon GPU) freezes and, after a short time, the system resets 
(thanks to a watchdog, I believe). I then get this in the kernel log:

  amdgpu 0000:14:00.0: amdgpu: amdgpu: finishing device.
  vfio-pci 0000:14:00.1: Unable to change power state from D3hot to D0, device inaccessible
  vfio-pci 0000:14:00.3: Unable to change power state from D3hot to D0, device inaccessible
  vfio-pci 0000:14:00.2: Unable to change power state from D3hot to D0, device inaccessible

Reverting the commit on kernels 6.12.67, 6.12.68, and 6.18.8 gets things 
working again.

Please let me know if you have any ideas or if there is any more debugging 
info I can provide.

Thanks,

-- 
Cal Peake

