linux-kernel - Re: [REGRESSION] Too-low frequency limit for AMD GPU PCI-passed-through to Windows VM

Message-ID: <87mthkkqr4.fsf@turner.link>
Date:   Sun, 20 Mar 2022 21:26:51 -0400
From:   James Turner <linuxkernel.foss@...rc-none.turner.link>
To:     Alex Williamson <alex.williamson@...hat.com>
Cc:     Alex Deucher <alexdeucher@...il.com>,
        Thorsten Leemhuis <regressions@...mhuis.info>,
        Paul Menzel <pmenzel@...gen.mpg.de>,
        Xinhui Pan <Xinhui.Pan@....com>, regressions@...ts.linux.dev,
        kvm@...r.kernel.org, Greg KH <gregkh@...uxfoundation.org>,
        Lijo Lazar <lijo.lazar@....com>,
        LKML <linux-kernel@...r.kernel.org>,
        amd-gfx list <amd-gfx@...ts.freedesktop.org>,
        Alexander Deucher <Alexander.Deucher@....com>,
        Christian König <Christian.Koenig@....com>
Subject: Re: [REGRESSION] Too-low frequency limit for AMD GPU
 PCI-passed-through to Windows VM

>>> Right, interference from host drivers and pre-boot environments is
>>> always a concern with GPU assignment in particular. AMD GPUs have a
>>> long history of poor behavior relative to things like PCI secondary
>>> bus resets which we use to try to get devices to clean, reusable
>>> states for assignment. Here a device is being bound to a host driver
>>> that initiates some sort of power control, unbound from that driver
>>> and exposed to new drivers far beyond the scope of the kernel's
>>> regression policy. Perhaps it's possible to undo such power control
>>> when unbinding the device, but it's not necessarily a given that
>>> such a thing is possible for this device without a cold reset.
>>>
>>> IMO, it's not fair to restrict the kernel from such advancements. If
>>> the use case is within a VM, don't bind host drivers. It's difficult
>>> to make promises when dynamically switching between host and
>>> userspace drivers for devices that don't have functional reset
>>> mechanisms.

To clarify, the GPU is never bound to the `amdgpu` driver on the host.
I'm binding it to `vfio-pci` on the host at boot, specifically to avoid
issues with dynamic rebinding. To do this, I'm passing
`vfio-pci.ids=1002:6981,1002:aae0` on the kernel command line, and I've
confirmed that this option is working:

% lspci -nnk -d 1002:6981
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Lexa XT [Radeon PRO WX 3200] [1002:6981]
	Subsystem: Dell Device [1028:0926]
	Kernel driver in use: vfio-pci
	Kernel modules: amdgpu

% lspci -nnk -d 1002:aae0
01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X] [1002:aae0]
	Subsystem: Dell Device [1028:0926]
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel
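
For anyone who wants to script the same check, here is a minimal sketch
in Python. The helper names (`parse_vfio_ids`, `driver_in_use`) are
hypothetical and only for illustration. It assumes the kernel command
line is available as a string (for example, read from `/proc/cmdline`)
and that the `lspci -nnk` output has the same layout as shown above.

```python
import re
from typing import List, Optional


def parse_vfio_ids(cmdline: str) -> List[str]:
    """Return the IDs listed in vfio-pci.ids= on a kernel command line.

    Dashes and underscores are treated as equivalent in the parameter
    name, so vfio_pci.ids= is accepted as well.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key.replace("-", "_") == "vfio_pci.ids":
            return [entry.lower() for entry in value.split(",") if entry]
    return []


def driver_in_use(lspci_output: str) -> Optional[str]:
    """Extract the 'Kernel driver in use' value from `lspci -nnk` output."""
    match = re.search(
        r"^\s*Kernel driver in use:\s*(\S+)", lspci_output, re.MULTILINE
    )
    return match.group(1) if match else None
```

Feeding the command line and the two `lspci` outputs above into these
helpers should give both IDs and `vfio-pci` as the driver for each
function. Note that this only confirms the binding; it says nothing
about whether the `amdgpu` module is loaded.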

Starting with
f9b7f3703ff9 ("drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)")
this is insufficient for the GPU to work properly in the VM, since the
`amdgpu` module is calling global ACPI methods which affect the GPU even
though it's not bound to the `amdgpu` driver.

>> Additionally, operating the isolated device in a VM on a constrained
>> environment like a laptop may have other adverse side effects.  The
>> driver in the guest would ideally know that this is a laptop and needs
>> to properly interact with ACPI to handle power management on the
>> device.  If that is not the case, the driver in the guest may end up
>> running the device out of spec with what the platform supports.  It's
>> also likely to break suspend and resume, especially on systems which
>> use S0ix since the firmware will generally only turn off certain power
>> rails if all of the devices on the rails have been put into the proper
>> state.  That state may vary depending on the platform requirements.

FWIW, the guest Windows AMD driver can tell that it's a mobile GPU, and
as a result, the driver GUI locks various performance parameters to the
defaults. The cooling system and power supply seem to work without
issues. As the load on the GPU increases, the fan speed increases. The
GPU stays below the critical temperature with plenty of margin, even at
100% load. The voltage reported by the GPU adjusts with the load, and I
haven't experienced any glitches which would suggest that the GPU is not
getting enough power. I haven't tried suspend/resume.

What are the differences between a laptop and a desktop, aside from the
size of the cooling system? Could the issue reported here affect desktop
systems, too?

As for what to do about this issue: Personally, I don't mind
blacklisting `amdgpu` on my machine. My primary concerns are:

1. Other users may experience this issue and have trouble figuring out
   what's happening, or they may not even realize that they're
   experiencing significantly-lower-than-expected performance.

2. It's possible that this issue affects some machines which use an AMD
   GPU for the host and a second AMD GPU for the guest. For those
   machines, blacklisting `amdgpu` would not be an option, since that
   would disable the AMD GPU for the host.

I've tried to help with concern 1 by mentioning this issue on the Arch
Linux Wiki [1]. Another thing that would help is to print a warning
message to the kernel ring buffer when an AMD GPU is bound to `vfio-pci`
and the `amdgpu` module is loaded. (It would say something like,
"Although the <GPU_NAME> device is bound to `vfio-pci`, loading the
`amdgpu` module may still affect it via ACPI. Consider blacklisting
`amdgpu` if the GPU does not behave as expected.")
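
To make the condition for that warning concrete, here is a rough
userspace sketch of the check in Python. This is not a proposed kernel
patch, and the function names are hypothetical. It assumes the usual
sysfs layout (`/sys/bus/pci/devices/<addr>/{vendor,class,driver}`) and
the `/proc/modules` format, and it treats any device with vendor ID
0x1002 and a display-controller class (0x03xxxx) as an AMD GPU.

```python
import os
from typing import List

AMD_VENDOR_ID = "0x1002"
DISPLAY_CLASS_PREFIX = "0x03"  # PCI base class 0x03: display controller


def amd_gpus_on_vfio(sysfs_pci: str = "/sys/bus/pci/devices") -> List[str]:
    """Return PCI addresses of AMD display devices bound to vfio-pci."""
    found = []
    for addr in sorted(os.listdir(sysfs_pci)):
        dev = os.path.join(sysfs_pci, addr)
        try:
            with open(os.path.join(dev, "vendor")) as f:
                vendor = f.read().strip()
            with open(os.path.join(dev, "class")) as f:
                pci_class = f.read().strip()
            # The driver symlink is absent when no driver is bound.
            driver = os.path.basename(os.readlink(os.path.join(dev, "driver")))
        except OSError:
            continue
        if (vendor == AMD_VENDOR_ID
                and pci_class.startswith(DISPLAY_CLASS_PREFIX)
                and driver == "vfio-pci"):
            found.append(addr)
    return found


def module_loaded(name: str, proc_modules: str = "/proc/modules") -> bool:
    """Return True if the named module is listed in /proc/modules."""
    try:
        with open(proc_modules) as f:
            return any(line.split()[0] == name for line in f if line.strip())
    except OSError:
        return False


def passthrough_warnings(sysfs_pci: str = "/sys/bus/pci/devices",
                         proc_modules: str = "/proc/modules") -> List[str]:
    """Build one warning per affected GPU, or none if amdgpu is not loaded."""
    if not module_loaded("amdgpu", proc_modules):
        return []
    return [
        f"Although {addr} is bound to vfio-pci, loading the amdgpu module "
        "may still affect it via ACPI. Consider blacklisting amdgpu if "
        "the GPU does not behave as expected."
        for addr in amd_gpus_on_vfio(sysfs_pci)
    ]
```

Calling `passthrough_warnings()` with the default paths on the host
would return one message per matching device. A real in-kernel warning
would presumably need a different trigger point (for example, at
`amdgpu` module load or at `vfio-pci` bind time), which this sketch
does not attempt to model.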

I'm not sure if there's any way to address concern 2, aside from fixing
the firmware / Windows AMD driver.

I thought of one more thing I could test -- I could try a Linux guest
instead of a Windows guest to determine if the issue is due to the
firmware or the guest Windows AMD driver. Would that be helpful?

[1]: https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF#Too-low_frequency_limit_for_AMD_GPU_passed-through_to_virtual_machine

James