Open Source and information security mailing list archives
 
Message-ID: <CADnq5_P5RAJxKWCQBmJae8eWjJ5_wPG01uJYOpXMGsieWqUDvw@mail.gmail.com>
Date:   Mon, 24 Jan 2022 12:04:18 -0500
From:   Alex Deucher <alexdeucher@...il.com>
To:     James Turner <linuxkernel.foss@...rc-none.turner.link>
Cc:     "Lazar, Lijo" <Lijo.Lazar@....com>,
        Thorsten Leemhuis <regressions@...mhuis.info>,
        "Deucher, Alexander" <Alexander.Deucher@....com>,
        "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        Greg KH <gregkh@...uxfoundation.org>,
        "Pan, Xinhui" <Xinhui.Pan@....com>,
        LKML <linux-kernel@...r.kernel.org>,
        "amd-gfx@...ts.freedesktop.org" <amd-gfx@...ts.freedesktop.org>,
        Alex Williamson <alex.williamson@...hat.com>,
        "Koenig, Christian" <Christian.Koenig@....com>
Subject: Re: [REGRESSION] Too-low frequency limit for AMD GPU
 PCI-passed-through to Windows VM

On Sat, Jan 22, 2022 at 4:38 PM James Turner
<linuxkernel.foss@...rc-none.turner.link> wrote:
>
> Hi Lijo,
>
> > Could you provide the pp_dpm_* values in sysfs with and without the
> > patch? Also, could you try forcing PCIE to gen3 (through pp_dpm_pcie)
> > if it's not in gen3 when the issue happens?
>
> AFAICT, I can't access those values while the AMD GPU PCI devices are
> bound to `vfio-pci`. However, I can at least access the link speed and
> width elsewhere in sysfs. So, I gathered what information I could for
> two different cases:
>
> - With the PCI devices bound to `vfio-pci`. With this configuration, I
>   can start the VM, but the `pp_dpm_*` values are not available since
>   the devices are bound to `vfio-pci` instead of `amdgpu`.
>
> - Without the PCI devices bound to `vfio-pci` (i.e. after removing the
>   `vfio-pci.ids=...` kernel command line argument). With this
>   configuration, I can access the `pp_dpm_*` values, since the PCI
>   devices are bound to `amdgpu`. However, I cannot use the VM. If I try
>   to start the VM, the display (both the external monitors attached to
>   the AMD GPU and the built-in laptop display attached to the Intel
>   iGPU) completely freezes.
>
> The output shown below was identical for both the good commit:
> f1688bd69ec4 ("drm/amd/amdgpu:save psp ring wptr to avoid attack")
> and the commit which introduced the issue:
> f9b7f3703ff9 ("drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)")
>
> Note that the PCI link speed increased to 8.0 GT/s when the GPU was
> under heavy load with both commits, but the GPU clock speeds under
> load differed: 1295 MHz for the good commit versus 501 MHz for the bad
> commit.
>
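
Lijo's suggestion to force PCIe to gen3 through `pp_dpm_pcie` can only be tried while the GPU is bound to `amdgpu`, since the `pp_dpm_*` files do not exist under `vfio-pci`. A minimal sketch, assuming the amdgpu sysfs convention of first writing `manual` to `power_dpm_force_performance_level` and then writing a level index to `pp_dpm_pcie`. The function names `force_pcie_level` and `restore_auto` are hypothetical; the device directory is passed in so it can point at the `0000:01:00.0` device used in this thread. Writing these files on a real system requires root.

```shell
#!/bin/sh
# Hypothetical helper: pin the PCIe DPM level of an amdgpu device.
# $1 = device sysfs directory (e.g. /sys/bus/pci/devices/0000:01:00.0)
# $2 = level index as listed in pp_dpm_pcie (1 = "8.0GT/s, x16" in this thread)
force_pcie_level() {
    dev=$1
    level=$2
    # Assumption: pp_dpm_* only accept writes once the level is "manual".
    echo manual > "$dev/power_dpm_force_performance_level" || return 1
    echo "$level" > "$dev/pp_dpm_pcie" || return 1
    # Show the resulting table; amdgpu marks the active level with '*'.
    cat "$dev/pp_dpm_pcie"
}

# Hypothetical helper: hand clock control back to the driver.
restore_auto() {
    echo auto > "$1/power_dpm_force_performance_level"
}
```

Usage: `force_pcie_level /sys/bus/pci/devices/0000:01:00.0 1`, then `restore_auto /sys/bus/pci/devices/0000:01:00.0` when finished.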

Are the ATIF and ATCS ACPI methods available in the guest VM?  They
are required for this platform to work correctly from a power
standpoint.  One thing that f9b7f3703ff9 did was to get those ACPI
methods executed on certain platforms where they had not been executed
previously, due to a bug in the original implementation.  If the
Windows driver doesn't interact with them, that could cause performance
issues.  It may have worked by accident before because the ACPI
interfaces may not have been called, leading the Windows driver to
believe this was a standalone dGPU rather than one integrated into a
power/thermal-limited platform.

Alex
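
One way to start answering Alex's question is to confirm that the host firmware defines ATIF and ATCS at all. The sketch below is a simple text match over disassembled ACPI tables; the helper name `check_amd_acpi_methods` is hypothetical, and the `acpidump`/`iasl` preparation steps (from acpica-tools) are assumptions about the workflow. It only inspects the host's tables: whether the same methods are visible inside the Windows guest depends on which ACPI tables the VM is given, which this does not check.

```shell
#!/bin/sh
# Hypothetical helper: report whether AMD's ATIF/ATCS ACPI methods are
# defined in a set of disassembled ACPI tables (*.dsl).
# Assumed preparation on the host (acpica-tools, as root):
#   acpidump -b      # dump each table to a binary file (dsdt.dat, ssdt*.dat, ...)
#   iasl -d *.dat    # disassemble each table to a .dsl file
# $1 = directory containing the .dsl files
check_amd_acpi_methods() {
    dir=$1
    status=0
    for m in ATIF ATCS; do
        # Matches "Method (ATCS," as well as path forms like
        # "Method (\_SB.PCI0.GPP0.ATCS,". -s hides errors if no .dsl exists.
        if grep -Eqs "Method \\([^,]*$m," "$dir"/*.dsl; then
            echo "$m: found"
        else
            echo "$m: missing"
            status=1
        fi
    done
    return $status
}
```

The function returns non-zero if either method is missing, so it can be used directly in an `if` or `&&` chain.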


>
> # With the PCI devices bound to `vfio-pci`
>
> ## Before starting the VM
>
> % ls /sys/module/amdgpu/drivers/pci:amdgpu
> module  bind  new_id  remove_id  uevent  unbind
>
> % find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \;
> /sys/bus/pci/devices/0000:01:00.0/current_link_width
> 8
> /sys/bus/pci/devices/0000:01:00.0/current_link_speed
> 8.0 GT/s PCIe
>
> ## While running the VM, before placing the AMD GPU under heavy load
>
> % find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \;
> /sys/bus/pci/devices/0000:01:00.0/current_link_width
> 8
> /sys/bus/pci/devices/0000:01:00.0/current_link_speed
> 2.5 GT/s PCIe
>
> ## While running the VM, with the AMD GPU under heavy load
>
> % find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \;
> /sys/bus/pci/devices/0000:01:00.0/current_link_width
> 8
> /sys/bus/pci/devices/0000:01:00.0/current_link_speed
> 8.0 GT/s PCIe
>
> ## While running the VM, after stopping the heavy load on the AMD GPU
>
> % find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \;
> /sys/bus/pci/devices/0000:01:00.0/current_link_width
> 8
> /sys/bus/pci/devices/0000:01:00.0/current_link_speed
> 2.5 GT/s PCIe
>
> ## After stopping the VM
>
> % find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \;
> /sys/bus/pci/devices/0000:01:00.0/current_link_width
> 8
> /sys/bus/pci/devices/0000:01:00.0/current_link_speed
> 2.5 GT/s PCIe
>
>
> # Without the PCI devices bound to `vfio-pci`
>
> % ls /sys/module/amdgpu/drivers/pci:amdgpu
> 0000:01:00.0  module  bind  new_id  remove_id  uevent  unbind
>
> % for f in /sys/module/amdgpu/drivers/pci:amdgpu/*/pp_dpm_*; do echo "$f"; cat "$f"; echo; done
> /sys/module/amdgpu/drivers/pci:amdgpu/0000:01:00.0/pp_dpm_mclk
> 0: 300Mhz
> 1: 625Mhz
> 2: 1500Mhz *
>
> /sys/module/amdgpu/drivers/pci:amdgpu/0000:01:00.0/pp_dpm_pcie
> 0: 2.5GT/s, x8
> 1: 8.0GT/s, x16 *
>
> /sys/module/amdgpu/drivers/pci:amdgpu/0000:01:00.0/pp_dpm_sclk
> 0: 214Mhz
> 1: 501Mhz
> 2: 850Mhz
> 3: 1034Mhz
> 4: 1144Mhz
> 5: 1228Mhz
> 6: 1275Mhz
> 7: 1295Mhz *
>
> % find /sys/bus/pci/devices/0000:01:00.0/ -type f -name 'current_link*' -print -exec cat {} \;
> /sys/bus/pci/devices/0000:01:00.0/current_link_width
> 8
> /sys/bus/pci/devices/0000:01:00.0/current_link_speed
> 8.0 GT/s PCIe
>
>
> James
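
The measurements above can be collected with a single helper that prints the PCIe link state and the active `pp_dpm_*` level (the entry amdgpu marks with `*`). A minimal sketch; the names `active_level` and `gpu_state` are hypothetical. Link speed and width are readable under either `vfio-pci` or `amdgpu`, while the `pp_dpm_*` files only exist when the device is bound to `amdgpu`, so the helper reports them as unavailable otherwise.

```shell
#!/bin/sh
# Hypothetical helper: print the active entry of a pp_dpm_* table,
# i.e. the line amdgpu marks with a trailing '*' (marker stripped).
active_level() {
    sed -n 's/[[:space:]]*\*[[:space:]]*$//p' "$1"
}

# Hypothetical helper: one-line-per-item summary of a GPU's state.
# $1 = device sysfs directory, e.g. /sys/bus/pci/devices/0000:01:00.0
gpu_state() {
    dev=$1
    printf 'link: %s x%s\n' "$(cat "$dev/current_link_speed")" \
        "$(cat "$dev/current_link_width")"
    for t in sclk mclk pcie; do
        f="$dev/pp_dpm_$t"
        if [ -r "$f" ]; then
            printf '%s: %s\n' "$t" "$(active_level "$f")"
        else
            # Expected while the device is bound to vfio-pci.
            printf '%s: unavailable (not bound to amdgpu?)\n' "$t"
        fi
    done
}
```

Running `gpu_state /sys/bus/pci/devices/0000:01:00.0` before starting the VM, under load, and after stopping the load reproduces the comparison made above; this is the same device directory reached through `/sys/module/amdgpu/drivers/pci:amdgpu/`.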
