[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20190425130124.GD11428@google.com>
Date: Thu, 25 Apr 2019 08:01:24 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Lyude Paul <lyude@...hat.com>
Cc: linux-pci@...r.kernel.org, nouveau@...ts.freedesktop.org,
dri-devel@...ts.freedesktop.org, Karol Herbst <kherbst@...hat.com>,
Ben Skeggs <skeggsb@...il.com>, stable@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] pci/quirks: Add quirk to reset nvgpu at boot for the
Lenovo ThinkPad P50
On Tue, Feb 12, 2019 at 05:02:30PM -0500, Lyude Paul wrote:
> On a very specific subset of ThinkPad P50 SKUs, particularly ones that
> come with a Quadro M1000M chip instead of the M2000M variant, the BIOS
> seems to have a very nasty habit of not always resetting the secondary
> Nvidia GPU between full reboots if the laptop is configured in Hybrid
> Graphics mode. The reason for this happening is unknown, but the
> following steps and possibly a good bit of patience will reproduce the
> issue:
>
> 1. Boot up the laptop normally in Hybrid graphics mode
> 2. Make sure nouveau is loaded and that the GPU is awake
> 2. Allow the nvidia GPU to runtime suspend itself after being idle
> 3. Reboot the machine, the more sudden the better (e.g sysrq-b may help)
> 4. If nouveau loads up properly, reboot the machine again and go back to
> step 2 until you reproduce the issue
>
> This results in some very strange behavior: the GPU will
> quite literally be left in exactly the same state it was in when the
> previously booted kernel started the reboot. This has all sorts of bad
> sideaffects: for starters, this completely breaks nouveau starting with a
> mysterious EVO channel failure that happens well before we've actually
> used the EVO channel for anything:
>
> nouveau 0000:01:00.0: disp: chid 0 mthd 0000 data 00000400 00001000
> 00000002
> ...
> So to do this, we add a new pci quirk using
> DECLARE_PCI_FIXUP_CLASS_FINAL that will be invoked before the PCI probe
> at boot finishes. From there, we check to make sure that this is indeed
> the specific P50 variant of this GPU. We also make sure that the GPU PCI
> device is advertising NoReset- in order to prevent us from trying to
> reset the GPU when the machine is in Dedicated graphics mode (where the
> GPU being initialized by the BIOS is normal and expected). Finally, we
> try mapping the MMIO space for the GPU which should only work if the GPU
> is actually active in D0 mode. We can then read the magic 0x2240c
> register on the GPU, which will have bit 1 set if the GPU's firmware has
> already been posted during a previous boot. Once we've confirmed all of
> this, we reset the PCI device and re-disable it - bringing the GPU back
> into a healthy state.
>
> Signed-off-by: Lyude Paul <lyude@...hat.com>
> Cc: nouveau@...ts.freedesktop.org
> Cc: dri-devel@...ts.freedesktop.org
> Cc: Karol Herbst <kherbst@...hat.com>
> Cc: Ben Skeggs <skeggsb@...il.com>
> Cc: stable@...r.kernel.org
Applied to pci/misc for v5.2, thanks!
> ---
> drivers/pci/quirks.c | 65 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 65 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b0a413f3f7ca..948492fda8bf 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5117,3 +5117,68 @@ SWITCHTEC_QUIRK(0x8573); /* PFXI 48XG3 */
> SWITCHTEC_QUIRK(0x8574); /* PFXI 64XG3 */
> SWITCHTEC_QUIRK(0x8575); /* PFXI 80XG3 */
> SWITCHTEC_QUIRK(0x8576); /* PFXI 96XG3 */
> +
> +/*
> + * On certain Lenovo Thinkpad P50 SKUs, specifically those with a Nvidia
> + * Quadro M1000M, the BIOS will occasionally make the mistake of not resetting
> + * the nvidia GPU between reboots if the system is configured to use hybrid
> + * graphics mode. This results in the GPU being left in whatever state it was
> + * in during the previous boot which causes spurious interrupts from the GPU,
> + * which in turn cause us to disable the wrong IRQs and end up breaking the
> + * touchpad. Unsurprisingly, this also completely breaks nouveau.
> + *
> + * Luckily, it seems a simple reset of the PCI device for the nvidia GPU
> + * manages to bring the GPU back into a clean state and fix all of these
> + * issues. Additionally since the GPU will report NoReset+ when the machine is
> + * configured in Dedicated display mode, we don't need to worry about
> + * accidentally resetting the GPU when it's supposed to already be
> + * initialized.
> + */
> +static void
> +quirk_lenovo_thinkpad_p50_nvgpu_survives_reboot(struct pci_dev *pdev)
> +{
> + void __iomem *map;
> + int ret;
> +
> + if (pdev->subsystem_vendor != PCI_VENDOR_ID_LENOVO ||
> + pdev->subsystem_device != 0x222e ||
> + !pdev->reset_fn)
> + return;
> +
> + /*
> + * If we can't enable the device's mmio space, it's probably not even
> + * initialized. This is fine, and means we can just skip the quirk
> + * entirely.
> + */
> + if (pci_enable_device_mem(pdev)) {
> + pci_dbg(pdev, "Can't enable device mem, no reset needed\n");
> + return;
> + }
> +
> + /* Taken from drivers/gpu/drm/nouveau/engine/device/base.c */
> + map = ioremap(pci_resource_start(pdev, 0), 0x102000);
> + if (!map) {
> + pci_err(pdev, "Can't map MMIO space, this is probably very bad\n");
> + goto out_disable;
> + }
> +
> + /*
> + * Be extra careful, and make sure that the GPU firmware is posted
> + * before trying a reset
> + */
> + if (ioread32(map + 0x2240c) & 0x2) {
> + pci_info(pdev,
> + FW_BUG "GPU left initialized by EFI, resetting\n");
> + ret = pci_reset_function(pdev);
> + if (ret < 0)
> + pci_err(pdev, "Failed to reset GPU: %d\n", ret);
> + }
> +
> + iounmap(map);
> +out_disable:
> + pci_disable_device(pdev);
> +}
> +
> +DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, 0x13b1,
> + PCI_CLASS_DISPLAY_VGA, 8,
> + quirk_lenovo_thinkpad_p50_nvgpu_survives_reboot);
> --
> 2.20.1
>
Powered by blists - more mailing lists