Message-ID: <20250124092453.7d3df3d6.alex.williamson@redhat.com>
Date: Fri, 24 Jan 2025 09:24:53 -0700
From: Alex Williamson <alex.williamson@...hat.com>
To: <ankita@...dia.com>
Cc: <jgg@...dia.com>, <yishaih@...dia.com>,
<shameerali.kolothum.thodi@...wei.com>, <kevin.tian@...el.com>,
<zhiw@...dia.com>, <aniketa@...dia.com>, <cjia@...dia.com>,
<kwankhede@...dia.com>, <targupta@...dia.com>, <vsethi@...dia.com>,
<acurrid@...dia.com>, <apopple@...dia.com>, <jhubbard@...dia.com>,
<danw@...dia.com>, <kjaju@...dia.com>, <anuaggarwal@...dia.com>,
<mochs@...dia.com>, <kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 1/3] vfio/nvgrace-gpu: Read dvsec register to
determine need for uncached resmem
On Thu, 23 Jan 2025 17:48:52 +0000
<ankita@...dia.com> wrote:
> From: Ankit Agrawal <ankita@...dia.com>
>
> NVIDIA's recently introduced Grace Blackwell (GB) Superchip is a
> continuation of the Grace Hopper (GH) Superchip. It gives the CPU
> and GPU cache-coherent access to each other's memory over an
> internal proprietary chip-to-chip cache-coherent interconnect.
>
> A HW defect on GH systems, affecting support for the Multi-Instance
> GPU (MIG) feature [1], necessitated a 1G region with an uncached
> mapping carved out of the device memory. The 1G region is exposed
> as a fake BAR (comprising regions 2 and 3) to work around the issue.
> This is fixed on GB systems.
>
> The presence of the fix for the HW defect is communicated by the
> device firmware through the DVSEC PCI config register with ID 3.
> The module reads this to take a different codepath on GB vs GH.
>
> Scan through the DVSEC registers to identify the correct one and use
> it to determine the presence of the fix. Save the value in the device's
> nvgrace_gpu_pci_core_device structure.
>
> Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu/ [1]
>
> CC: Jason Gunthorpe <jgg@...dia.com>
> CC: Kevin Tian <kevin.tian@...el.com>
> Signed-off-by: Ankit Agrawal <ankita@...dia.com>
> ---
> drivers/vfio/pci/nvgrace-gpu/main.c | 30 +++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> index a467085038f0..dde2daa597f8 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> @@ -23,6 +23,11 @@
> /* A hardwired and constant ABI value between the GPU FW and VFIO driver. */
> #define MEMBLK_SIZE SZ_512M
>
> +#define DVSEC_BITMAP_OFFSET 0xA
> +#define MIG_SUPPORTED_WITH_CACHED_RESMEM BIT(0)
> +
> +#define GPU_CAP_DVSEC_REGISTER 3
> +
> /*
> * The state of the two device memory region - resmem and usemem - is
> * saved as struct mem_region.
> @@ -46,6 +51,7 @@ struct nvgrace_gpu_pci_core_device {
> struct mem_region resmem;
> /* Lock to control device memory kernel mapping */
> struct mutex remap_lock;
> + bool has_mig_hw_bug;
> };
>
> static void nvgrace_gpu_init_fake_bar_emu_regs(struct vfio_device *core_vdev)
> @@ -812,6 +818,26 @@ nvgrace_gpu_init_nvdev_struct(struct pci_dev *pdev,
> return ret;
> }
>
> +static bool nvgrace_gpu_has_mig_hw_bug(struct pci_dev *pdev)
> +{
> + int pcie_dvsec;
> + u16 dvsec_ctrl16;
> +
> + pcie_dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_NVIDIA,
> + GPU_CAP_DVSEC_REGISTER);
> +
> + if (pcie_dvsec) {
> + pci_read_config_word(pdev,
> + pcie_dvsec + DVSEC_BITMAP_OFFSET,
> + &dvsec_ctrl16);
> +
> + if (dvsec_ctrl16 & MIG_SUPPORTED_WITH_CACHED_RESMEM)
> + return false;
> + }
> +
> + return true;
> +}
> +
> static int nvgrace_gpu_probe(struct pci_dev *pdev,
> const struct pci_device_id *id)
> {
> @@ -832,6 +858,8 @@ static int nvgrace_gpu_probe(struct pci_dev *pdev,
> dev_set_drvdata(&pdev->dev, &nvdev->core_device);
>
> if (ops == &nvgrace_gpu_pci_ops) {
> + nvdev->has_mig_hw_bug = nvgrace_gpu_has_mig_hw_bug(pdev);
> +
> /*
> * Device memory properties are identified in the host ACPI
> * table. Set the nvgrace_gpu_pci_core_device structure.
> @@ -868,6 +896,8 @@ static const struct pci_device_id nvgrace_gpu_vfio_pci_table[] = {
> { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_NVIDIA, 0x2345) },
> /* GH200 SKU */
> { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_NVIDIA, 0x2348) },
> + /* GB200 SKU */
> + { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_NVIDIA, 0x2941) },
> {}
> };
>
GB support isn't really complete until patch 3, so shouldn't we hold
off on adding the ID to the table until a trivial patch 4, adding only
the chunk above? Thanks,
Alex
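
For anyone following the thread without the hardware at hand, the lookup the patch does through pci_find_dvsec_capability() can be modeled in userspace roughly as below. This is only a sketch under stated assumptions: it walks a raw 4K config-space buffer instead of using the kernel's PCI accessors, and find_dvsec(), cfg_read16(), cfg_read32() and has_mig_hw_bug() are hypothetical helper names, not kernel API. The DVSEC ID (3), the bitmap offset (0xA) and bit 0 come from the patch; the extended capability layout follows the PCIe DVSEC definition (capability ID 0x23, vendor ID at +4, DVSEC ID at +8).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE      4096
#define EXT_CAP_START       0x100   /* first PCIe extended capability */
#define EXT_CAP_ID_DVSEC    0x23    /* Designated Vendor-Specific Ext. Cap. */
#define VENDOR_ID_NVIDIA    0x10de

/* Values taken from the patch under review. */
#define GPU_CAP_DVSEC_REGISTER            3
#define DVSEC_BITMAP_OFFSET               0xA
#define MIG_SUPPORTED_WITH_CACHED_RESMEM  (1u << 0)

/* Little-endian reads from a raw config-space image. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg_read16(cfg, off) |
	       ((uint32_t)cfg_read16(cfg, off + 2) << 16);
}

/*
 * Hypothetical stand-in for pci_find_dvsec_capability(): walk the
 * extended capability list and return the offset of the DVSEC that
 * matches vendor and dvsec_id, or 0 if there is none.
 */
static size_t find_dvsec(const uint8_t *cfg, uint16_t vendor,
			 uint16_t dvsec_id)
{
	size_t pos = EXT_CAP_START;
	/* Bound the walk so a malformed list cannot loop forever. */
	int budget = (CFG_SPACE_SIZE - EXT_CAP_START) / 8;

	while (pos >= EXT_CAP_START && pos + 12 <= CFG_SPACE_SIZE &&
	       budget-- > 0) {
		uint32_t hdr = cfg_read32(cfg, pos);

		if (hdr == 0)
			break;	/* empty list */

		if ((hdr & 0xffff) == EXT_CAP_ID_DVSEC &&
		    cfg_read16(cfg, pos + 4) == vendor &&
		    cfg_read16(cfg, pos + 8) == dvsec_id)
			return pos;

		pos = (hdr >> 20) & 0xffc;	/* next capability offset */
	}

	return 0;
}

/* Same decision as nvgrace_gpu_has_mig_hw_bug() in the patch. */
static bool has_mig_hw_bug(const uint8_t *cfg)
{
	size_t dvsec = find_dvsec(cfg, VENDOR_ID_NVIDIA,
				  GPU_CAP_DVSEC_REGISTER);

	if (dvsec && (cfg_read16(cfg, dvsec + DVSEC_BITMAP_OFFSET) &
		      MIG_SUPPORTED_WITH_CACHED_RESMEM))
		return false;

	return true;
}
```

As in the patch, a device without the DVSEC, or with bit 0 clear, is treated as having the bug, so GH parts keep the uncached resmem workaround by default.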