linux-kernel - Re: [PATCH v5 1/3] vfio/nvgrace-gpu: Read dvsec register to determine need for uncached resmem


Message-ID: <20250124092453.7d3df3d6.alex.williamson@redhat.com>
Date: Fri, 24 Jan 2025 09:24:53 -0700
From: Alex Williamson <alex.williamson@...hat.com>
To: <ankita@...dia.com>
Cc: <jgg@...dia.com>, <yishaih@...dia.com>,
 <shameerali.kolothum.thodi@...wei.com>, <kevin.tian@...el.com>,
 <zhiw@...dia.com>, <aniketa@...dia.com>, <cjia@...dia.com>,
 <kwankhede@...dia.com>, <targupta@...dia.com>, <vsethi@...dia.com>,
 <acurrid@...dia.com>, <apopple@...dia.com>, <jhubbard@...dia.com>,
 <danw@...dia.com>, <kjaju@...dia.com>, <anuaggarwal@...dia.com>,
 <mochs@...dia.com>, <kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 1/3] vfio/nvgrace-gpu: Read dvsec register to
 determine need for uncached resmem

On Thu, 23 Jan 2025 17:48:52 +0000
<ankita@...dia.com> wrote:

> From: Ankit Agrawal <ankita@...dia.com>
> 
> NVIDIA's recently introduced Grace Blackwell (GB) Superchip is a
> continuation of the Grace Hopper (GH) Superchip, which provides
> cache-coherent access by the CPU and GPU to each other's memory over
> an internal proprietary chip-to-chip cache-coherent interconnect.
> 
> GH systems have a HW defect affecting support for the Multi-Instance
> GPU (MIG) feature [1], which necessitated carving a 1G region with an
> uncached mapping out of the device memory. The 1G region is exposed
> as a fake BAR (comprising regions 2 and 3) to work around the issue.
> This is fixed on GB systems.
> 
> The presence of the fix for the HW defect is communicated by the
> device firmware through the DVSEC PCI config register with ID 3.
> The module reads this to take a different codepath on GB vs GH.
> 
> Scan through the DVSEC registers to identify the correct one and use
> it to determine the presence of the fix. Save the value in the device's
> nvgrace_gpu_pci_core_device structure.
> 
> Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu/ [1]
> 
> CC: Jason Gunthorpe <jgg@...dia.com>
> CC: Kevin Tian <kevin.tian@...el.com>
> Signed-off-by: Ankit Agrawal <ankita@...dia.com>
> ---
>  drivers/vfio/pci/nvgrace-gpu/main.c | 30 +++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> index a467085038f0..dde2daa597f8 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> @@ -23,6 +23,11 @@
>  /* A hardwired and constant ABI value between the GPU FW and VFIO driver. */
>  #define MEMBLK_SIZE SZ_512M
>  
> +#define DVSEC_BITMAP_OFFSET 0xA
> +#define MIG_SUPPORTED_WITH_CACHED_RESMEM BIT(0)
> +
> +#define GPU_CAP_DVSEC_REGISTER 3
> +
>  /*
>   * The state of the two device memory region - resmem and usemem - is
>   * saved as struct mem_region.
> @@ -46,6 +51,7 @@ struct nvgrace_gpu_pci_core_device {
>  	struct mem_region resmem;
>  	/* Lock to control device memory kernel mapping */
>  	struct mutex remap_lock;
> +	bool has_mig_hw_bug;
>  };
>  
>  static void nvgrace_gpu_init_fake_bar_emu_regs(struct vfio_device *core_vdev)
> @@ -812,6 +818,26 @@ nvgrace_gpu_init_nvdev_struct(struct pci_dev *pdev,
>  	return ret;
>  }
>  
> +static bool nvgrace_gpu_has_mig_hw_bug(struct pci_dev *pdev)
> +{
> +	int pcie_dvsec;
> +	u16 dvsec_ctrl16;
> +
> +	pcie_dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_NVIDIA,
> +					       GPU_CAP_DVSEC_REGISTER);
> +
> +	if (pcie_dvsec) {
> +		pci_read_config_word(pdev,
> +				     pcie_dvsec + DVSEC_BITMAP_OFFSET,
> +				     &dvsec_ctrl16);
> +
> +		if (dvsec_ctrl16 & MIG_SUPPORTED_WITH_CACHED_RESMEM)
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
>  static int nvgrace_gpu_probe(struct pci_dev *pdev,
>  			     const struct pci_device_id *id)
>  {
> @@ -832,6 +858,8 @@ static int nvgrace_gpu_probe(struct pci_dev *pdev,
>  	dev_set_drvdata(&pdev->dev, &nvdev->core_device);
>  
>  	if (ops == &nvgrace_gpu_pci_ops) {
> +		nvdev->has_mig_hw_bug = nvgrace_gpu_has_mig_hw_bug(pdev);
> +
>  		/*
>  		 * Device memory properties are identified in the host ACPI
>  		 * table. Set the nvgrace_gpu_pci_core_device structure.
> @@ -868,6 +896,8 @@ static const struct pci_device_id nvgrace_gpu_vfio_pci_table[] = {
>  	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_NVIDIA, 0x2345) },
>  	/* GH200 SKU */
>  	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_NVIDIA, 0x2348) },
> +	/* GB200 SKU */
> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_NVIDIA, 0x2941) },
>  	{}
>  };
>  

GB support isn't really complete until patch 3, so shouldn't we hold
off on adding the ID to the table until a trivial patch 4, adding only
the chunk above?  Thanks,

Alex