Message-ID: <20251122141341.3644-1-ankita@nvidia.com>
Date: Sat, 22 Nov 2025 14:13:41 +0000
From: <ankita@...dia.com>
To: <ankita@...dia.com>, <jgg@...pe.ca>, <yishaih@...dia.com>,
<skolothumtho@...dia.com>, <kevin.tian@...el.com>, <alex@...zbot.org>,
<aniketa@...dia.com>, <vsethi@...dia.com>, <mochs@...dia.com>
CC: <Yunxiang.Li@....com>, <yi.l.liu@...el.com>,
<zhangdongdong@...incomputing.com>, <avihaih@...dia.com>,
<bhelgaas@...gle.com>, <peterx@...hat.com>, <pstanner@...hat.com>,
<apopple@...dia.com>, <kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<cjia@...dia.com>, <kwankhede@...dia.com>, <targupta@...dia.com>,
<zhiw@...dia.com>, <danw@...dia.com>, <dnigam@...dia.com>, <kjaju@...dia.com>
Subject: [PATCH v4 RESEND 7/7] vfio/nvgrace-gpu: wait for the GPU mem to be ready
From: Ankit Agrawal <ankita@...dia.com>
Speculative prefetches from the CPU to GPU memory before the GPU
is ready after a reset can cause harmless corrected RAS events to
be logged on Grace systems. It is therefore preferable not to
re-establish the mapping until the GPU is ready after reset.
GPU readiness can be checked through BAR0 registers, as is
already done at device probe time.
It can take several seconds for the GPU to become ready, so the
wait should overlap with as much of the VM startup as possible
to reduce the impact on VM boot time. GPU readiness is therefore
checked on the first fault/huge_fault request, which amortizes
the wait. The first fault is detected using the gpu_mem_mapped
flag, which the reset_done handler clears on every GPU reset
request.
cc: Alex Williamson <alex@...zbot.org>
cc: Jason Gunthorpe <jgg@...pe.ca>
cc: Vikram Sethi <vsethi@...dia.com>
Suggested-by: Alex Williamson <alex@...zbot.org>
Signed-off-by: Ankit Agrawal <ankita@...dia.com>
---
drivers/vfio/pci/nvgrace-gpu/main.c | 51 +++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
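Note: the check-once-per-reset flow described in the commit message can be
modelled in userspace roughly as follows. This is only an illustrative
sketch, not driver code: struct dev_model, model_wait_ready(),
model_premap_check() and model_reset_done() are hypothetical names, a
pthread mutex stands in for memory_lock, and a countdown stands in for
polling the BAR0 readiness registers.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel API. */
struct dev_model {
	pthread_mutex_t lock;   /* stands in for vdev->memory_lock */
	bool mem_mapped;        /* mirrors the gpu_mem_mapped flag */
	int polls_until_ready;  /* simulated: polls left before "ready" */
	int wait_calls;         /* how many times the slow wait ran */
};

/* Simulated bounded poll; returns 0 when ready, -1 on timeout. */
static int model_wait_ready(struct dev_model *d)
{
	d->wait_calls++;
	for (int i = 0; i < 100; i++) {
		if (d->polls_until_ready == 0)
			return 0;
		d->polls_until_ready--;
	}
	return -1;
}

/* Run the slow readiness wait only on the first fault after a reset. */
static int model_premap_check(struct dev_model *d)
{
	int ret = 0;

	pthread_mutex_lock(&d->lock);
	if (!d->mem_mapped) {
		ret = model_wait_ready(d);
		if (ret == 0)
			d->mem_mapped = true;
	}
	pthread_mutex_unlock(&d->lock);
	return ret;
}

/* A reset clears the flag, so the next fault re-checks readiness. */
static void model_reset_done(struct dev_model *d)
{
	pthread_mutex_lock(&d->lock);
	d->mem_mapped = false;
	pthread_mutex_unlock(&d->lock);
}
```

In this model only the first call to model_premap_check() after a reset
pays for the wait; later calls return immediately, and a failed wait
leaves the flag clear so the next fault retries.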
diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
index 5a2799dce417..d6a9f1cc4a25 100644
--- a/drivers/vfio/pci/nvgrace-gpu/main.c
+++ b/drivers/vfio/pci/nvgrace-gpu/main.c
@@ -104,6 +104,17 @@ static int nvgrace_gpu_open_device(struct vfio_device *core_vdev)
mutex_init(&nvdev->remap_lock);
}
+ /*
+ * GPU readiness is checked by reading the BAR0 registers.
+ *
+ * ioremap BAR0 here so that the mapping is already present when
+ * the registers are read on the first fault, before any GPU
+ * memory mapping is established.
+ */
+ ret = vfio_pci_core_setup_barmap(vdev, 0);
+ if (ret)
+ return ret;
+
nvdev->gpu_mem_mapped = false;
vfio_pci_core_finish_enable(vdev);
@@ -152,6 +163,27 @@ static int nvgrace_gpu_wait_device_ready(void __iomem *io)
return ret;
}
+static int
+nvgrace_gpu_vfio_pci_premap_check(struct nvgrace_gpu_pci_core_device *nvdev)
+{
+ struct vfio_pci_core_device *vdev = &nvdev->core_device;
+ int ret = 0;
+
+ down_write(&vdev->memory_lock);
+ if (nvdev->gpu_mem_mapped)
+ goto premap_exit;
+
+ ret = nvgrace_gpu_wait_device_ready(vdev->barmap[0]);
+ if (ret)
+ goto premap_exit;
+
+ nvdev->gpu_mem_mapped = true;
+
+premap_exit:
+ up_write(&vdev->memory_lock);
+ return ret;
+}
+
static vm_fault_t nvgrace_gpu_vfio_pci_huge_fault(struct vm_fault *vmf,
unsigned int order)
{
@@ -162,6 +194,15 @@ static vm_fault_t nvgrace_gpu_vfio_pci_huge_fault(struct vm_fault *vmf,
struct mem_region *memregion;
unsigned long pgoff, pfn, addr;
+ /*
+ * If the GPU memory is accessed by the CPU while the GPU is
+ * not ready after reset, it can cause harmless corrected RAS
+ * events to be logged. Make sure the GPU is ready before
+ * establishing the mappings.
+ */
+ if (nvgrace_gpu_vfio_pci_premap_check(nvdev))
+ return ret;
+
memregion = nvgrace_gpu_memregion(index, nvdev);
if (!memregion)
return ret;
@@ -485,6 +526,16 @@ nvgrace_gpu_map_device_mem(int index,
struct mem_region *memregion;
int ret = 0;
+ /*
+ * If the GPU memory is accessed by the CPU while the GPU is
+ * not ready after reset, it can cause harmless corrected RAS
+ * events to be logged. Make sure the GPU is ready before
+ * establishing the mappings.
+ */
+ ret = nvgrace_gpu_vfio_pci_premap_check(nvdev);
+ if (ret)
+ return ret;
+
memregion = nvgrace_gpu_memregion(index, nvdev);
if (!memregion)
return -EINVAL;
--
2.34.1