Message-ID: <CH3PR12MB7548DB1A20EA2D32AB923CBCAB15A@CH3PR12MB7548.namprd12.prod.outlook.com>
Date: Mon, 15 Sep 2025 07:47:01 +0000
From: Shameer Kolothum <skolothumtho@...dia.com>
To: Ankit Agrawal <ankita@...dia.com>, Jason Gunthorpe <jgg@...dia.com>,
"alex.williamson@...hat.com" <alex.williamson@...hat.com>, Yishai Hadas
<yishaih@...dia.com>, "kevin.tian@...el.com" <kevin.tian@...el.com>,
"yi.l.liu@...el.com" <yi.l.liu@...el.com>, Zhi Wang <zhiw@...dia.com>
CC: Aniket Agashe <aniketa@...dia.com>, Neo Jia <cjia@...dia.com>, Kirti
Wankhede <kwankhede@...dia.com>, "Tarun Gupta (SW-GPU)"
<targupta@...dia.com>, Vikram Sethi <vsethi@...dia.com>, Andy Currid
<ACurrid@...dia.com>, Alistair Popple <apopple@...dia.com>, John Hubbard
<jhubbard@...dia.com>, Dan Williams <danw@...dia.com>, "Anuj Aggarwal
(SW-GPU)" <anuaggarwal@...dia.com>, Matt Ochs <mochs@...dia.com>, Krishnakant
Jaju <kjaju@...dia.com>, Dheeraj Nigam <dnigam@...dia.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>
Subject: RE: [RFC 05/14] vfio/nvgrace-egm: Introduce module to manage EGM
> -----Original Message-----
> From: Ankit Agrawal <ankita@...dia.com>
> Sent: 04 September 2025 05:08
> To: Ankit Agrawal <ankita@...dia.com>; Jason Gunthorpe <jgg@...dia.com>;
> alex.williamson@...hat.com; Yishai Hadas <yishaih@...dia.com>; Shameer
> Kolothum <skolothumtho@...dia.com>; kevin.tian@...el.com;
> yi.l.liu@...el.com; Zhi Wang <zhiw@...dia.com>
> Cc: Aniket Agashe <aniketa@...dia.com>; Neo Jia <cjia@...dia.com>; Kirti
> Wankhede <kwankhede@...dia.com>; Tarun Gupta (SW-GPU)
> <targupta@...dia.com>; Vikram Sethi <vsethi@...dia.com>; Andy Currid
> <acurrid@...dia.com>; Alistair Popple <apopple@...dia.com>; John Hubbard
> <jhubbard@...dia.com>; Dan Williams <danw@...dia.com>; Anuj Aggarwal
> (SW-GPU) <anuaggarwal@...dia.com>; Matt Ochs <mochs@...dia.com>;
> Krishnakant Jaju <kjaju@...dia.com>; Dheeraj Nigam <dnigam@...dia.com>;
> kvm@...r.kernel.org; linux-kernel@...r.kernel.org
> Subject: [RFC 05/14] vfio/nvgrace-egm: Introduce module to manage EGM
>
> From: Ankit Agrawal <ankita@...dia.com>
>
> The Extended GPU Memory (EGM) feature enables the GPU to access
> system memory allocations within and across nodes through a
> high-bandwidth path on Grace-based systems. The GPU can utilize
> system memory located on the same socket, on a different socket,
> or even on a different node in a multi-node system [1].
>
> When the EGM mode is enabled through SBIOS, the host system memory is
> partitioned into two parts: one partition for host OS usage, called
> the Hypervisor region, and a second Hypervisor-Invisible (HI) region
> for the VM. Only the Hypervisor region is part of the host EFI map
> and is thus visible to the host OS on bootup. Since the entire VM
> sysmem is eligible for EGM allocations within the VM, the HI partition
> is also referred to as the EGM region in this series. The base SPA and
> size of the HI/EGM region are exposed through ACPI DSDT properties.
>
> Whilst the EGM region is accessible on the host, it is not added to
> the kernel. The HI region is assigned to a VM by mapping the QEMU VMA
> to the SPA using remap_pfn_range().
>
> The following figure shows the memory map in the virtualization
> environment.
>
> |---- Sysmem ----|                  |--- GPU mem ---|  VM Memory
> |                |                  |               |
> |IPA <-> SPA map |                  |IPA <-> SPA map|
> |                |                  |               |
> |--- HI / EGM ---|-- Host Mem --|   |--- GPU mem ---|  Host Memory
>
> Introduce a new nvgrace-egm auxiliary driver module to manage and
> map the HI/EGM region on Grace Blackwell systems. It binds to
> the auxiliary device created by the parent nvgrace-gpu (in-tree
> module for device assignment) / nvidia-vgpu-vfio (out-of-tree open
> source module for SRIOV vGPU) to manage the EGM region for the VM.
> Note that there is a unique EGM region per socket, and an auxiliary
> device is created for every region. The parent module fetches the
> EGM region information from the ACPI tables and populates the data
> structures shared with the auxiliary nvgrace-egm module.
>
> The nvgrace-egm module handles the following:
> 1. Fetch the EGM memory properties (base HPA, length, proximity domain)
>    from the parent device shared EGM region structure.
> 2. Create a char device that can be used as memory-backend-file by QEMU
>    for the VM and implement file operations. The char device is /dev/egmX,
>    where X is the PXM node ID of the EGM being mapped, fetched in 1.
> 3. Zero the EGM memory on first device open().
> 4. Map the QEMU VMA to the EGM region using remap_pfn_range().
> 5. Clean up state and destroy the chardev on device unbind.
> 6. Handle presence of retired ECC pages on the EGM region.
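
For readers following the list above, the userspace side of steps 2 and 4
might look roughly like the sketch below. It is only an illustration and is
not taken from the series: it assumes the char device accepts a shared
read/write mmap() at offset 0, and the helper name egm_map_region() and the
/dev/egm0 path in the comment are hypothetical. On the kernel side, such an
mmap() would be what ends up in the driver's remap_pfn_range() path.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of a char device (or any other
 * mmap-able file) shared and read/write, starting at offset 0.
 * For an EGM region the path would be something like "/dev/egm0"
 * (assumed name, following the /dev/egmX scheme in the commit message).
 * Returns NULL on failure.
 */
void *egm_map_region(const char *path, size_t len)
{
	void *addr;
	int fd;

	fd = open(path, O_RDWR);
	if (fd < 0)
		return NULL;

	/* MAP_SHARED so that stores reach the backing memory itself. */
	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	/* The mapping remains valid after the descriptor is closed. */
	close(fd);

	return addr == MAP_FAILED ? NULL : addr;
}
```

A caller would release the mapping with munmap(addr, len) when done. QEMU
does the equivalent internally when given a memory-backend-file object, so
this is only meant to show what the chardev has to support.
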
>
> Suggested-by: Jason Gunthorpe <jgg@...dia.com>
> Signed-off-by: Ankit Agrawal <ankita@...dia.com>
> ---
> MAINTAINERS | 6 ++++++
> drivers/vfio/pci/nvgrace-gpu/Kconfig | 11 +++++++++++
> drivers/vfio/pci/nvgrace-gpu/Makefile | 3 +++
> drivers/vfio/pci/nvgrace-gpu/egm.c | 22 ++++++++++++++++++++++
> drivers/vfio/pci/nvgrace-gpu/main.c | 1 +
> 5 files changed, 43 insertions(+)
> create mode 100644 drivers/vfio/pci/nvgrace-gpu/egm.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index dd7df834b70b..ec6bc10f346d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -26476,6 +26476,12 @@ F: drivers/vfio/pci/nvgrace-gpu/egm_dev.h
> F: drivers/vfio/pci/nvgrace-gpu/main.c
> F: include/linux/nvgrace-egm.h
>
> +VFIO NVIDIA GRACE EGM DRIVER
> +M: Ankit Agrawal <ankita@...dia.com>
> +L: kvm@...r.kernel.org
> +S: Supported
> +F: drivers/vfio/pci/nvgrace-gpu/egm.c
> +
> VFIO PCI DEVICE SPECIFIC DRIVERS
> R: Jason Gunthorpe <jgg@...dia.com>
> R: Yishai Hadas <yishaih@...dia.com>
> diff --git a/drivers/vfio/pci/nvgrace-gpu/Kconfig b/drivers/vfio/pci/nvgrace-gpu/Kconfig
> index a7f624b37e41..d5773bbd22f5 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/Kconfig
> +++ b/drivers/vfio/pci/nvgrace-gpu/Kconfig
> @@ -1,8 +1,19 @@
> # SPDX-License-Identifier: GPL-2.0-only
> +config NVGRACE_EGM
> + tristate "EGM driver for NVIDIA Grace Hopper and Blackwell Superchip"
> + depends on ARM64 || (COMPILE_TEST && 64BIT)

Should NVGRACE_EGM depend on NVGRACE_GPU_VFIO_PCI as well?

Thanks,
Shameer

> + help
> + Extended GPU Memory (EGM) support for the GPU in the NVIDIA Grace
> + based chips required to avail the CPU memory as additional
> + cross-node/cross-socket memory for GPU using KVM/qemu.
> +
> + If you don't know what to do here, say N.
> +
> config NVGRACE_GPU_VFIO_PCI
> tristate "VFIO support for the GPU in the NVIDIA Grace Hopper Superchip"
> depends on ARM64 || (COMPILE_TEST && 64BIT)
> select VFIO_PCI_CORE
> + select NVGRACE_EGM
> help
> VFIO support for the GPU in the NVIDIA Grace Hopper Superchip is
> required to assign the GPU device to userspace using KVM/qemu/etc.
> diff --git a/drivers/vfio/pci/nvgrace-gpu/Makefile b/drivers/vfio/pci/nvgrace-gpu/Makefile
> index e72cc6739ef8..d0d191be56b9 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/Makefile
> +++ b/drivers/vfio/pci/nvgrace-gpu/Makefile
> @@ -1,3 +1,6 @@
> # SPDX-License-Identifier: GPL-2.0-only
> obj-$(CONFIG_NVGRACE_GPU_VFIO_PCI) += nvgrace-gpu-vfio-pci.o
> nvgrace-gpu-vfio-pci-y := main.o egm_dev.o
> +
> +obj-$(CONFIG_NVGRACE_EGM) += nvgrace-egm.o
> +nvgrace-egm-y := egm.o
> diff --git a/drivers/vfio/pci/nvgrace-gpu/egm.c b/drivers/vfio/pci/nvgrace-gpu/egm.c
> new file mode 100644
> index 000000000000..999808807019
> --- /dev/null
> +++ b/drivers/vfio/pci/nvgrace-gpu/egm.c
> @@ -0,0 +1,22 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include <linux/vfio_pci_core.h>
> +
> +static int __init nvgrace_egm_init(void)
> +{
> + return 0;
> +}
> +
> +static void __exit nvgrace_egm_cleanup(void)
> +{
> +}
> +
> +module_init(nvgrace_egm_init);
> +module_exit(nvgrace_egm_cleanup);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Ankit Agrawal <ankita@...dia.com>");
> +MODULE_DESCRIPTION("NVGRACE EGM - Module to support Extended GPU Memory on NVIDIA Grace Based systems");
> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> index 7486a1b49275..b1ccd1ac2e0a 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> @@ -1125,3 +1125,4 @@ MODULE_LICENSE("GPL");
> MODULE_AUTHOR("Ankit Agrawal <ankita@...dia.com>");
> MODULE_AUTHOR("Aniket Agashe <aniketa@...dia.com>");
> MODULE_DESCRIPTION("VFIO NVGRACE GPU PF - User Level driver for NVIDIA devices with CPU coherently accessible device memory");
> +MODULE_SOFTDEP("pre: nvgrace-egm");
> --
> 2.34.1