Message-ID:
 <CH3PR12MB7548DB1A20EA2D32AB923CBCAB15A@CH3PR12MB7548.namprd12.prod.outlook.com>
Date: Mon, 15 Sep 2025 07:47:01 +0000
From: Shameer Kolothum <skolothumtho@...dia.com>
To: Ankit Agrawal <ankita@...dia.com>, Jason Gunthorpe <jgg@...dia.com>,
	"alex.williamson@...hat.com" <alex.williamson@...hat.com>, Yishai Hadas
	<yishaih@...dia.com>, "kevin.tian@...el.com" <kevin.tian@...el.com>,
	"yi.l.liu@...el.com" <yi.l.liu@...el.com>, Zhi Wang <zhiw@...dia.com>
CC: Aniket Agashe <aniketa@...dia.com>, Neo Jia <cjia@...dia.com>, Kirti
 Wankhede <kwankhede@...dia.com>, "Tarun Gupta (SW-GPU)"
	<targupta@...dia.com>, Vikram Sethi <vsethi@...dia.com>, Andy Currid
	<ACurrid@...dia.com>, Alistair Popple <apopple@...dia.com>, John Hubbard
	<jhubbard@...dia.com>, Dan Williams <danw@...dia.com>, "Anuj Aggarwal
 (SW-GPU)" <anuaggarwal@...dia.com>, Matt Ochs <mochs@...dia.com>, Krishnakant
 Jaju <kjaju@...dia.com>, Dheeraj Nigam <dnigam@...dia.com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
Subject: RE: [RFC 05/14] vfio/nvgrace-egm: Introduce module to manage EGM



> -----Original Message-----
> From: Ankit Agrawal <ankita@...dia.com>
> Sent: 04 September 2025 05:08
> To: Ankit Agrawal <ankita@...dia.com>; Jason Gunthorpe <jgg@...dia.com>;
> alex.williamson@...hat.com; Yishai Hadas <yishaih@...dia.com>; Shameer
> Kolothum <skolothumtho@...dia.com>; kevin.tian@...el.com;
> yi.l.liu@...el.com; Zhi Wang <zhiw@...dia.com>
> Cc: Aniket Agashe <aniketa@...dia.com>; Neo Jia <cjia@...dia.com>; Kirti
> Wankhede <kwankhede@...dia.com>; Tarun Gupta (SW-GPU)
> <targupta@...dia.com>; Vikram Sethi <vsethi@...dia.com>; Andy Currid
> <acurrid@...dia.com>; Alistair Popple <apopple@...dia.com>; John Hubbard
> <jhubbard@...dia.com>; Dan Williams <danw@...dia.com>; Anuj Aggarwal
> (SW-GPU) <anuaggarwal@...dia.com>; Matt Ochs <mochs@...dia.com>;
> Krishnakant Jaju <kjaju@...dia.com>; Dheeraj Nigam <dnigam@...dia.com>;
> kvm@...r.kernel.org; linux-kernel@...r.kernel.org
> Subject: [RFC 05/14] vfio/nvgrace-egm: Introduce module to manage EGM
> 
> From: Ankit Agrawal <ankita@...dia.com>
> 
> The Extended GPU Memory (EGM) feature enables the GPU to access
> the system memory allocations within and across nodes through high
> bandwidth path on Grace Based systems. The GPU can utilize the
> system memory located on the same socket or from a different socket
> or even on a different node in a multi-node system [1].
> 
> When the EGM mode is enabled through SBIOS, the host system memory is
> partitioned into 2 parts: One partition for the Host OS usage
> called Hypervisor region, and a second Hypervisor-Invisible (HI) region
> for the VM. Only the hypervisor region is part of the host EFI map
> and is thus visible to the host OS on bootup. Since the entire VM
> sysmem is eligible for EGM allocations within the VM, the HI partition
> is interchangeably called the EGM region in the series. The base SPA and
> size of this HI/EGM region are exposed through the ACPI DSDT properties.
> 
> Whilst the EGM region is accessible on the host, it is not added to
> the kernel. The HI region is assigned to a VM by mapping the QEMU VMA
> to the SPA using remap_pfn_range().
> 
> The following figure shows the memory map in the virtualization
> environment.
> 
> |---- Sysmem ----|                  |--- GPU mem ---|  VM Memory
> |                |                  |               |
> |IPA <-> SPA map |                  |IPA <-> SPA map|
> |                |                  |               |
> |--- HI / EGM ---|-- Host Mem --|   |--- GPU mem ---|  Host Memory
> 
> Introduce a new nvgrace-egm auxiliary driver module to manage and
> map the HI/EGM region in the Grace Blackwell systems. This binds to
> the auxiliary device created by the parent nvgrace-gpu (in-tree
> module for device assignment) / nvidia-vgpu-vfio (out-of-tree open
> source module for SRIOV vGPU) to manage the EGM region for the VM.
> Note that there is a unique EGM region per socket and the auxiliary
> device gets created for every region. The parent module fetches the
> EGM region information from the ACPI tables and populates the data
> structures shared with the auxiliary nvgrace-egm module.
> 
> nvgrace-egm module handles the following:
> 1. Fetch the EGM memory properties (base HPA, length, proximity domain)
> from the parent device shared EGM region structure.
> 2. Create a char device that can be used as memory-backend-file by Qemu
> for the VM and implement file operations. The char device is /dev/egmX,
> where X is the PXM node ID of the EGM being mapped fetched in 1.
> 3. Zero the EGM memory on first device open().
> 4. Map the QEMU VMA to the EGM region using remap_pfn_range.
> 5. Clean up state and destroy the chardev on device unbind.
> 6. Handle presence of retired ECC pages on the EGM region.
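
As a rough illustration of step 4, the offset arithmetic that an mmap
handler would need before calling remap_pfn_range() can be sketched in
plain userspace C. This is only a sketch under stated assumptions: the
struct, the helper name egm_map_start_pfn() and the 4 KiB page shift
are hypothetical and are not taken from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. A real driver would use PAGE_SHIFT. */
#define EGM_PAGE_SHIFT 12
#define EGM_PAGE_SIZE  (1ULL << EGM_PAGE_SHIFT)

/* Hypothetical stand-in for the EGM region data shared by the parent. */
struct egm_region {
	uint64_t base_spa;	/* region base, system physical address */
	uint64_t length;	/* region size in bytes */
};

/*
 * Compute the first PFN for a mapping that starts pgoff pages into the
 * region and covers req_len bytes. Returns false if the request is
 * empty or would extend past the end of the region.
 */
static bool egm_map_start_pfn(const struct egm_region *r, uint64_t pgoff,
			      uint64_t req_len, uint64_t *pfn)
{
	uint64_t region_pages = r->length >> EGM_PAGE_SHIFT;
	/* Round the request up to whole pages without overflowing. */
	uint64_t req_pages = (req_len >> EGM_PAGE_SHIFT) +
			     ((req_len & (EGM_PAGE_SIZE - 1)) != 0);

	if (req_len == 0 || pgoff >= region_pages ||
	    req_pages > region_pages - pgoff)
		return false;

	*pfn = (r->base_spa >> EGM_PAGE_SHIFT) + pgoff;
	return true;
}
```

In a driver, the resulting PFN and the VMA size would then be passed to
remap_pfn_range(). The check is written as a subtraction so that a
large pgoff cannot wrap the comparison.
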
> 
> Suggested-by: Jason Gunthorpe <jgg@...dia.com>
> Signed-off-by: Ankit Agrawal <ankita@...dia.com>
> ---
>  MAINTAINERS                           |  6 ++++++
>  drivers/vfio/pci/nvgrace-gpu/Kconfig  | 11 +++++++++++
>  drivers/vfio/pci/nvgrace-gpu/Makefile |  3 +++
>  drivers/vfio/pci/nvgrace-gpu/egm.c    | 22 ++++++++++++++++++++++
>  drivers/vfio/pci/nvgrace-gpu/main.c   |  1 +
>  5 files changed, 43 insertions(+)
>  create mode 100644 drivers/vfio/pci/nvgrace-gpu/egm.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index dd7df834b70b..ec6bc10f346d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -26476,6 +26476,12 @@ F:	drivers/vfio/pci/nvgrace-gpu/egm_dev.h
>  F:	drivers/vfio/pci/nvgrace-gpu/main.c
>  F:	include/linux/nvgrace-egm.h
> 
> +VFIO NVIDIA GRACE EGM DRIVER
> +M:	Ankit Agrawal <ankita@...dia.com>
> +L:	kvm@...r.kernel.org
> +S:	Supported
> +F:	drivers/vfio/pci/nvgrace-gpu/egm.c
> +
>  VFIO PCI DEVICE SPECIFIC DRIVERS
>  R:	Jason Gunthorpe <jgg@...dia.com>
>  R:	Yishai Hadas <yishaih@...dia.com>
> diff --git a/drivers/vfio/pci/nvgrace-gpu/Kconfig b/drivers/vfio/pci/nvgrace-gpu/Kconfig
> index a7f624b37e41..d5773bbd22f5 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/Kconfig
> +++ b/drivers/vfio/pci/nvgrace-gpu/Kconfig
> @@ -1,8 +1,19 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> +config NVGRACE_EGM
> +	tristate "EGM driver for NVIDIA Grace Hopper and Blackwell Superchip"
> +	depends on ARM64 || (COMPILE_TEST && 64BIT)

Should it depend on NVGRACE_GPU_VFIO_PCI as well?
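
Something along these lines, perhaps. This is only a sketch of the
question, not a tested change, and it assumes the "select NVGRACE_EGM"
added to NVGRACE_GPU_VFIO_PCI further down would be dropped, since a
select in one direction plus a "depends on" in the other is reported by
Kconfig as a recursive dependency:

```
config NVGRACE_EGM
	tristate "EGM driver for NVIDIA Grace Hopper and Blackwell Superchip"
	depends on ARM64 || (COMPILE_TEST && 64BIT)
	depends on NVGRACE_GPU_VFIO_PCI
```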

Thanks,
Shameer

> +	help
> +	  Extended GPU Memory (EGM) support for the GPU in the NVIDIA Grace
> +	  based chips required to avail the CPU memory as additional
> +	  cross-node/cross-socket memory for GPU using KVM/qemu.
> +
> +	  If you don't know what to do here, say N.
> +
>  config NVGRACE_GPU_VFIO_PCI
>  	tristate "VFIO support for the GPU in the NVIDIA Grace Hopper Superchip"
>  	depends on ARM64 || (COMPILE_TEST && 64BIT)
>  	select VFIO_PCI_CORE
> +	select NVGRACE_EGM
>  	help
>  	  VFIO support for the GPU in the NVIDIA Grace Hopper Superchip is
>  	  required to assign the GPU device to userspace using KVM/qemu/etc.
> diff --git a/drivers/vfio/pci/nvgrace-gpu/Makefile b/drivers/vfio/pci/nvgrace-gpu/Makefile
> index e72cc6739ef8..d0d191be56b9 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/Makefile
> +++ b/drivers/vfio/pci/nvgrace-gpu/Makefile
> @@ -1,3 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  obj-$(CONFIG_NVGRACE_GPU_VFIO_PCI) += nvgrace-gpu-vfio-pci.o
>  nvgrace-gpu-vfio-pci-y := main.o egm_dev.o
> +
> +obj-$(CONFIG_NVGRACE_EGM) += nvgrace-egm.o
> +nvgrace-egm-y := egm.o
> diff --git a/drivers/vfio/pci/nvgrace-gpu/egm.c b/drivers/vfio/pci/nvgrace-gpu/egm.c
> new file mode 100644
> index 000000000000..999808807019
> --- /dev/null
> +++ b/drivers/vfio/pci/nvgrace-gpu/egm.c
> @@ -0,0 +1,22 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include <linux/vfio_pci_core.h>
> +
> +static int __init nvgrace_egm_init(void)
> +{
> +	return 0;
> +}
> +
> +static void __exit nvgrace_egm_cleanup(void)
> +{
> +}
> +
> +module_init(nvgrace_egm_init);
> +module_exit(nvgrace_egm_cleanup);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Ankit Agrawal <ankita@...dia.com>");
> +MODULE_DESCRIPTION("NVGRACE EGM - Module to support Extended GPU Memory on NVIDIA Grace Based systems");
> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> index 7486a1b49275..b1ccd1ac2e0a 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> @@ -1125,3 +1125,4 @@ MODULE_LICENSE("GPL");
>  MODULE_AUTHOR("Ankit Agrawal <ankita@...dia.com>");
>  MODULE_AUTHOR("Aniket Agashe <aniketa@...dia.com>");
>  MODULE_DESCRIPTION("VFIO NVGRACE GPU PF - User Level driver for NVIDIA devices with CPU coherently accessible device memory");
> +MODULE_SOFTDEP("pre: nvgrace-egm");
> --
> 2.34.1

