Message-ID: <20250618065541.50049-1-ankita@nvidia.com>
Date: Wed, 18 Jun 2025 06:55:36 +0000
From: <ankita@...dia.com>
To: <ankita@...dia.com>, <jgg@...dia.com>, <maz@...nel.org>,
<oliver.upton@...ux.dev>, <joey.gouly@....com>, <suzuki.poulose@....com>,
<yuzenghui@...wei.com>, <catalin.marinas@....com>, <will@...nel.org>,
<ryan.roberts@....com>, <shahuang@...hat.com>, <lpieralisi@...nel.org>,
<david@...hat.com>, <ddutile@...hat.com>, <seanjc@...gle.com>
CC: <aniketa@...dia.com>, <cjia@...dia.com>, <kwankhede@...dia.com>,
<kjaju@...dia.com>, <targupta@...dia.com>, <vsethi@...dia.com>,
<acurrid@...dia.com>, <apopple@...dia.com>, <jhubbard@...dia.com>,
<danw@...dia.com>, <zhiw@...dia.com>, <mochs@...dia.com>,
<udhoke@...dia.com>, <dnigam@...dia.com>, <alex.williamson@...hat.com>,
<sebastianene@...gle.com>, <coltonlewis@...gle.com>, <kevin.tian@...el.com>,
<yi.l.liu@...el.com>, <ardb@...nel.org>, <akpm@...ux-foundation.org>,
<gshan@...hat.com>, <linux-mm@...ck.org>, <tabba@...gle.com>,
<qperret@...gle.com>, <kvmarm@...ts.linux.dev>,
<linux-kernel@...r.kernel.org>, <linux-arm-kernel@...ts.infradead.org>,
<maobibo@...ngson.cn>
Subject: [PATCH v7 0/5] KVM: arm64: Map GPU device memory as cacheable
From: Ankit Agrawal <ankita@...dia.com>
Grace-based platforms such as the Grace Hopper/Blackwell Superchips have
CPU-accessible, cache-coherent GPU memory. The GPU device memory is
essentially DDR memory and retains properties such as cacheability,
unaligned accesses, atomics and handling of executable faults. This
requires the device memory to be mapped as NORMAL in stage-2.
Today KVM forces the memory to either NORMAL or DEVICE_nGnRE depending
on whether the memory region is added to the kernel. The KVM code is
thus restrictive and prevents device memory that is not added to the
kernel from being marked as cacheable. This series aims to solve this.
A cacheability check is made by consulting the VMA pgprot value. If
the pgprot mapping type is cacheable, it is considered safe to map
the memory as cacheable, since the KVM S2 will have the same Normal
memory type as the VMA has in the S1 and KVM has no additional
responsibility for safety.
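
The check described above can be sketched in plain userspace C. This is
only an illustrative model, not the code from the series: the enum
values and the attribute-index shift/mask are assumptions modelled on
the arm64 memory type indices, and make_pgprot() and
pgprot_is_cacheable() are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory type indices, modelled on arm64 (illustrative only). */
enum mem_type {
	MT_NORMAL = 0,
	MT_NORMAL_TAGGED = 1,
	MT_NORMAL_NC = 2,
	MT_DEVICE_nGnRnE = 3,
	MT_DEVICE_nGnRE = 4,
};

/* Assumed position of the attribute index field inside a pgprot value. */
#define ATTRINDX_SHIFT	2
#define ATTRINDX_MASK	(UINT64_C(7) << ATTRINDX_SHIFT)

/* Hypothetical helper: build a pgprot value carrying only a memory type. */
static uint64_t make_pgprot(enum mem_type t)
{
	return (uint64_t)t << ATTRINDX_SHIFT;
}

/*
 * Hypothetical helper: report whether the mapping type encoded in the
 * pgprot is a cacheable Normal type. Device and Normal-NC types are not.
 */
static bool pgprot_is_cacheable(uint64_t pgprot)
{
	switch ((pgprot & ATTRINDX_MASK) >> ATTRINDX_SHIFT) {
	case MT_NORMAL:
	case MT_NORMAL_TAGGED:
		return true;
	default:
		return false;
	}
}
```

Only the attribute index field is inspected, so other bits set in the
pgprot value do not affect the result.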
Note that when FWB (Force Write Back) is not enabled, the kernel expects
to do cache management trivially, flushing the memory by converting a
kvm_pte to a phys_addr and then linearly to a KVA. The cache management
thus relies on the memory being mapped in the kernel. Since the GPU
device memory is not kernel mapped, bail out when FWB is not supported.
Similarly, ARM64_HAS_CACHE_DIC allows KVM to avoid flushing the icache
and turns icache_inval_pou() into a NOP. The cacheable PFNMAP is
therefore made contingent on these two hardware features.
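
As a rough illustration of this gating, the following userspace C sketch
models the decision. It is an assumption-laden sketch rather than the
kernel implementation: struct cpu_features, supports_cacheable_pfnmap()
and decide_pfnmap_s2() are hypothetical names, and the two booleans
stand in for the FWB and ARM64_HAS_CACHE_DIC capability checks.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two CPU capability checks. */
struct cpu_features {
	bool has_s2fwb;		/* stage-2 Force Write Back available */
	bool has_cache_dic;	/* icache invalidation to PoU not required */
};

/* Cacheable PFNMAP needs both features, since KVM cannot do CMOs here. */
static bool supports_cacheable_pfnmap(const struct cpu_features *f)
{
	return f->has_s2fwb && f->has_cache_dic;
}

enum s2_decision {
	S2_KEEP_NONCACHEABLE,	/* existing handling for non-cacheable VMAs */
	S2_MAP_CACHEABLE,	/* map as Normal cacheable in stage-2 */
	S2_REJECT,		/* cacheable VMA, but hardware cannot support it */
};

/* Decide how a PFNMAP region would be treated under the stated rules. */
static enum s2_decision decide_pfnmap_s2(bool vma_cacheable,
					 const struct cpu_features *f)
{
	if (!vma_cacheable)
		return S2_KEEP_NONCACHEABLE;

	return supports_cacheable_pfnmap(f) ? S2_MAP_CACHEABLE : S2_REJECT;
}
```

The point of the sketch is that a cacheable VMA is never silently
downgraded: it is either mapped cacheable or refused.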
The ability to safely do the cacheable mapping of PFNMAP is exposed
through a KVM capability for userspace consumption.
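
From userspace, a capability like this would typically be probed with
the existing KVM_CHECK_EXTENSION ioctl. The sketch below assumes that
usual mechanism; kvm_has_cap() is a hypothetical helper, and the
capability number is passed in by the caller because this cover letter
does not name the constant (it would come from the patched
include/uapi/linux/kvm.h).

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: return 1 if KVM reports the capability as
 * supported, 0 if it is unsupported or the query itself failed.
 * cap_nr is an assumption supplied by the caller.
 */
static int kvm_has_cap(int kvm_fd, unsigned long cap_nr)
{
	int ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap_nr);

	return ret > 0 ? 1 : 0;
}
```

A VMM would open /dev/kvm, call kvm_has_cap() with the new capability
constant, and only then back a memslot with a cacheable PFNMAP VMA.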
The changes are heavily influenced by the discussions among the
maintainers Marc Zyngier and Oliver Upton, as well as Jason Gunthorpe,
Catalin Marinas, David Hildenbrand and Sean Christopherson [1]. Many
thanks for their valuable suggestions.
Applied over next-20250610 and tested on the Grace Blackwell
platform by booting a VM, loading the NVIDIA module [2] and running
nvidia-smi in the VM.
To run CUDA workloads, there is a dependency on the IOMMUFD and the
Nested Page Table patches being worked on separately by Nicolin Chen
(nicolinc@...dia.com). NVIDIA has provided git repositories which
include all the requisite kernel [3] and QEMU [4] patches in case
one wants to try.
v6 -> v7
1. New patch to rename symbols to more accurately reflect the
CMO usage functionality (Jason Gunthorpe).
2. Updated the block cacheable PFNMAP patch to invert the cacheability
check function (Sean Christopherson).
3. Removed the memslot flag KVM_MEM_ENABLE_CACHEABLE_PFNMAP.
(Jason Gunthorpe, Sean Christopherson, Oliver Upton).
4. Commit message changes in 2/5. (Jason Gunthorpe)
v5 -> v6
1. 2/5 updated to add kvm_arch_supports_cacheable_pfnmap weak
definition to avoid build warnings. (Donald Dutile).
v4 -> v5
1. Invert the check to allow MT_DEVICE_* or NORMAL_NC instead of
disallowing MT_NORMAL in 1/5. (Catalin Marinas)
2. Removed the use of stage2_has_fwb in favor of checking the FWB
cap directly. (Oliver Upton)
3. Introduced kvm_arch_supports_cacheable_pfnmap to check if
the prereq features are present. (David Hildenbrand)
v3 -> v4
1. Moved the fix for a security bug caused by mismatched attributes
between the S1 and S2 mappings into a separate patch. Suggestion by
Jason Gunthorpe (jgg@...dia.com).
2. New minor patch to change the scope of the FWB support indicator
function.
3. Patch to introduce a new memslot flag. Suggestion by Oliver Upton
(oliver.upton@...ux.dev) and Marc Zyngier (maz@...nel.org)
4. Patch to introduce a new KVM cap to expose cacheable PFNMAP support.
Suggestion by Marc Zyngier (maz@...nel.org).
5. Added checks for ARM64_HAS_CACHE_DIC. Suggestion by Catalin Marinas
(catalin.marinas@....com)
v2 -> v3
1. Restricted the new changes to check for cacheability to VM_PFNMAP
based on David Hildenbrand's (david@...hat.com) suggestion.
2. Removed the MTE checks based on Jason Gunthorpe's (jgg@...dia.com)
observation that they are already done earlier in
kvm_arch_prepare_memory_region.
3. Dropped the pfn_valid() checks based on suggestions by
Catalin Marinas (catalin.marinas@....com).
4. Removed the code for exec fault handling as it is not needed
anymore.
v1 -> v2
1. Removed kvm_is_device_pfn() as the way to determine device type
memory. pfn_valid() is used instead.
2. Added handling for MTE.
3. Minor cleanup.
Link: https://lore.kernel.org/all/20250310103008.3471-1-ankita@nvidia.com [1]
Link: https://github.com/NVIDIA/open-gpu-kernel-modules [2]
Link: https://github.com/NVIDIA/NV-Kernels/tree/6.8_ghvirt [3]
Link: https://github.com/NVIDIA/QEMU/tree/6.8_ghvirt_iommufd_vcmdq [4]
v6 Link: https://lore.kernel.org/all/20250524013943.2832-1-ankita@nvidia.com/
Ankit Agrawal (5):
  KVM: arm64: Rename symbols to reflect whether CMO may be used
  KVM: arm64: Block cacheable PFNMAP mapping
  KVM: arm64: New function to determine hardware cache management
    support
  KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
  KVM: arm64: Expose new KVM cap for cacheable PFNMAP
 Documentation/virt/kvm/api.rst | 13 ++++-
 arch/arm64/kvm/arm.c           |  7 +++
 arch/arm64/kvm/mmu.c           | 98 ++++++++++++++++++++++++++++++----
 include/linux/kvm_host.h       |  2 +
 include/uapi/linux/kvm.h       |  1 +
 virt/kvm/kvm_main.c            |  5 ++
 6 files changed, 115 insertions(+), 11 deletions(-)
--
2.34.1