Message-ID: <07252361-8886-4284-bdba-55c3fe728831@redhat.com>
Date: Tue, 26 Nov 2024 12:10:24 -0500
From: Donald Dutile <ddutile@...hat.com>
To: ankita@...dia.com, jgg@...dia.com, maz@...nel.org,
oliver.upton@...ux.dev, joey.gouly@....com, suzuki.poulose@....com,
yuzenghui@...wei.com, catalin.marinas@....com, will@...nel.org,
ryan.roberts@....com, shahuang@...hat.com, lpieralisi@...nel.org
Cc: aniketa@...dia.com, cjia@...dia.com, kwankhede@...dia.com,
targupta@...dia.com, vsethi@...dia.com, acurrid@...dia.com,
apopple@...dia.com, jhubbard@...dia.com, danw@...dia.com, zhiw@...dia.com,
mochs@...dia.com, udhoke@...dia.com, dnigam@...dia.com,
alex.williamson@...hat.com, sebastianene@...gle.com, coltonlewis@...gle.com,
kevin.tian@...el.com, yi.l.liu@...el.com, ardb@...nel.org,
akpm@...ux-foundation.org, gshan@...hat.com, linux-mm@...ck.org,
kvmarm@...ts.linux.dev, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v2 0/1] KVM: arm64: Map GPU memory with no struct pages
My email client says this patch: [PATCH v2 1/1] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
is part of the thread under this PATCH's title. Is it?
The description there is similar to the one here, with some additions and some omissions.
So, could you clean these two up into (a) a series, or (b) single, separate PATCHes?
Thanks.
- Don
On 11/18/24 8:19 AM, ankita@...dia.com wrote:
> From: Ankit Agrawal <ankita@...dia.com>
>
> Grace-based platforms such as Grace Hopper/Blackwell Superchips have
> CPU accessible cache coherent GPU memory. The current KVM code
> prevents such memory from being mapped as Normal cacheable, and this
> patch aims to address that use case.
>
> Today KVM forces the memory to either NORMAL or DEVICE_nGnRE
> based on pfn_is_map_memory() and ignores the per-VMA flags that
> indicate the memory attributes. This means there is no way for
> a VM to get cacheable IO memory (like from a CXL or pre-CXL device).
> In both cases the memory will be forced to be DEVICE_nGnRE and the
> VM's memory attributes will be ignored.
>
> The pfn_is_map_memory() is thus restrictive and allows only for
> the memory that is added to the kernel to be marked as cacheable.
> In most cases the code needs to know if there is a struct page, or
> if the memory is in the kernel map and pfn_valid() is an appropriate
> API for this. Extend the umbrella with pfn_valid() to include memory
> with no struct pages for consideration to be mapped cacheable in
> stage 2. A !pfn_valid() implies that the memory is unsafe to be mapped
> as cacheable.
>
> Also take care of the following two cases that are unsafe to be mapped
> as cacheable:
> 1. The VMA pgprot may have VM_IO set along with MT_NORMAL or MT_NORMAL_TAGGED.
> Although unexpected and wrong, the presence of such a configuration
> cannot be ruled out.
> 2. Configurations where VM_MTE_ALLOWED is not set and KVM_CAP_ARM_MTE
> is enabled. Otherwise a malicious guest can enable MTE at stage 1
> without the hypervisor being able to tell. This could cause external
> aborts.
>
> The GPU memory such as on the Grace Hopper systems is interchangeable
> with DDR memory and retains its properties. Executable faults should thus
> be allowed on the memory determined as Normal cacheable.
>
> Note that when FWB is not enabled, the kernel expects to trivially do
> cache management by flushing the memory by linearly converting a
> kvm_pte to phys_addr to a KVA, see kvm_flush_dcache_to_poc(). This is
> only possible for struct page backed memory. Do not allow non-struct
> page memory to be cacheable without FWB.
>
> The changes are heavily influenced by the insightful discussions between
> Catalin Marinas and Jason Gunthorpe [1] on v1. Many thanks for their
> valuable suggestions.
>
> Applied over next-20241117 and tested on the Grace Hopper and
> Grace Blackwell platforms by booting up a VM and running several CUDA
> workloads. This has not been tested on MTE enabled hardware. If
> someone can give it a try, it will be very helpful.
>
> v1 -> v2
> 1. Removed kvm_is_device_pfn() as the determiner for device type
> memory. pfn_valid() is used instead.
> 2. Added handling for MTE.
> 3. Minor cleanup.
>
> Link: https://lore.kernel.org/lkml/20230907181459.18145-2-ankita@nvidia.com [1]
>
> Ankit Agrawal (1):
> KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
>
> arch/arm64/include/asm/kvm_pgtable.h | 8 +++
> arch/arm64/kvm/hyp/pgtable.c | 2 +-
> arch/arm64/kvm/mmu.c | 101 +++++++++++++++++++++------
> 3 files changed, 87 insertions(+), 24 deletions(-)
>
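For readers following the thread, the restrictions listed in the quoted cover letter can be sketched as a small user-space C predicate. This is only one reading of the cover letter, not code from the patch: the struct, the enum, and the function name model_s2_may_map_cacheable() are hypothetical, and the pfn_valid() handling is deliberately left out of the model. It covers the two unsafe cases (VM_IO on a Normal mapping, and MTE enabled for the VM without VM_MTE_ALLOWED on the VMA) plus the FWB restriction for memory with no struct pages.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the VMA memory types named in the cover letter. */
enum model_vma_mem_type {
    MODEL_MT_NORMAL,
    MODEL_MT_NORMAL_TAGGED,
    MODEL_MT_OTHER          /* any Device or non-cacheable type */
};

/* Hypothetical snapshot of the inputs the cover letter talks about. */
struct model_s2_fault {
    enum model_vma_mem_type mt; /* memory type taken from the VMA pgprot */
    bool vm_io;                 /* VM_IO is set on the VMA */
    bool vm_mte_allowed;        /* VM_MTE_ALLOWED is set on the VMA */
    bool kvm_mte_enabled;       /* KVM_CAP_ARM_MTE is enabled for the VM */
    bool has_fwb;               /* FWB is enabled */
    bool struct_page_backed;    /* the memory has struct pages */
};

/*
 * Returns true when, under this reading of the cover letter, the memory
 * may be considered for a cacheable stage 2 mapping.
 */
static bool model_s2_may_map_cacheable(const struct model_s2_fault *f)
{
    bool vma_normal = f->mt == MODEL_MT_NORMAL ||
                      f->mt == MODEL_MT_NORMAL_TAGGED;

    /* Only a VMA that asks for Normal memory is a candidate. */
    if (!vma_normal)
        return false;

    /* Unsafe case 1: VM_IO together with a Normal memory type. */
    if (f->vm_io)
        return false;

    /* Unsafe case 2: MTE enabled for the VM, but not allowed on the VMA. */
    if (f->kvm_mte_enabled && !f->vm_mte_allowed)
        return false;

    /*
     * Without FWB, cache maintenance needs a kernel virtual address,
     * which exists only for struct page backed memory.
     */
    if (!f->has_fwb && !f->struct_page_backed)
        return false;

    return true;
}
```

In this model, memory with no struct pages and a Normal VMA type passes only when FWB is enabled, VM_IO is clear, and the MTE condition holds.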