linux-kernel - Re: [PATCH v9 0/6] KVM: arm64: Map GPU device memory as cacheable

Message-ID:
 <SA1PR12MB7199C4F31576787D0400902EB045A@SA1PR12MB7199.namprd12.prod.outlook.com>
Date: Fri, 27 Jun 2025 05:03:13 +0000
From: Ankit Agrawal <ankita@...dia.com>
To: Jason Gunthorpe <jgg@...dia.com>, "maz@...nel.org" <maz@...nel.org>,
	"oliver.upton@...ux.dev" <oliver.upton@...ux.dev>, "joey.gouly@....com"
	<joey.gouly@....com>, "suzuki.poulose@....com" <suzuki.poulose@....com>,
	"yuzenghui@...wei.com" <yuzenghui@...wei.com>, "catalin.marinas@....com"
	<catalin.marinas@....com>, "will@...nel.org" <will@...nel.org>,
	"ryan.roberts@....com" <ryan.roberts@....com>, "shahuang@...hat.com"
	<shahuang@...hat.com>, "lpieralisi@...nel.org" <lpieralisi@...nel.org>,
	"david@...hat.com" <david@...hat.com>, "ddutile@...hat.com"
	<ddutile@...hat.com>, "seanjc@...gle.com" <seanjc@...gle.com>
CC: Aniket Agashe <aniketa@...dia.com>, Neo Jia <cjia@...dia.com>, Kirti
 Wankhede <kwankhede@...dia.com>, Krishnakant Jaju <kjaju@...dia.com>, "Tarun
 Gupta (SW-GPU)" <targupta@...dia.com>, Vikram Sethi <vsethi@...dia.com>, Andy
 Currid <acurrid@...dia.com>, Alistair Popple <apopple@...dia.com>, John
 Hubbard <jhubbard@...dia.com>, Dan Williams <danw@...dia.com>, Zhi Wang
	<zhiw@...dia.com>, Matt Ochs <mochs@...dia.com>, Uday Dhoke
	<udhoke@...dia.com>, Dheeraj Nigam <dnigam@...dia.com>,
	"alex.williamson@...hat.com" <alex.williamson@...hat.com>,
	"sebastianene@...gle.com" <sebastianene@...gle.com>, "coltonlewis@...gle.com"
	<coltonlewis@...gle.com>, "kevin.tian@...el.com" <kevin.tian@...el.com>,
	"yi.l.liu@...el.com" <yi.l.liu@...el.com>, "ardb@...nel.org"
	<ardb@...nel.org>, "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"gshan@...hat.com" <gshan@...hat.com>, "linux-mm@...ck.org"
	<linux-mm@...ck.org>, "tabba@...gle.com" <tabba@...gle.com>,
	"qperret@...gle.com" <qperret@...gle.com>, "kvmarm@...ts.linux.dev"
	<kvmarm@...ts.linux.dev>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "maobibo@...ngson.cn"
	<maobibo@...ngson.cn>
Subject: Re: [PATCH v9 0/6] KVM: arm64: Map GPU device memory as cacheable

> Grace based platforms such as Grace Hopper/Blackwell Superchips have
> CPU accessible cache coherent GPU memory. The GPU device memory is
> essentially a DDR memory and retains properties such as cacheability,
> unaligned accesses, atomics and handling of executable faults. This
> requires the device memory to be mapped as NORMAL in stage-2.
>
> Today KVM forces the memory to either NORMAL or DEVICE_nGnRE depending
> on whether the memory region is added to the kernel. The KVM code is
> thus restrictive and prevents device memory that is not added to the
> kernel from being marked as cacheable. The patch aims to solve this.
>
> A cacheability check is made by consulting the VMA pgprot value. If
> the pgprot mapping type is cacheable, it is considered safe to be
> mapped cacheable as the KVM S2 will have the same Normal memory type
> as the VMA has in the S1 and KVM has no additional responsibility
> for safety.
>
> Note that when FWB (Force Write Back) is not enabled, the kernel expects
> to do cache management trivially, flushing the memory by linearly
> converting a kvm_pte to a phys_addr and then to a KVA. The cache
> management thus relies on the memory being kernel mapped. Since the GPU
> device memory is not kernel mapped, bail out when FWB is not supported.
> Similarly, ARM64_HAS_CACHE_DIC allows KVM to avoid flushing the icache
> and turns icache_inval_pou() into a NOP. So the cacheable PFNMAP is made
> contingent on these two hardware features.
>
> The ability to safely do the cacheable mapping of PFNMAP is exposed
> through a KVM capability for userspace consumption.
>
> The changes are heavily influenced by the discussions among
> maintainers Marc Zyngier and Oliver Upton besides Jason Gunthorpe,
> Catalin Marinas, David Hildenbrand, Sean Christopherson [1]. Many
> thanks for their valuable suggestions.
>
> Applied over next-20250610 and tested on the Grace Blackwell
> platform by booting up a VM, loading the NVIDIA module [2] and running
> nvidia-smi in the VM.
>
> To run CUDA workloads, there is a dependency on the IOMMUFD and the
> Nested Page Table patches being worked on separately by Nicolin Chen
> (nicolinc@...dia.com). NVIDIA has provided git repositories which
> include all the requisite kernel [3] and QEMU [4] patches in case
> one wants to try.
>
> v8 -> v9
> 1. Included MIXEDMAP to also be considered for cacheable mapping.
> (Jason Gunthorpe).
> 2. Minor text nits (Jason Gunthorpe).

A humble reminder for the review.
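
For reviewers skimming the thread, the decision described in the quoted
cover letter can be summarised as a small standalone model. This is only
a sketch under stated assumptions, not code from the series: every type
and function name below (s2_memtype, region, cpu_caps, pick_s2_memtype)
is hypothetical, and the real logic lives in the KVM stage-2 fault path
and operates on VMAs and CPU capability bits rather than on plain
booleans.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the series. */
enum s2_memtype {
	S2_NORMAL,        /* cacheable Normal memory in stage-2 */
	S2_DEVICE_nGnRE,  /* device mapping */
	S2_REJECT         /* cacheable mapping requested but not safe */
};

struct region {
	bool kernel_managed; /* memory region is added to the kernel */
	bool vma_cacheable;  /* VMA pgprot mapping type is cacheable */
};

struct cpu_caps {
	bool fwb;            /* Force Write Back is supported */
	bool cache_dic;      /* ARM64_HAS_CACHE_DIC is present */
};

/*
 * Toy model of the choice described in the cover letter:
 *  - kernel-managed memory stays NORMAL, as before;
 *  - memory not added to the kernel whose VMA is not cacheable stays
 *    DEVICE_nGnRE;
 *  - memory not added to the kernel whose VMA is cacheable is mapped
 *    NORMAL only when both FWB and CACHE_DIC are available, because KVM
 *    cannot do cache maintenance on memory that is not kernel mapped.
 */
static enum s2_memtype pick_s2_memtype(struct region r, struct cpu_caps c)
{
	if (r.kernel_managed)
		return S2_NORMAL;
	if (!r.vma_cacheable)
		return S2_DEVICE_nGnRE;
	if (!c.fwb || !c.cache_dic)
		return S2_REJECT;
	return S2_NORMAL;
}
```

In this model the GPU memory case corresponds to kernel_managed = false
and vma_cacheable = true, which yields S2_NORMAL only on hardware with
both features and S2_REJECT otherwise.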