Message-ID:
<SN6PR02MB4157B95E74A3865C44F6132ED4CD2@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Mon, 17 Jun 2024 04:06:15 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Catalin Marinas <catalin.marinas@....com>
CC: Steven Price <steven.price@....com>, "kvm@...r.kernel.org"
<kvm@...r.kernel.org>, "kvmarm@...ts.linux.dev" <kvmarm@...ts.linux.dev>,
Marc Zyngier <maz@...nel.org>, Will Deacon <will@...nel.org>, James Morse
<james.morse@....com>, Oliver Upton <oliver.upton@...ux.dev>, Suzuki K
Poulose <suzuki.poulose@....com>, Zenghui Yu <yuzenghui@...wei.com>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, Joey Gouly <joey.gouly@....com>, Alexandru
Elisei <alexandru.elisei@....com>, Christoffer Dall
<christoffer.dall@....com>, Fuad Tabba <tabba@...gle.com>,
"linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>, Ganapatrao
Kulkarni <gankulkarni@...amperecomputing.com>
Subject: RE: [PATCH v3 00/14] arm64: Support for running as a guest in Arm CCA

From: Catalin Marinas <catalin.marinas@....com> Sent: Monday, June 10, 2024 10:46 AM
>
> On Mon, Jun 10, 2024 at 05:03:44PM +0000, Michael Kelley wrote:
> > From: Catalin Marinas <catalin.marinas@....com> Sent: Monday, June 10, 2024 3:34 AM
> > > I wonder whether something like __GFP_DECRYPTED could be used to get
> > > shared memory from the allocation time and avoid having to change the
> > > vmalloc() ranges. This way functions like netvsc_init_buf() would get
> > > decrypted memory from the start and vmbus_establish_gpadl() would not
> > > need to call set_memory_decrypted() on a vmalloc() address.
> >
> > I would not have any conceptual objections to such an approach. But I'm
> > certainly not an expert in that area so I'm not sure what it would take
> > to make that work for vmalloc(). I presume that __GFP_DECRYPTED
> > should also work for kmalloc()?
> >
> > I've seen the separate discussion about a designated pool of decrypted
> > memory, to avoid always allocating a new page and decrypting when a
> > smaller allocation is sufficient. If such a pool could also work for page size
> > or larger allocations, it would have the additional benefit of concentrating
> > decrypted allocations in fewer 2 Meg large pages vs. scattering wherever
> > and forcing the break-up of more large page mappings in the direct map.
>
> Yeah, my quick, not fully tested hack here:
>
> https://lore.kernel.org/linux-arm-kernel/ZmNJdSxSz-sYpVgI@arm.com/
>
> It's the underlying page allocator that gives back decrypted pages when
> the flag is passed, so it should work for alloc_pages() and friends. The
> kmalloc() changes only ensure that we have separate caches for this
> memory and they are not merged. It needs some more work on kmem_cache,
> maybe introducing a SLAB_DECRYPTED flag as well, so as not to rely on
> the GFP flag.
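
For my own education, I'm picturing the driver side looking roughly like
the sketch below (treating __GFP_DECRYPTED as the hypothetical flag from
your hack, and alloc_shared_buffer() as a made-up helper, not anything
that exists today):

    /*
     * Sketch only: __GFP_DECRYPTED is the flag proposed in Catalin's
     * hack patch, not something that exists upstream.
     */
    #include <linux/gfp.h>
    #include <linux/slab.h>

    static void *alloc_shared_buffer(size_t size)
    {
            /*
             * The page allocator would hand back pages that are already
             * decrypted (shared with the host), so no separate
             * set_memory_decrypted() call is needed afterwards.
             */
            if (size > PAGE_SIZE)
                    return (void *)__get_free_pages(GFP_KERNEL | __GFP_DECRYPTED,
                                                    get_order(size));

            /* Small allocations would come from a separate, unmerged cache. */
            return kmalloc(size, GFP_KERNEL | __GFP_DECRYPTED);
    }

Is that roughly the usage model you have in mind?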
>
> For vmalloc(), we'd need a pgprot_decrypted() macro to ensure the
> decrypted pages are marked with the appropriate attributes (arch
> specific), otherwise it's fairly easy to wire up if alloc_pages() gives
> back decrypted memory.
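
To make sure I follow the vmalloc() side, I'm imagining something like
this sketch (the plumbing and the vmalloc_decrypted() wrapper are
invented; I'm only assuming the generic pgprot_decrypted() fallback would
get an arm64 definition):

    #include <linux/vmalloc.h>

    static void *vmalloc_decrypted(unsigned long size, gfp_t gfp_mask)
    {
            pgprot_t prot = PAGE_KERNEL;

            /*
             * If the caller asked for shared memory, apply the arch's
             * decrypted attributes to the vmalloc PTEs so they match the
             * already-decrypted pages coming from alloc_pages().
             */
            if (gfp_mask & __GFP_DECRYPTED)
                    prot = pgprot_decrypted(prot);

            return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
                                        gfp_mask, prot, 0, NUMA_NO_NODE,
                                        __builtin_return_address(0));
    }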
>
> > I'll note that netvsc devices can be added or removed from a running VM.
> > The vmalloc() memory allocated by netvsc_init_buf() can be freed, and/or
> > additional calls to netvsc_init_buf() can be made at any time -- they aren't
> > limited to initial Linux boot. So the mechanism for getting decrypted
> > memory at allocation time must be reasonably dynamic.
>
> I think the above should work. But, of course, we'd have to get this
> past the mm maintainers; it's likely that I missed something.

Having thought about this for a few days, I like the model of telling the
memory allocators to decrypt/re-encrypt the memory, instead of the caller
having to explicitly call set_memory_decrypted()/set_memory_encrypted().
I'll add some further comments to the thread with your initial
implementation.

>
> > Rejecting vmalloc() addresses may work for the moment -- I don't know
> > when CCA guests might be tried on Hyper-V. The original SEV-SNP and TDX
> > work started that way as well. :-) Handling the vmalloc() case was added
> > later, though I think on x86 the machinery to also flip all the alias PTEs was
> > already mostly or completely in place, probably for other reasons. So
> > fixing the vmalloc() case was more about not assuming that the underlying
> > physical address range is contiguous. Instead, each page must be processed
> > independently, which was straightforward.
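
(To expand on what I meant by processing each page independently: because
the pages backing a vmalloc() area aren't physically contiguous, the walk
ends up looking roughly like the sketch below. enc_dec_one_page() is just
a stand-in for whatever arch hook actually flips a single page; it isn't
a real kernel API.)

    #include <linux/vmalloc.h>
    #include <linux/mm.h>

    /* Sketch: convert each page of a vmalloc() area on its own. */
    static int set_vmalloc_range_decrypted(unsigned long start, int numpages)
    {
            unsigned long addr = start;
            int i, ret;

            for (i = 0; i < numpages; i++, addr += PAGE_SIZE) {
                    struct page *page = vmalloc_to_page((void *)addr);

                    /* Hypothetical per-page hook; not a real kernel API. */
                    ret = enc_dec_one_page(page_to_phys(page), false);
                    if (ret)
                            return ret;
            }
            return 0;
    }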
>
> There may be a slight performance impact but I guess that's not on a
> critical path. Walking the page tables and changing the vmalloc ptes
> should be fine but for each page, we'd have to break the linear map,
> flush the TLBs, re-create the linear map. Those TLB flushes may become
> a bottleneck, especially on hardware with lots of CPUs, depending on
> the microarchitecture. Note that even with a __GFP_DECRYPTED attribute,
> we'd still need to go for individual pages in the linear map.

Agreed. While synthetic devices can come and go at any time, that's
pretty rare in the grand scheme of things. I guess we would have to try
it on a system with a high CPU count, but even if the code needed to
"pace itself" to avoid hammering the TLBs too hard, that would be OK.
Michael