linux-kernel - Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240621123916.GQ2494510@nvidia.com>
Date: Fri, 21 Jun 2024 09:39:16 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: David Hildenbrand <david@...hat.com>, Fuad Tabba <tabba@...gle.com>,
	Christoph Hellwig <hch@...radead.org>,
	John Hubbard <jhubbard@...dia.com>,
	Elliot Berman <quic_eberman@...cinc.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Shuah Khan <shuah@...nel.org>, Matthew Wilcox <willy@...radead.org>,
	maz@...nel.org, kvm@...r.kernel.org, linux-arm-msm@...r.kernel.org,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	linux-kselftest@...r.kernel.org, pbonzini@...hat.com
Subject: Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning

On Thu, Jun 20, 2024 at 04:54:00PM -0700, Sean Christopherson wrote:

> Heh, and then we'd end up turning memfd into guest_memfd.  As I see it, being
> able to safely map TDX/SNP/pKVM private memory is a happy side effect that is
> possible because guest_memfd isn't subordinate to the primary MMU, but private
> memory isn't the core idenity of guest_memfd.

IMHO guest memfd still has a very bright line between it and normal
memfd.

guest mfd is holding all the memory and making it unmovable because it
has donated it to some secure world. Unmovable means the mm can't do
anything with it in normal ways. For things like David's 'b' where we
fragement the pages it also requires guest memfd act as an allocator
and completely own the PFNs, including handling free callbacks like
ZONE_DEVICE does.

memfd on the other hand should always be normal movable allocated
kernel memory with full normal folios and it shouldn't act as an
allocator.

Teaching memfd to hold a huge folio is probably going to be a
different approach than teaching guest memfd, I suspect "a" would be a
more suitable choice there. You give up the kvm side contiguity, but
get full mm integration of the memory.

User gets to choose which is more important..

It is not that different than today where VMMs are using hugetlbfs to
get unmovable memory.

> We could do a subset of those for memfd, but I don't see the point, assuming we
> allow mmap() on shared guest_memfd memory.  Solving mmap() for VMs that do
> private<=>shared conversions is the hard problem to solve.  Once that's done,
> we'll get support for regular VMs along with the other benefits of guest_memfd
> for free (or very close to free).

Yes, but I get the feeling that even in the best case for guest memfd
you still end up with the non-movable memory and less mm features
available.

Like if we do movability in a guest memfd space it would have be with
some op callback to move the memory via the secure world, and guest
memfd would still be pinning all the memory. Quite a different flow
than what memfd should do.

There may still be merit in teaching memfd how to do huge pages too,
though I don't really know.

Jason