linux-kernel - Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZnTBGCeSN1u6wzLb@google.com>
Date: Thu, 20 Jun 2024 16:54:00 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: David Hildenbrand <david@...hat.com>, Fuad Tabba <tabba@...gle.com>, 
	Christoph Hellwig <hch@...radead.org>, John Hubbard <jhubbard@...dia.com>, 
	Elliot Berman <quic_eberman@...cinc.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	Shuah Khan <shuah@...nel.org>, Matthew Wilcox <willy@...radead.org>, maz@...nel.org, 
	kvm@...r.kernel.org, linux-arm-msm@...r.kernel.org, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org, 
	pbonzini@...hat.com
Subject: Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning

On Thu, Jun 20, 2024, Jason Gunthorpe wrote:
> On Thu, Jun 20, 2024 at 01:30:29PM -0700, Sean Christopherson wrote:
> > I.e. except for blatant bugs, e.g. use-after-free, we need to be able to guarantee
> > with 100% accuracy that there are no outstanding mappings when converting a page
> > from shared=>private.  Crossing our fingers and hoping that short-term GUP will
> > have gone away isn't enough.
> 
> To be clear it is not crossing fingers. If the page refcount is 0 then
> there are no references to that memory anywhere at all. It is 100%
> certain.
> 
> It may take time to reach zero, but when it does it is safe.

Yeah, we're on the same page, I just didn't catch the implicit (or maybe it was
explicitly stated earlier) "wait for the refcount to hit zero" part that David
already clarified.

> Many things rely on this property, including FSDAX.
> 
> > For non-CoCo VMs, I expect we'll want to be much more permissive, but I think
> > they'll be a complete non-issue because there is no shared vs. private to worry
> > about.  We can simply allow any and all userspace mappings for guest_memfd that is
> > attached to a "regular" VM, because a misbehaving userspace only loses whatever
> > hardening (or other benefits) was being provided by using guest_memfd.  I.e. the
> > kernel and system at-large isn't at risk.
> 
> It does seem to me like guest_memfd should really focus on the private
> aspect.
> 
> If we need normal memfd enhancements of some kind to work better with
> KVM then that may be a better option than turning guest_memfd into
> memfd.

Heh, and then we'd end up turning memfd into guest_memfd.  As I see it, being
able to safely map TDX/SNP/pKVM private memory is a happy side effect that is
possible because guest_memfd isn't subordinate to the primary MMU, but private
memory isn't the core idenity of guest_memfd.

The thing that makes guest_memfd tick is that it's guest-first, i.e. allows mapping
memory into the guest with more permissions/capabilities than the host.  E.g. access
to private memory, hugepage mappings when the host is forced to use small pages,
RWX mappings when the host is limited to RO, etc.

We could do a subset of those for memfd, but I don't see the point, assuming we
allow mmap() on shared guest_memfd memory.  Solving mmap() for VMs that do
private<=>shared conversions is the hard problem to solve.  Once that's done,
we'll get support for regular VMs along with the other benefits of guest_memfd
for free (or very close to free).