linux-kernel - Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning

Message-ID: <CA+EHjTxvOyCqWRMTS3mXHznQtAJzDJLgqdS0Er2GA9FGdxd1vA@mail.gmail.com>
Date: Fri, 21 Jun 2024 09:23:41 +0100
From: Fuad Tabba <tabba@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Jason Gunthorpe <jgg@...dia.com>, David Hildenbrand <david@...hat.com>, John Hubbard <jhubbard@...dia.com>, 
	Elliot Berman <quic_eberman@...cinc.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	Shuah Khan <shuah@...nel.org>, Matthew Wilcox <willy@...radead.org>, maz@...nel.org, 
	kvm@...r.kernel.org, linux-arm-msm@...r.kernel.org, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org, 
	pbonzini@...hat.com
Subject: Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning

Hi Sean,

On Thu, Jun 20, 2024 at 4:37 PM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Wed, Jun 19, 2024, Fuad Tabba wrote:
> > Hi Jason,
> >
> > On Wed, Jun 19, 2024 at 12:51 PM Jason Gunthorpe <jgg@...dia.com> wrote:
> > >
> > > On Wed, Jun 19, 2024 at 10:11:35AM +0100, Fuad Tabba wrote:
> > >
> > > > To be honest, personally (speaking only for myself, not necessarily
> > > > for Elliot and not for anyone else in the pKVM team), I still would
> > > > prefer to use guest_memfd(). I think that having one solution for
> > > > confidential computing that rules them all would be best. But we do
> > > > need to be able to share memory in place, have a plan for supporting
> > > > huge pages in the near future, and migration in the not-too-distant
> > > > future.
> > >
> > > I think using a FD to control this special lifetime stuff is
> > > dramatically better than trying to force the MM to do it with struct
> > > page hacks.
> > >
> > > If you can't agree with the guest_memfd people on how to get there
> > > then maybe you need a guest_memfd2 for this slightly different special
> > > stuff instead of intruding on the core mm so much. (though that would
> > > be sad)
> > >
> > > We really need to be thinking more about containing these special
> > > things and not just sprinkling them everywhere.
> >
> > I agree that we need to agree :) This discussion has been going on
> > since before LPC last year, and the consensus from the guest_memfd()
> > folks (if I understood it correctly) is that guest_memfd() is what it
> > is: designed for a specific type of confidential computing, in the
> > style of TDX and CCA perhaps, and that it cannot (or will not) perform
> > the role of being a general solution for all confidential computing.
>
> That isn't remotely accurate.  I have stated multiple times that I want guest_memfd
> to be a vehicle for all VM types, i.e. not just CoCo VMs, and most definitely not
> just TDX/SNP/CCA VMs.

I think that there might have been a slight misunderstanding between
us. I just thought that that's what you meant by:

: And I'm saying say we should stand firm in what guest_memfd _won't_ support, e.g.
: swap/reclaim and probably page migration should get a hard "no".

https://lore.kernel.org/all/Zfmpby6i3PfBEcCV@google.com/

> What I am staunchly against is piling features onto guest_memfd that will cause
> it to eventually become virtually indistinguishable from any other file-based
> backing store.  I.e. while I want to make guest_memfd usable for all VM *types*,
> making guest_memfd the preferred backing store for all *VMs* and use cases is
> very much a non-goal.
>
> From an earlier conversation[1]:
>
>  : In other words, ditch the complexity for features that are well served by existing
>  : general purpose solutions, so that guest_memfd can take on a bit of complexity to
>  : serve use cases that are unique to KVM guests, without becoming an unmaintainable
>  : mess due to cross-products.
>
> > > > Also, since pin is already overloading the refcount, having the
> > > > exclusive pin there helps in ensuring atomic accesses and avoiding
> > > > races.
> > >
> > > Yeah, but every time someone does this and then links it to a uAPI it
> > > becomes utterly baked in concrete for the MM forever.
> >
> > I agree. But if we can't modify guest_memfd() to fit our needs (pKVM,
> > Gunyah), then we don't really have that many other options.
>
> What _are_ your needs?  There are multiple unanswered questions from our last
> conversation[2].  And by "needs" I don't mean "what changes do you want to make
> to guest_memfd?", I mean "what are the use cases, patterns, and scenarios that
> you want to support?".

I think Quentin's reply in this thread outlines what it is pKVM would
like to do, and why it's different from, e.g., TDX:
https://lore.kernel.org/all/ZnUsmFFslBWZxGIq@google.com/

To summarize, our requirements are the same as other CC
implementations, except that we don't want to pay a penalty for
operations that pKVM (and Gunyah) can do more efficiently than
encryption-based CC, e.g., in-place conversion of private -> shared.

Apart from that, we are happy to use any interface that can support our
needs, or at least one that we can extend in the (near) future to do so,
whether it's guest_memfd() or something else.
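
To make "in-place conversion" a bit more concrete, here is a toy userspace
sketch in C. It is not kernel code and not a proposed interface: every name
in it (toy_page, toy_host_get, toy_share_in_place, and so on) is hypothetical
and exists only for illustration. It models a page whose private/shared state
flips without its contents moving, and a shared -> private transition that is
refused while the host still holds references, which is roughly the
exclusivity property this thread is about.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

enum toy_page_state { TOY_PRIVATE, TOY_SHARED };

/* Hypothetical stand-in for a guest page; NOT a kernel structure. */
struct toy_page {
	enum toy_page_state state;
	atomic_int host_refs;	/* host-side references (mappings, pins) */
	unsigned char data[TOY_PAGE_SIZE];
};

/* Start out private, with no host references and zeroed contents. */
static void toy_page_init(struct toy_page *p)
{
	p->state = TOY_PRIVATE;
	atomic_init(&p->host_refs, 0);
	memset(p->data, 0, sizeof(p->data));
}

/* The host may only take a reference while the page is shared. */
static bool toy_host_get(struct toy_page *p)
{
	if (p->state != TOY_SHARED)
		return false;
	atomic_fetch_add(&p->host_refs, 1);
	return true;
}

/* Drop a host reference taken with toy_host_get(). */
static void toy_host_put(struct toy_page *p)
{
	int prev = atomic_fetch_sub(&p->host_refs, 1);

	assert(prev > 0);
	(void)prev;
}

/* private -> shared in place: only the state changes, the data stays put. */
static void toy_share_in_place(struct toy_page *p)
{
	p->state = TOY_SHARED;
}

/*
 * shared -> private in place: refuse while the host still holds references.
 * This toy is single-threaded; a real implementation would have to make the
 * check and the state change atomic with respect to new host references.
 */
static bool toy_unshare_in_place(struct toy_page *p)
{
	if (atomic_load(&p->host_refs) != 0)
		return false;
	p->state = TOY_PRIVATE;
	return true;
}
```

In this model, a page that is shared in place keeps whatever bytes it already
held, and it can only go back to private once every toy_host_get() has been
balanced by a toy_host_put().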

>  : What's "hypervisor-assisted page migration"?  More specifically, what's the
>  : mechanism that drives it?

I believe what Will specifically meant by this is that we can add
hypervisor support in pKVM for migrating the stage 2 page tables.

We don't have a detailed implementation for this yet, of course, since
there's no point until we know whether we're going with
guest_memfd() or another alternative.

>  : Do you happen to have a list of exactly what you mean by "normal mm stuff"?  I
>  : am not at all opposed to supporting .mmap(), because long term I also want to
>  : use guest_memfd for non-CoCo VMs.  But I want to be very conservative with respect
>  : to what is allowed for guest_memfd.   E.g. host userspace can map guest_memfd,
>  : and do operations that are directly related to its mapping, but that's about it.
>
> That distinction matters, because as I have stated in that thread, I am not
> opposed to page migration itself:
>
>  : I am not opposed to page migration itself, what I am opposed to is adding deep
>  : integration with core MM to do some of the fancy/complex things that lead to page
>  : migration.

So it's not a "hard no"? :)

> I am generally aware of the core pKVM use cases, but AFAIK I haven't seen a
> complete picture of everything you want to do, and _why_.
> E.g. if one of your requirements is that guest memory is managed by core-mm the
> same as all other memory in the system, then yeah, guest_memfd isn't for you.
> Integrating guest_memfd deeply into core-mm simply isn't realistic, at least not
> without *massive* changes to core-mm, as the whole point of guest_memfd is that
> it is guest-first memory, i.e. it is NOT memory that is managed by core-mm (primary
> MMU) and optionally mapped into KVM (secondary MMU).

It's not a requirement that guest memory is managed by the core-mm.
But, as we mentioned, support for in-place conversion from
shared->private, huge pages, and eventually migration are requirements.

> Again from that thread, one of the most important aspects of guest_memfd is that VMAs
> are not required.  Stating the obvious, lack of VMAs makes it really hard to drive
> swap, reclaim, migration, etc. from code that fundamentally operates on VMAs.
>
>  : More broadly, no VMAs are required.  The lack of stage-1 page tables are nice to
>  : have; the lack of VMAs means that guest_memfd isn't playing second fiddle, e.g.
>  : it's not subject to VMA protections, isn't restricted to host mapping size, etc.
>
> [1] https://lore.kernel.org/all/Zfmpby6i3PfBEcCV@google.com
> [2] https://lore.kernel.org/all/Zg3xF7dTtx6hbmZj@google.com

I wonder if it might be more productive to also discuss this in one of
the PUCKs, ahead of LPC, in addition to going over it at LPC.

Cheers,
/fuad