Message-ID: <CA+EHjTx0UkYSduDxe13dFi4+J5L28H+wB4FBXLsMRC5HaHaaFg@mail.gmail.com>
Date: Tue, 8 Jul 2025 17:22:24 +0100
From: Fuad Tabba <tabba@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Vishal Annapurve <vannapurve@...gle.com>, Rick P Edgecombe <rick.p.edgecombe@...el.com>,
"pvorel@...e.cz" <pvorel@...e.cz>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"catalin.marinas@....com" <catalin.marinas@....com>, Jun Miao <jun.miao@...el.com>,
Kirill Shutemov <kirill.shutemov@...el.com>, "pdurrant@...zon.co.uk" <pdurrant@...zon.co.uk>,
"vbabka@...e.cz" <vbabka@...e.cz>, "peterx@...hat.com" <peterx@...hat.com>, "x86@...nel.org" <x86@...nel.org>,
"amoorthy@...gle.com" <amoorthy@...gle.com>, "jack@...e.cz" <jack@...e.cz>,
"quic_svaddagi@...cinc.com" <quic_svaddagi@...cinc.com>, "keirf@...gle.com" <keirf@...gle.com>,
"palmer@...belt.com" <palmer@...belt.com>, "vkuznets@...hat.com" <vkuznets@...hat.com>,
"mail@...iej.szmigiero.name" <mail@...iej.szmigiero.name>,
"anthony.yznaga@...cle.com" <anthony.yznaga@...cle.com>, Wei W Wang <wei.w.wang@...el.com>,
"Wieczor-Retman, Maciej" <maciej.wieczor-retman@...el.com>, Yan Y Zhao <yan.y.zhao@...el.com>,
"ajones@...tanamicro.com" <ajones@...tanamicro.com>, "willy@...radead.org" <willy@...radead.org>,
"rppt@...nel.org" <rppt@...nel.org>, "quic_mnalajal@...cinc.com" <quic_mnalajal@...cinc.com>, "aik@....com" <aik@....com>,
"usama.arif@...edance.com" <usama.arif@...edance.com>, Dave Hansen <dave.hansen@...el.com>,
"fvdl@...gle.com" <fvdl@...gle.com>, "paul.walmsley@...ive.com" <paul.walmsley@...ive.com>,
"bfoster@...hat.com" <bfoster@...hat.com>, "nsaenz@...zon.es" <nsaenz@...zon.es>,
"anup@...infault.org" <anup@...infault.org>, "quic_eberman@...cinc.com" <quic_eberman@...cinc.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"thomas.lendacky@....com" <thomas.lendacky@....com>, "mic@...ikod.net" <mic@...ikod.net>,
"oliver.upton@...ux.dev" <oliver.upton@...ux.dev>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"quic_cvanscha@...cinc.com" <quic_cvanscha@...cinc.com>, "steven.price@....com" <steven.price@....com>,
"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>, "hughd@...gle.com" <hughd@...gle.com>,
Zhiquan1 Li <zhiquan1.li@...el.com>, "rientjes@...gle.com" <rientjes@...gle.com>,
"mpe@...erman.id.au" <mpe@...erman.id.au>, Erdem Aktas <erdemaktas@...gle.com>,
"david@...hat.com" <david@...hat.com>, "jgg@...pe.ca" <jgg@...pe.ca>,
"jhubbard@...dia.com" <jhubbard@...dia.com>, Haibo1 Xu <haibo1.xu@...el.com>, Fan Du <fan.du@...el.com>,
"maz@...nel.org" <maz@...nel.org>, "muchun.song@...ux.dev" <muchun.song@...ux.dev>,
Isaku Yamahata <isaku.yamahata@...el.com>, "jthoughton@...gle.com" <jthoughton@...gle.com>,
"steven.sistare@...cle.com" <steven.sistare@...cle.com>,
"quic_pheragu@...cinc.com" <quic_pheragu@...cinc.com>, "jarkko@...nel.org" <jarkko@...nel.org>,
"chenhuacai@...nel.org" <chenhuacai@...nel.org>, Kai Huang <kai.huang@...el.com>,
"shuah@...nel.org" <shuah@...nel.org>, "dwmw@...zon.co.uk" <dwmw@...zon.co.uk>,
Chao P Peng <chao.p.peng@...el.com>, "pankaj.gupta@....com" <pankaj.gupta@....com>,
Alexander Graf <graf@...zon.com>, "nikunj@....com" <nikunj@....com>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
"yuzenghui@...wei.com" <yuzenghui@...wei.com>, "jroedel@...e.de" <jroedel@...e.de>,
"suzuki.poulose@....com" <suzuki.poulose@....com>, "jgowans@...zon.com" <jgowans@...zon.com>,
Yilun Xu <yilun.xu@...el.com>, "liam.merwick@...cle.com" <liam.merwick@...cle.com>,
"michael.roth@....com" <michael.roth@....com>, "quic_tsoni@...cinc.com" <quic_tsoni@...cinc.com>,
Xiaoyao Li <xiaoyao.li@...el.com>, "aou@...s.berkeley.edu" <aou@...s.berkeley.edu>,
Ira Weiny <ira.weiny@...el.com>,
"richard.weiyang@...il.com" <richard.weiyang@...il.com>,
"kent.overstreet@...ux.dev" <kent.overstreet@...ux.dev>, "qperret@...gle.com" <qperret@...gle.com>,
"dmatlack@...gle.com" <dmatlack@...gle.com>, "james.morse@....com" <james.morse@....com>,
"brauner@...nel.org" <brauner@...nel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"ackerleytng@...gle.com" <ackerleytng@...gle.com>, "pgonda@...gle.com" <pgonda@...gle.com>,
"quic_pderrin@...cinc.com" <quic_pderrin@...cinc.com>, "hch@...radead.org" <hch@...radead.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>, "will@...nel.org" <will@...nel.org>,
"roypat@...zon.co.uk" <roypat@...zon.co.uk>
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
Hi Sean,
On Tue, 8 Jul 2025 at 16:39, Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Tue, Jul 08, 2025, Vishal Annapurve wrote:
> > On Tue, Jul 8, 2025 at 7:52 AM Edgecombe, Rick P
> > <rick.p.edgecombe@...el.com> wrote:
> > >
> > > On Tue, 2025-07-08 at 07:20 -0700, Sean Christopherson wrote:
> > > > > For TDX if we don't zero on conversion from private->shared we will be
> > > > > dependent on behavior of the CPU when reading memory with keyid 0, which
> > > > > was previously encrypted and has some protection bits set. I don't *think*
> > > > > the behavior is architectural. So it might be prudent to either make it so,
> > > > > or zero it in the kernel in order to not make non-architectural behavior
> > > > > into userspace ABI.
> > > >
> > > > Ya, by "vendor specific", I was also lumping in cases where the kernel would
> > > > need to zero memory in order to not end up with effectively undefined
> > > > behavior.
> > >
> > > Yea, more of an answer to Vishal's question about if CC VMs need zeroing. And
> > > the answer is sort of yes, even though TDX doesn't require it. But we actually
> > > don't want to zero memory when reclaiming memory. So TDX KVM code needs to know
> > > that the operation is a to-shared conversion and not another type of private
> > > zap. Like a callback from gmem, or maybe more simply a kernel internal flag to
> > > set in gmem such that it knows it should zero it.
> >
> > If the answer is that "always zero on private to shared conversions"
> > for all CC VMs,
>
> pKVM VMs *are* CoCo VMs. Just because pKVM doesn't rely on third party firmware
> to provide confidentiality and integrity doesn't make it any less of a CoCo VM.
> > > > : And maybe a new flag for KVM_GMEM_CONVERT_PRIVATE for user space to
> > > > : explicitly request that the page range is converted to private and the
> > > > : content needs to be retained. So that TDX can identify which case needs
> > > > : to call in-place TDH.PAGE.ADD.
> > > >
> > > > > If so, I agree with that idea, e.g. add a PRESERVE flag or whatever. That way
> > > > > userspace has explicit control over what happens to the data during
> > > > > conversion, and KVM can reject unsupported conversions, e.g. PRESERVE is only
> > > > > allowed for shared => private and only for select VM types.
> > >
> > > Ok, we should POC how it works with TDX.
> >
> > I don't think we need a flag to preserve memory as I mentioned in [2]. IIUC,
> > 1) Conversions are always content-preserving for pKVM.
>
> No? Preserving contents on private => shared is a security vulnerability waiting
> to happen.
Actually it is one of the requirements for pKVM as well as its current
behavior. We would like to preserve contents both ways, private <=>
shared, since it is required by some of the potential use cases (e.g.,
guest handling video encoding/decoding).

To make it clear, I'm talking about explicit sharing from the guest,
not relinquishing memory back to the host. In the case of
relinquishing (and guest teardown), relinquished memory is poisoned
(zeroed) in pKVM.

Cheers,
/fuad
> > 2) Shared to private conversions are always content-preserving for all
> > VMs as far as guest_memfd is concerned.
>
> There is no "as far as guest_memfd is concerned". Userspace doesn't care whether
> code lives in guest_memfd.c versus arch/xxx/kvm, the only thing that matters is
> the behavior that userspace sees. I don't want to end up with userspace ABI that
> is vendor/VM specific.
>
> > 3) Private to shared conversions are not content-preserving for CC VMs
> > as far as guest_memfd is concerned, subject to more discussions.
> >
> > [2] https://lore.kernel.org/lkml/CAGtprH-Kzn2kOGZ4JuNtUT53Hugw64M-_XMmhz_gCiDS6BAFtQ@mail.gmail.com/