Message-ID: <CAGtprH8ozWpFLa2TSRLci-SgXRfJxcW7BsJSYOxa4Lgud+76qQ@mail.gmail.com>
Date: Tue, 8 Jul 2025 08:07:21 -0700
From: Vishal Annapurve <vannapurve@...gle.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
Cc: "seanjc@...gle.com" <seanjc@...gle.com>, "pvorel@...e.cz" <pvorel@...e.cz>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "catalin.marinas@....com" <catalin.marinas@....com>,
"Miao, Jun" <jun.miao@...el.com>, "Shutemov, Kirill" <kirill.shutemov@...el.com>,
"pdurrant@...zon.co.uk" <pdurrant@...zon.co.uk>, "vbabka@...e.cz" <vbabka@...e.cz>,
"peterx@...hat.com" <peterx@...hat.com>, "x86@...nel.org" <x86@...nel.org>,
"amoorthy@...gle.com" <amoorthy@...gle.com>, "jack@...e.cz" <jack@...e.cz>,
"quic_svaddagi@...cinc.com" <quic_svaddagi@...cinc.com>, "keirf@...gle.com" <keirf@...gle.com>,
"palmer@...belt.com" <palmer@...belt.com>, "vkuznets@...hat.com" <vkuznets@...hat.com>,
"mail@...iej.szmigiero.name" <mail@...iej.szmigiero.name>,
"anthony.yznaga@...cle.com" <anthony.yznaga@...cle.com>, "Wang, Wei W" <wei.w.wang@...el.com>,
"tabba@...gle.com" <tabba@...gle.com>,
"Wieczor-Retman, Maciej" <maciej.wieczor-retman@...el.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
"ajones@...tanamicro.com" <ajones@...tanamicro.com>, "willy@...radead.org" <willy@...radead.org>,
"rppt@...nel.org" <rppt@...nel.org>, "quic_mnalajal@...cinc.com" <quic_mnalajal@...cinc.com>, "aik@....com" <aik@....com>,
"usama.arif@...edance.com" <usama.arif@...edance.com>, "Hansen, Dave" <dave.hansen@...el.com>,
"fvdl@...gle.com" <fvdl@...gle.com>, "paul.walmsley@...ive.com" <paul.walmsley@...ive.com>,
"bfoster@...hat.com" <bfoster@...hat.com>, "nsaenz@...zon.es" <nsaenz@...zon.es>,
"anup@...infault.org" <anup@...infault.org>, "quic_eberman@...cinc.com" <quic_eberman@...cinc.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"thomas.lendacky@....com" <thomas.lendacky@....com>, "mic@...ikod.net" <mic@...ikod.net>,
"oliver.upton@...ux.dev" <oliver.upton@...ux.dev>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"quic_cvanscha@...cinc.com" <quic_cvanscha@...cinc.com>, "steven.price@....com" <steven.price@....com>,
"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>, "hughd@...gle.com" <hughd@...gle.com>,
"Li, Zhiquan1" <zhiquan1.li@...el.com>, "rientjes@...gle.com" <rientjes@...gle.com>,
"mpe@...erman.id.au" <mpe@...erman.id.au>, "Aktas, Erdem" <erdemaktas@...gle.com>,
"david@...hat.com" <david@...hat.com>, "jgg@...pe.ca" <jgg@...pe.ca>,
"jhubbard@...dia.com" <jhubbard@...dia.com>, "Xu, Haibo1" <haibo1.xu@...el.com>, "Du, Fan" <fan.du@...el.com>,
"maz@...nel.org" <maz@...nel.org>, "muchun.song@...ux.dev" <muchun.song@...ux.dev>,
"Yamahata, Isaku" <isaku.yamahata@...el.com>, "jthoughton@...gle.com" <jthoughton@...gle.com>,
"steven.sistare@...cle.com" <steven.sistare@...cle.com>,
"quic_pheragu@...cinc.com" <quic_pheragu@...cinc.com>, "jarkko@...nel.org" <jarkko@...nel.org>,
"chenhuacai@...nel.org" <chenhuacai@...nel.org>, "Huang, Kai" <kai.huang@...el.com>,
"shuah@...nel.org" <shuah@...nel.org>, "dwmw@...zon.co.uk" <dwmw@...zon.co.uk>,
"Peng, Chao P" <chao.p.peng@...el.com>, "pankaj.gupta@....com" <pankaj.gupta@....com>,
"Graf, Alexander" <graf@...zon.com>, "nikunj@....com" <nikunj@....com>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
"yuzenghui@...wei.com" <yuzenghui@...wei.com>, "jroedel@...e.de" <jroedel@...e.de>,
"suzuki.poulose@....com" <suzuki.poulose@....com>, "jgowans@...zon.com" <jgowans@...zon.com>,
"Xu, Yilun" <yilun.xu@...el.com>, "liam.merwick@...cle.com" <liam.merwick@...cle.com>,
"michael.roth@....com" <michael.roth@....com>, "quic_tsoni@...cinc.com" <quic_tsoni@...cinc.com>,
"Li, Xiaoyao" <xiaoyao.li@...el.com>, "aou@...s.berkeley.edu" <aou@...s.berkeley.edu>,
"Weiny, Ira" <ira.weiny@...el.com>,
"richard.weiyang@...il.com" <richard.weiyang@...il.com>,
"kent.overstreet@...ux.dev" <kent.overstreet@...ux.dev>, "qperret@...gle.com" <qperret@...gle.com>,
"dmatlack@...gle.com" <dmatlack@...gle.com>, "james.morse@....com" <james.morse@....com>,
"brauner@...nel.org" <brauner@...nel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"ackerleytng@...gle.com" <ackerleytng@...gle.com>, "pgonda@...gle.com" <pgonda@...gle.com>,
"quic_pderrin@...cinc.com" <quic_pderrin@...cinc.com>, "hch@...radead.org" <hch@...radead.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>, "will@...nel.org" <will@...nel.org>,
"roypat@...zon.co.uk" <roypat@...zon.co.uk>
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
On Tue, Jul 8, 2025 at 7:52 AM Edgecombe, Rick P
<rick.p.edgecombe@...el.com> wrote:
>
> On Tue, 2025-07-08 at 07:20 -0700, Sean Christopherson wrote:
> > > For TDX if we don't zero on conversion from private->shared we will be
> > > dependent on behavior of the CPU when reading memory with keyid 0, which
> > > was previously encrypted and has some protection bits set. I don't *think*
> > > the behavior is architectural. So it might be prudent to either make it so,
> > > or zero it in the kernel in order to not make non-architectural behavior
> > > into userspace ABI.
> >
> > Ya, by "vendor specific", I was also lumping in cases where the kernel would
> > need to zero memory in order to not end up with effectively undefined
> > behavior.
>
> Yea, more of an answer to Vishal's question about if CC VMs need zeroing. And
> the answer is sort of yes, even though TDX doesn't require it. But we actually
> don't want to zero memory when reclaiming memory. So TDX KVM code needs to know
> that the operation is a to-shared conversion and not another type of private
> zap. Like a callback from gmem, or maybe more simply a kernel internal flag to
> set in gmem such that it knows it should zero it.
If the answer is "always zero on private to shared conversions" for all
CC VMs, then does the scheme outlined in [1] make sense for handling
private -> shared conversions? For pKVM, a VM type check can skip the
zeroing during conversions and zero only on allocation. For CC VMs, this
allows the zeroing to be delayed until fault time and done centrally in
guest_memfd. We will need more input from the SEV side for this
discussion.
[1] https://lore.kernel.org/lkml/CAGtprH-83EOz8rrUjE+O8m7nUDjt=THyXx=kfft1xQry65mtQg@mail.gmail.com/
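As a rough illustration of the zeroing scheme discussed above, here is a userspace C sketch. It is not code from the series: the types `vm_class` and `gmem_page` and the `gmem_*` helpers are hypothetical names invented for this example, and the real guest_memfd bookkeeping may look quite different. It only shows the idea of zeroing on allocation for everyone, and deferring the private -> shared zeroing to fault time for CC VMs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical VM classes; these are not real KVM identifiers. */
enum vm_class { VM_CLASS_PKVM, VM_CLASS_CC };

/* Hypothetical per-page state tracked centrally by guest_memfd. */
struct gmem_page {
	unsigned char data[4096];
	bool is_private;
	bool needs_zeroing;	/* deferred zeroing, applied at fault time */
};

/* Allocation: every page starts out zeroed, for all VM classes. */
static void gmem_alloc_page(struct gmem_page *p)
{
	memset(p->data, 0, sizeof(p->data));
	p->is_private = false;
	p->needs_zeroing = false;
}

/*
 * private -> shared conversion: CC VMs only mark the page for zeroing,
 * while the VM type check lets pKVM keep the contents.
 */
static void gmem_convert_to_shared(struct gmem_page *p, enum vm_class cls)
{
	p->is_private = false;
	if (cls == VM_CLASS_CC)
		p->needs_zeroing = true;
}

/* Shared fault: apply any deferred zeroing before exposing the page. */
static unsigned char *gmem_fault_shared(struct gmem_page *p)
{
	if (p->is_private)
		return NULL;	/* private pages are not mappable as shared */
	if (p->needs_zeroing) {
		memset(p->data, 0, sizeof(p->data));
		p->needs_zeroing = false;
	}
	return p->data;
}
```

In this sketch the conversion itself stays cheap, and the cost of zeroing is paid only for pages that are actually faulted in as shared afterwards.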
>
> >
> > > Up the thread Vishal says we need to support operations that use in-place
> > > conversion (overloaded term now I think, btw). Why exactly is pKVM using
> > > private/shared conversion for this private data provisioning?
> >
> > Because it's literally converting memory from shared to private? And IIUC,
> > it's not a one-time provisioning, e.g. memory can go:
> >
> > shared => fill => private => consume => shared => fill => private => consume
> >
> > > Instead of a special provisioning operation like the others? (Xiaoyao's
> > > suggestion)
> >
> > Are you referring to this suggestion?
>
> Yea, in general to make it a specific content-preserving operation.
>
> >
> > : And maybe a new flag for KVM_GMEM_CONVERT_PRIVATE for user space to
> > : explicitly request that the page range is converted to private and the
> > : content needs to be retained. So that TDX can identify which case needs
> > : to call in-place TDH.PAGE.ADD.
> >
> > If so, I agree with that idea, e.g. add a PRESERVE flag or whatever. That way
> > userspace has explicit control over what happens to the data during
> > conversion, and KVM can reject unsupported conversions, e.g. PRESERVE is only
> > allowed for shared => private and only for select VM types.
>
> Ok, we should POC how it works with TDX.
I don't think we need a flag to preserve memory, as I mentioned in [2]. IIUC:
1) Conversions are always content-preserving for pKVM.
2) Shared to private conversions are always content-preserving for all
   VMs as far as guest_memfd is concerned.
3) Private to shared conversions are not content-preserving for CC VMs
   as far as guest_memfd is concerned, subject to more discussion.
[2] https://lore.kernel.org/lkml/CAGtprH-Kzn2kOGZ4JuNtUT53Hugw64M-_XMmhz_gCiDS6BAFtQ@mail.gmail.com/
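The three rules above can be written down as a single predicate. This is only a sketch to make the rules concrete: the enums and the function name are hypothetical and do not exist in KVM or guest_memfd, and rule 3 is still subject to discussion.

```c
#include <stdbool.h>

/* Hypothetical VM classes; these are not real KVM identifiers. */
enum vm_class { VM_CLASS_PKVM, VM_CLASS_CC };

/* Hypothetical conversion directions. */
enum gmem_conversion { GMEM_SHARED_TO_PRIVATE, GMEM_PRIVATE_TO_SHARED };

/* Returns true if guest_memfd would keep the page contents across the conversion. */
static bool gmem_conversion_preserves_content(enum vm_class cls,
					      enum gmem_conversion dir)
{
	/* Rule 1: conversions are always content-preserving for pKVM. */
	if (cls == VM_CLASS_PKVM)
		return true;

	/* Rule 2: shared -> private is content-preserving for all VMs. */
	if (dir == GMEM_SHARED_TO_PRIVATE)
		return true;

	/* Rule 3: private -> shared is not content-preserving for CC VMs. */
	return false;
}
```

Because the outcome depends only on the VM type and the conversion direction, no extra userspace-visible flag is needed to select it in this sketch.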