lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <eeb8f4b8308b5160f913294c4373290a64e736b8.camel@intel.com>
Date: Tue, 8 Jul 2025 15:31:11 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "Annapurve, Vishal" <vannapurve@...gle.com>
CC: "pvorel@...e.cz" <pvorel@...e.cz>, "kvm@...r.kernel.org"
	<kvm@...r.kernel.org>, "catalin.marinas@....com" <catalin.marinas@....com>,
	"Miao, Jun" <jun.miao@...el.com>, "palmer@...belt.com" <palmer@...belt.com>,
	"pdurrant@...zon.co.uk" <pdurrant@...zon.co.uk>, "steven.price@....com"
	<steven.price@....com>, "peterx@...hat.com" <peterx@...hat.com>,
	"x86@...nel.org" <x86@...nel.org>, "amoorthy@...gle.com"
	<amoorthy@...gle.com>, "tabba@...gle.com" <tabba@...gle.com>,
	"quic_svaddagi@...cinc.com" <quic_svaddagi@...cinc.com>, "jack@...e.cz"
	<jack@...e.cz>, "vkuznets@...hat.com" <vkuznets@...hat.com>,
	"quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, "keirf@...gle.com"
	<keirf@...gle.com>, "mail@...iej.szmigiero.name"
	<mail@...iej.szmigiero.name>, "anthony.yznaga@...cle.com"
	<anthony.yznaga@...cle.com>, "Wang, Wei W" <wei.w.wang@...el.com>,
	"rppt@...nel.org" <rppt@...nel.org>, "Wieczor-Retman, Maciej"
	<maciej.wieczor-retman@...el.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
	"ajones@...tanamicro.com" <ajones@...tanamicro.com>, "Hansen, Dave"
	<dave.hansen@...el.com>, "paul.walmsley@...ive.com"
	<paul.walmsley@...ive.com>, "quic_mnalajal@...cinc.com"
	<quic_mnalajal@...cinc.com>, "aik@....com" <aik@....com>,
	"usama.arif@...edance.com" <usama.arif@...edance.com>, "fvdl@...gle.com"
	<fvdl@...gle.com>, "quic_cvanscha@...cinc.com" <quic_cvanscha@...cinc.com>,
	"Shutemov, Kirill" <kirill.shutemov@...el.com>, "vbabka@...e.cz"
	<vbabka@...e.cz>, "anup@...infault.org" <anup@...infault.org>,
	"thomas.lendacky@....com" <thomas.lendacky@....com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"mic@...ikod.net" <mic@...ikod.net>, "oliver.upton@...ux.dev"
	<oliver.upton@...ux.dev>, "Du, Fan" <fan.du@...el.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"muchun.song@...ux.dev" <muchun.song@...ux.dev>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "Li, Zhiquan1" <zhiquan1.li@...el.com>,
	"rientjes@...gle.com" <rientjes@...gle.com>, "mpe@...erman.id.au"
	<mpe@...erman.id.au>, "Aktas, Erdem" <erdemaktas@...gle.com>,
	"david@...hat.com" <david@...hat.com>, "jgg@...pe.ca" <jgg@...pe.ca>,
	"willy@...radead.org" <willy@...radead.org>, "hughd@...gle.com"
	<hughd@...gle.com>, "Xu, Haibo1" <haibo1.xu@...el.com>, "jhubbard@...dia.com"
	<jhubbard@...dia.com>, "maz@...nel.org" <maz@...nel.org>, "Yamahata, Isaku"
	<isaku.yamahata@...el.com>, "jthoughton@...gle.com" <jthoughton@...gle.com>,
	"will@...nel.org" <will@...nel.org>, "steven.sistare@...cle.com"
	<steven.sistare@...cle.com>, "jarkko@...nel.org" <jarkko@...nel.org>,
	"quic_pheragu@...cinc.com" <quic_pheragu@...cinc.com>, "nsaenz@...zon.es"
	<nsaenz@...zon.es>, "chenhuacai@...nel.org" <chenhuacai@...nel.org>,
	"Huang, Kai" <kai.huang@...el.com>, "shuah@...nel.org" <shuah@...nel.org>,
	"bfoster@...hat.com" <bfoster@...hat.com>, "dwmw@...zon.co.uk"
	<dwmw@...zon.co.uk>, "Peng, Chao P" <chao.p.peng@...el.com>,
	"pankaj.gupta@....com" <pankaj.gupta@....com>, "Graf, Alexander"
	<graf@...zon.com>, "nikunj@....com" <nikunj@....com>,
	"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "yuzenghui@...wei.com" <yuzenghui@...wei.com>,
	"jroedel@...e.de" <jroedel@...e.de>, "suzuki.poulose@....com"
	<suzuki.poulose@....com>, "jgowans@...zon.com" <jgowans@...zon.com>,
	"Xu, Yilun" <yilun.xu@...el.com>, "liam.merwick@...cle.com"
	<liam.merwick@...cle.com>, "michael.roth@....com" <michael.roth@....com>,
	"quic_tsoni@...cinc.com" <quic_tsoni@...cinc.com>, "Li, Xiaoyao"
	<xiaoyao.li@...el.com>, "aou@...s.berkeley.edu" <aou@...s.berkeley.edu>,
	"Weiny, Ira" <ira.weiny@...el.com>, "richard.weiyang@...il.com"
	<richard.weiyang@...il.com>, "kent.overstreet@...ux.dev"
	<kent.overstreet@...ux.dev>, "qperret@...gle.com" <qperret@...gle.com>,
	"dmatlack@...gle.com" <dmatlack@...gle.com>, "james.morse@....com"
	<james.morse@....com>, "brauner@...nel.org" <brauner@...nel.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"ackerleytng@...gle.com" <ackerleytng@...gle.com>, "pgonda@...gle.com"
	<pgonda@...gle.com>, "quic_pderrin@...cinc.com" <quic_pderrin@...cinc.com>,
	"hch@...radead.org" <hch@...radead.org>, "linux-mm@...ck.org"
	<linux-mm@...ck.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"roypat@...zon.co.uk" <roypat@...zon.co.uk>
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd

On Tue, 2025-07-08 at 08:07 -0700, Vishal Annapurve wrote:
> On Tue, Jul 8, 2025 at 7:52 AM Edgecombe, Rick P
> <rick.p.edgecombe@...el.com> wrote:
> > 
> > On Tue, 2025-07-08 at 07:20 -0700, Sean Christopherson wrote:
> > > > For TDX if we don't zero on conversion from private->shared we will be
> > > > dependent on behavior of the CPU when reading memory with keyid 0, which
> > > > was previously encrypted and has some protection bits set. I don't *think*
> > > > the behavior is architectural. So it might be prudent to either make it
> > > > so, or zero it in the kernel in order to not make non-architectural
> > > > behavior into userspace ABI.
> > > 
> > > Ya, by "vendor specific", I was also lumping in cases where the kernel would
> > > need to zero memory in order to not end up with effectively undefined
> > > behavior.
> > 
> > Yea, more of an answer to Vishal's question about if CC VMs need zeroing. And
> > the answer is sort of yes, even though TDX doesn't require it. But we actually
> > don't want to zero memory when reclaiming memory. So TDX KVM code needs to know
> > that the operation is a to-shared conversion and not another type of private
> > zap. Like a callback from gmem, or maybe more simply a kernel internal flag to
> > set in gmem such that it knows it should zero it.
> 
> If the answer is that "always zero on private to shared conversions"
> for all CC VMs, then does the scheme outlined in [1] make sense for
> handling the private -> shared conversions? For pKVM, there can be a
> VM type check to avoid the zeroing during conversions and instead just
> zero on allocations. This allows delaying zeroing until the fault time
> for CC VMs and can be done in guest_memfd centrally. We will need more
> inputs from the SEV side for this discussion.
> 
> [1] https://lore.kernel.org/lkml/CAGtprH-83EOz8rrUjE+O8m7nUDjt=THyXx=kfft1xQry65mtQg@mail.gmail.com/

It's nice that we don't double-zero private allocations/mappings (since the TDX
module will zero them too). Seems ok to me.
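As a rough illustration of the scheme in [1] combined with the to-shared flag
idea above, here is a minimal userspace C model. It is not guest_memfd code, and
every name in it (struct gmem_page, gmem_zap_private(), ZAP_TO_SHARED,
needs_zero, ...) is hypothetical. It assumes a per-page "needs zero" mark that
only a to-shared conversion sets (never reclaim), a VM-type check so pKVM keeps
contents across conversions, and that the TDX module initializes private pages
itself so the kernel can skip its own memset. For simplicity, allocation-time
zeroing is also deferred to the first fault for every VM type.

```c
#include <stdbool.h>
#include <string.h>

/* Userspace model only; all names are hypothetical. */
#define SKETCH_PAGE_SIZE 4096

enum vm_type { VM_PKVM, VM_TDX };

/* Why private mappings are being zapped, so the zap path can tell a
 * to-shared conversion apart from other zaps such as reclaim. */
enum zap_reason { ZAP_TO_SHARED, ZAP_RECLAIM };

struct gmem_page {
	unsigned char data[SKETCH_PAGE_SIZE];
	bool is_private;
	bool needs_zero;	/* zero before the next fault maps it */
};

/* Fresh pages always need zeroing; the work is deferred to fault time. */
static void gmem_alloc(struct gmem_page *p, bool is_private)
{
	p->is_private = is_private;
	p->needs_zero = true;
}

static void gmem_zap_private(enum vm_type vm, struct gmem_page *p,
			     enum zap_reason why)
{
	if (why != ZAP_TO_SHARED)
		return;		/* reclaim: no zeroing wanted here */
	p->is_private = false;
	/* VM-type check: pKVM keeps contents across conversion. */
	if (vm == VM_TDX)
		p->needs_zero = true;
}

/* Central zeroing point, run when the page is faulted in. */
static void gmem_fault(enum vm_type vm, struct gmem_page *p)
{
	if (!p->needs_zero)
		return;
	/* Assumed: the TDX module initializes private pages itself, so
	 * skip the kernel memset to avoid zeroing twice. */
	if (!(vm == VM_TDX && p->is_private))
		memset(p->data, 0, sizeof(p->data));
	p->needs_zero = false;
}
```

The point of carrying the reason down to the zap path is that reclaim and
to-shared conversion can share the same teardown code while only the latter
pays for zeroing.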

> 
> > 
> > > 
> > > > Up the thread Vishal says we need to support operations that use in-place
> > > > conversion (overloaded term now I think, btw). Why exactly is pKVM using
> > > > private/shared conversion for this private data provisioning?
> > > 
> > > Because it's literally converting memory from shared to private?  And IIUC,
> > > it's not a one-time provisioning, e.g. memory can go:
> > > 
> > >   shared => fill => private => consume => shared => fill => private => consume
> > > 
> > > > Instead of a special provisioning operation like the others? (Xiaoyao's
> > > > suggestion)
> > > 
> > > Are you referring to this suggestion?
> > 
> > Yea, in general to make it a specific operation preserving operation.
> > 
> > > 
> > >  : And maybe a new flag for KVM_GMEM_CONVERT_PRIVATE for user space to
> > >  : explicitly request that the page range is converted to private and the
> > >  : content needs to be retained. So that TDX can identify which case needs
> > >  : to call in-place TDH.PAGE.ADD.
> > > 
> > > If so, I agree with that idea, e.g. add a PRESERVE flag or whatever.  That way
> > > userspace has explicit control over what happens to the data during
> > > conversion, and KVM can reject unsupported conversions, e.g. PRESERVE is only
> > > allowed for shared => private and only for select VM types.
> > 
> > Ok, we should POC how it works with TDX.
> 
> I don't think we need a flag to preserve memory as I mentioned in [2]. IIUC,
> 1) Conversions are always content-preserving for pKVM.
> 2) Shared to private conversions are always content-preserving for all
> VMs as far as guest_memfd is concerned.
> 3) Private to shared conversions are not content-preserving for CC VMs
> as far as guest_memfd is concerned, subject to more discussions.
> 
> [2] https://lore.kernel.org/lkml/CAGtprH-Kzn2kOGZ4JuNtUT53Hugw64M-_XMmhz_gCiDS6BAFtQ@mail.gmail.com/

Right, I read that. I still don't see why pKVM needs to use the normal
private/shared conversion for data provisioning, rather than a dedicated
operation/flag that makes it a special case.
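For reference, the three rules from [2] reduce to a small predicate. This is a
minimal C sketch with hypothetical names (gmem_conversion_preserves(),
enum conv_dir, the VM_* constants), not actual KVM or guest_memfd identifiers;
TDX and SNP stand in for "CC VMs", and rule 3 is encoded as currently stated
even though it is still subject to discussion.

```c
#include <stdbool.h>

/* Hypothetical VM types; TDX and SNP stand in for "CC VMs" in rule 3. */
enum vm_type { VM_PKVM, VM_TDX, VM_SNP };
enum conv_dir { CONV_TO_PRIVATE, CONV_TO_SHARED };

/* Does guest_memfd keep page contents across this conversion? */
static bool gmem_conversion_preserves(enum vm_type vm, enum conv_dir dir)
{
	/* 1) Conversions are always content-preserving for pKVM. */
	if (vm == VM_PKVM)
		return true;
	/* 2) Shared->private is content-preserving for all VMs. */
	if (dir == CONV_TO_PRIVATE)
		return true;
	/* 3) Private->shared is not content-preserving for CC VMs
	 *    (still subject to discussion). */
	return false;
}
```

Under these rules, pKVM's shared => fill => private => consume => shared cycle
keeps data on every step, while for the other VM types only the shared->private
step does, so the behavior of the common operations depends on VM type.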

I'm trying to suggest there could be a benefit to making all gmem VM types
behave the same. If conversions are always content-preserving for pKVM, why
can't userspace always use the operation that says to preserve content, rather
than changing the behavior of the common operations?

So for all VM types, the user ABI would be:
private->shared          - Always zeroes the page
shared->private          - Always destructive
shared->private (w/flag) - Always preserves data, or returns an error if not possible
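The proposed ABI can be modeled the same way. This is a minimal C sketch, not a
real uAPI: SKETCH_CONVERT_PRESERVE, sketch_convert() and the preserve_supported
field are hypothetical, and "destructive" is modeled as zeroing, which is only
one possible reading. Following Sean's suggestion, the flag is rejected for
private->shared, and shared->private with the flag fails instead of silently
dropping data when the VM cannot preserve it.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
/* Hypothetical flag; no such uAPI flag exists. */
#define SKETCH_CONVERT_PRESERVE (1u << 0)

enum conv_dir { CONV_TO_PRIVATE, CONV_TO_SHARED };

struct sketch_vm {
	/* Hypothetical: can this VM currently keep data across
	 * shared->private (e.g. via an in-place add)? */
	bool preserve_supported;
};

struct sketch_page {
	unsigned char data[SKETCH_PAGE_SIZE];
	bool is_private;
};

/* Same rules for every VM type; returns 0 or a negative errno. */
static int sketch_convert(const struct sketch_vm *vm, struct sketch_page *p,
			  enum conv_dir dir, unsigned int flags)
{
	if (flags & ~SKETCH_CONVERT_PRESERVE)
		return -EINVAL;			/* unknown flags */

	if (dir == CONV_TO_SHARED) {
		if (flags & SKETCH_CONVERT_PRESERVE)
			return -EINVAL;		/* PRESERVE only for ->private */
		memset(p->data, 0, sizeof(p->data));	/* always zeroes */
		p->is_private = false;
		return 0;
	}

	if (flags & SKETCH_CONVERT_PRESERVE) {
		if (!vm->preserve_supported)
			return -EOPNOTSUPP;	/* preserve or fail, never drop */
	} else {
		/* Destructive conversion, modeled here as zeroing. */
		memset(p->data, 0, sizeof(p->data));
	}
	p->is_private = true;
	return 0;
}
```

With this shape, pKVM's provisioning cycle would pass the flag on every
shared->private step, and TDX could return an error wherever an in-place
TDH.PAGE.ADD is not possible.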


Do you see a problem?
