Message-ID: <CAGtprH-fE=G923ctBAcq5zFna+2WULhmHDSbXUsZKUrin29b4g@mail.gmail.com>
Date: Wed, 21 May 2025 07:42:01 -0700
From: Vishal Annapurve <vannapurve@...gle.com>
To: Fuad Tabba <tabba@...gle.com>
Cc: Ackerley Tng <ackerleytng@...gle.com>, kvm@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, x86@...nel.org, linux-fsdevel@...r.kernel.org,
aik@....com, ajones@...tanamicro.com, akpm@...ux-foundation.org,
amoorthy@...gle.com, anthony.yznaga@...cle.com, anup@...infault.org,
aou@...s.berkeley.edu, bfoster@...hat.com, binbin.wu@...ux.intel.com,
brauner@...nel.org, catalin.marinas@....com, chao.p.peng@...el.com,
chenhuacai@...nel.org, dave.hansen@...el.com, david@...hat.com,
dmatlack@...gle.com, dwmw@...zon.co.uk, erdemaktas@...gle.com,
fan.du@...el.com, fvdl@...gle.com, graf@...zon.com, haibo1.xu@...el.com,
hch@...radead.org, hughd@...gle.com, ira.weiny@...el.com,
isaku.yamahata@...el.com, jack@...e.cz, james.morse@....com,
jarkko@...nel.org, jgg@...pe.ca, jgowans@...zon.com, jhubbard@...dia.com,
jroedel@...e.de, jthoughton@...gle.com, jun.miao@...el.com,
kai.huang@...el.com, keirf@...gle.com, kent.overstreet@...ux.dev,
kirill.shutemov@...el.com, liam.merwick@...cle.com,
maciej.wieczor-retman@...el.com, mail@...iej.szmigiero.name, maz@...nel.org,
mic@...ikod.net, michael.roth@....com, mpe@...erman.id.au,
muchun.song@...ux.dev, nikunj@....com, nsaenz@...zon.es,
oliver.upton@...ux.dev, palmer@...belt.com, pankaj.gupta@....com,
paul.walmsley@...ive.com, pbonzini@...hat.com, pdurrant@...zon.co.uk,
peterx@...hat.com, pgonda@...gle.com, pvorel@...e.cz, qperret@...gle.com,
quic_cvanscha@...cinc.com, quic_eberman@...cinc.com,
quic_mnalajal@...cinc.com, quic_pderrin@...cinc.com, quic_pheragu@...cinc.com,
quic_svaddagi@...cinc.com, quic_tsoni@...cinc.com, richard.weiyang@...il.com,
rick.p.edgecombe@...el.com, rientjes@...gle.com, roypat@...zon.co.uk,
rppt@...nel.org, seanjc@...gle.com, shuah@...nel.org, steven.price@....com,
steven.sistare@...cle.com, suzuki.poulose@....com, thomas.lendacky@....com,
usama.arif@...edance.com, vbabka@...e.cz, viro@...iv.linux.org.uk,
vkuznets@...hat.com, wei.w.wang@...el.com, will@...nel.org,
willy@...radead.org, xiaoyao.li@...el.com, yan.y.zhao@...el.com,
yilun.xu@...el.com, yuzenghui@...wei.com, zhiquan1.li@...el.com
Subject: Re: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce
KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls
On Wed, May 21, 2025 at 5:36 AM Fuad Tabba <tabba@...gle.com> wrote:
> ....
> > When rebooting, the memslots may not yet be bound to the guest_memfd,
> > but we want to reset the guest_memfd's to private. If we use
> > KVM_SET_MEMORY_ATTRIBUTES to convert, we'd be forced to first bind, then
> > convert. If we had a direct ioctl, we don't have this restriction.
> >
> > If we do the conversion via vcpu_run() we would be forced to handle
> > conversions only with a vcpu_run() and only the guest can initiate a
> > conversion.
> >
> > On a guest boot for TDX, the memory is assumed to be private. If we
> > gave it memory set as shared, we'd just have a bunch of
> > KVM_EXIT_MEMORY_FAULTs that slow down boot. Hence on a guest reboot, we
> > will want to reset the guest memory to private.
> >
> > We could say the firmware should reset memory to private on guest
> > reboot, but we can't force all guests to update firmware.
>
> Here is where I disagree. I do think that this is the CoCo guest's
> responsibility (and by guest I include its firmware) to fix its own
> state after a reboot. How would the host even know that a guest is
> rebooting if it's a CoCo guest?
There are a bunch of complexities here. A reboot on x86 can be triggered
in multiple ways that I don't fully understand, but a few of them
involve reading/writing a "reset register" in MMIO/PCI config space,
which is emulated directly by host userspace. The host has to know when
the guest is shutting down in order to manage its lifecycle.

x86 CoCo VM firmware doesn't support warm/soft reboot, and even if it
does in the future, the guest kernel can choose a different reboot
mechanism. So guest reboot needs to be emulated by always starting from
scratch. This sequence needs the initial guest firmware payload to be
installed into private ranges of guest_memfd.
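
A minimal sketch of the "start from scratch" reboot sequence described
above, written as a toy Python model rather than real KVM code. The
names GuestMemfdModel, convert() and emulate_reboot() are hypothetical
and are not part of any kernel or userspace API. Zeroing the contents
on reboot is also an assumption of this model. The point it illustrates
is that the conversion acts on the file itself, so no memslot binding
is involved.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096


@dataclass
class GuestMemfdModel:
    """Toy stand-in for a guest_memfd file (hypothetical, not a KVM API)."""
    size: int
    private: list = field(init=False)   # one shared/private flag per page
    data: bytearray = field(init=False)  # backing contents of the file

    def __post_init__(self):
        if self.size <= 0 or self.size % PAGE_SIZE:
            raise ValueError("size must be a positive multiple of PAGE_SIZE")
        # Memory starts out private, as assumed for a TDX guest boot.
        self.private = [True] * (self.size // PAGE_SIZE)
        self.data = bytearray(self.size)

    def convert(self, offset, length, to_private):
        """Model of a direct conversion request on the file.

        No memslot state is consulted here, mirroring the argument that
        a per-file ioctl does not require binding first.
        """
        if (offset < 0 or length <= 0 or offset % PAGE_SIZE
                or length % PAGE_SIZE or offset + length > self.size):
            raise ValueError("range must be page-aligned and within the file")
        first = offset // PAGE_SIZE
        for page in range(first, first + length // PAGE_SIZE):
            self.private[page] = to_private


def emulate_reboot(gmem, firmware, load_offset=0):
    """Emulate a guest reboot by starting from scratch (sketch only)."""
    end = load_offset + len(firmware)
    if load_offset < 0 or end > gmem.size:
        raise ValueError("firmware payload does not fit in the file")
    # 1. Reset every range to private, whatever the guest converted before.
    gmem.convert(0, gmem.size, to_private=True)
    # 2. Discard old contents (an assumption of this model).
    gmem.data[:] = bytes(gmem.size)
    # 3. Install the initial firmware payload into the now-private range.
    gmem.data[load_offset:end] = firmware
```

For example, after a guest has converted two pages to shared, calling
emulate_reboot(gmem, payload) leaves every page private again with the
payload at the start of the file.
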
>
> Either the host doesn't (or cannot even) know that the guest is
> rebooting, in which case I don't see how having an IOCTL would help.
The host does know that the guest is rebooting.
> Or somehow the host does know that, i.e., via a hypercall that
> indicates that. In which case, we could have it so that for that type
> of VM, we would reconvert its pages to private on a reboot.
This could possibly be solved by resetting the ranges to private when
binding to a memslot of a certain VM type. But Google also has a use
case for intrahost migration, where a live VM and its associated
guest_memfd files are bound to a new KVM VM and new memslots; resetting
on bind would wipe the live VM's shared/private state there.

Otherwise, we need an additional contract between userspace and KVM to
intercept/handle the guest_memfd range reset.