Message-ID: <aTFirYPI5vlIhvCK@devgpu015.cco6.facebook.com>
Date: Thu, 4 Dec 2025 02:30:05 -0800
From: Alex Mastro <amastro@...com>
To: David Matlack <dmatlack@...gle.com>
CC: Pasha Tatashin <pasha.tatashin@...een.com>,
 Alex Williamson <alex@...zbot.org>,
 Adithya Jayachandran <ajayachandra@...dia.com>,
 Alistair Popple <apopple@...dia.com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Bjorn Helgaas <bhelgaas@...gle.com>,
 Chris Li <chrisl@...nel.org>,
 David Rientjes <rientjes@...gle.com>,
 Jacob Pan <jacob.pan@...ux.microsoft.com>,
 Jason Gunthorpe <jgg@...dia.com>,
 Jason Gunthorpe <jgg@...pe.ca>,
 Josh Hilke <jrhilke@...gle.com>,
 Kevin Tian <kevin.tian@...el.com>,
 <kvm@...r.kernel.org>,
 Leon Romanovsky <leonro@...dia.com>,
 <linux-kernel@...r.kernel.org>,
 <linux-kselftest@...r.kernel.org>,
 <linux-pci@...r.kernel.org>,
 Lukas Wunner <lukas@...ner.de>,
 Mike Rapoport <rppt@...nel.org>,
 Parav Pandit <parav@...dia.com>,
 Philipp Stanner <pstanner@...hat.com>,
 Pratyush Yadav <pratyush@...nel.org>,
 Saeed Mahameed <saeedm@...dia.com>,
 Samiullah Khawaja <skhawaja@...gle.com>,
 Shuah Khan <shuah@...nel.org>,
 Tomita Moeko <tomitamoeko@...il.com>,
 Vipin Sharma <vipinsh@...gle.com>,
 William Tu <witu@...dia.com>,
 Yi Liu <yi.l.liu@...el.com>,
 Yunxiang Li <Yunxiang.Li@....com>,
 Zhu Yanjun <yanjun.zhu@...ux.dev>
Subject: Re: [PATCH 06/21] vfio/pci: Retrieve preserved device files after Live Update

On Wed, Dec 03, 2025 at 09:29:27AM -0800, David Matlack wrote:
> On Wed, Dec 3, 2025 at 7:46 AM Pasha Tatashin <pasha.tatashin@...een.com> wrote:
> >
> > On Wed, Dec 3, 2025 at 7:55 AM Alex Mastro <amastro@...com> wrote:
> > >
> > > On Wed, Nov 26, 2025 at 07:35:53PM +0000, David Matlack wrote:
> > > > From: Vipin Sharma <vipinsh@...gle.com>
> > > > static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args)
> > > > {
> > > > - return -EOPNOTSUPP;
> > > > + struct vfio_pci_core_device_ser *ser;
> > > > + struct vfio_device *device;
> > > > + struct folio *folio;
> > > > + struct file *file;
> > > > + int ret;
> > > > +
> > > > + folio = kho_restore_folio(args->serialized_data);
> > > > + if (!folio)
> > > > + return -ENOENT;
> > >
> > > Should this be consistent with the behavior of pci_flb_retrieve() which panics
> > > on failure? The short-circuit failure paths which follow leak the folio,
>
> Thanks for catching the leaked folio. I'll fix that in the next version.
>
> > > which seems like a hygiene issue, but the practical significance is moot if
> > > vfio_pci_liveupdate_retrieve() failure is catastrophic anyways?
> >
> > pci_flb_retrieve() is used during boot. If it fails, we risk DMA
> > corrupting any memory region, so a panic makes sense. In contrast,
> > this retrieval happens once we are already in userspace, allowing the
> > user to decide how to handle the failure to recover the preserved
> > cdev.
>
> This is what I was thinking as well. vfio_pci_liveupdate_retrieve()
> runs in the context of the ioctl LIVEUPDATE_SESSION_RETRIEVE_FD, so we
> can just return an error up to userspace if anything goes wrong and
> let userspace initiate the reboot to recover the device if/when it's
> ready.
>
> OTOH, pci_flb_retrieve() gets called by the kernel during early boot
> to determine what devices the previous kernel preserved. Once the kernel
> starts preserving I/O page tables, failing to determine which devices
> the previous kernel preserved could lead to corruption, so panicking is
> warranted.

Makes sense, thanks for elaborating, David and Pasha.
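The thread describes two failure policies. The runtime retrieve path should release the restored folio on every early exit and return an error to userspace. The boot-time path treats failure as fatal. Below is a minimal userspace sketch of both, assuming the usual kernel idiom of unwinding through a goto label. This is not the patch's code. `struct fake_folio`, `fake_restore_folio()`, `fake_folio_put()`, `fake_retrieve()`, `fake_boot_retrieve()` and the `live_folios` leak counter are hypothetical stand-ins for `kho_restore_folio()`, folio release, `vfio_pci_liveupdate_retrieve()` and `pci_flb_retrieve()`. The later failure steps and their error codes (other than -ENOENT) are assumptions, and `abort()` stands in for `panic()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a folio restored from preserved memory. */
struct fake_folio {
	int refs;
};

/* Fake folios currently allocated; nonzero after a call indicates a leak. */
static int live_folios;

/* Hypothetical analogue of kho_restore_folio(): NULL if nothing was preserved. */
static struct fake_folio *fake_restore_folio(int preserved)
{
	struct fake_folio *folio;

	if (!preserved)
		return NULL;
	folio = calloc(1, sizeof(*folio));
	if (!folio)
		return NULL;
	folio->refs = 1;
	live_folios++;
	return folio;
}

/* Drop one reference; free the folio when the last reference goes away. */
static void fake_folio_put(struct fake_folio *folio)
{
	assert(folio->refs > 0);
	if (--folio->refs == 0) {
		free(folio);
		live_folios--;
	}
}

/*
 * Runtime path (models the LIVEUPDATE_SESSION_RETRIEVE_FD ioctl): every exit
 * after the folio is restored goes through out_put_folio, so no failure leaks
 * it, and errors are returned to the caller instead of panicking.
 */
static int fake_retrieve(int preserved, int device_found, int file_created)
{
	struct fake_folio *folio;
	int ret;

	folio = fake_restore_folio(preserved);
	if (!folio)
		return -ENOENT;

	if (!device_found) {
		ret = -ENODEV;	/* assumed error code */
		goto out_put_folio;
	}

	if (!file_created) {
		ret = -ENOMEM;	/* assumed error code */
		goto out_put_folio;
	}

	/* Assumption: the serialized state has been consumed, so success drops it too. */
	ret = 0;

out_put_folio:
	fake_folio_put(folio);
	return ret;
}

/*
 * Boot-time path (models pci_flb_retrieve()): not knowing which devices were
 * preserved is unrecoverable, so abort() stands in for panic().
 */
static void fake_boot_retrieve(int preserved_devices_known)
{
	if (!preserved_devices_known) {
		fprintf(stderr, "cannot determine preserved devices\n");
		abort();
	}
}
```

For example, `fake_retrieve(1, 0, 1)` returns -ENODEV and leaves `live_folios` at 0. With a plain `return -ENODEV;` in place of the goto, that folio would stay allocated.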