Message-ID: <CALzav=ciz4kV+u3B5bMzZzVY+cMs-G=q9c5O-jKPz+E4LUdx7g@mail.gmail.com>
Date: Wed, 3 Dec 2025 09:29:27 -0800
From: David Matlack <dmatlack@...gle.com>
To: Pasha Tatashin <pasha.tatashin@...een.com>
Cc: Alex Mastro <amastro@...com>, Alex Williamson <alex@...zbot.org>,
Adithya Jayachandran <ajayachandra@...dia.com>, Alistair Popple <apopple@...dia.com>,
Andrew Morton <akpm@...ux-foundation.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
Chris Li <chrisl@...nel.org>, David Rientjes <rientjes@...gle.com>,
Jacob Pan <jacob.pan@...ux.microsoft.com>, Jason Gunthorpe <jgg@...dia.com>,
Jason Gunthorpe <jgg@...pe.ca>, Josh Hilke <jrhilke@...gle.com>, Kevin Tian <kevin.tian@...el.com>,
kvm@...r.kernel.org, Leon Romanovsky <leonro@...dia.com>, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org, linux-pci@...r.kernel.org,
Lukas Wunner <lukas@...ner.de>, Mike Rapoport <rppt@...nel.org>, Parav Pandit <parav@...dia.com>,
Philipp Stanner <pstanner@...hat.com>, Pratyush Yadav <pratyush@...nel.org>,
Saeed Mahameed <saeedm@...dia.com>, Samiullah Khawaja <skhawaja@...gle.com>, Shuah Khan <shuah@...nel.org>,
Tomita Moeko <tomitamoeko@...il.com>, Vipin Sharma <vipinsh@...gle.com>, William Tu <witu@...dia.com>,
Yi Liu <yi.l.liu@...el.com>, Yunxiang Li <Yunxiang.Li@....com>,
Zhu Yanjun <yanjun.zhu@...ux.dev>
Subject: Re: [PATCH 06/21] vfio/pci: Retrieve preserved device files after Live Update

On Wed, Dec 3, 2025 at 7:46 AM Pasha Tatashin <pasha.tatashin@...een.com> wrote:
>
> On Wed, Dec 3, 2025 at 7:55 AM Alex Mastro <amastro@...com> wrote:
> >
> > On Wed, Nov 26, 2025 at 07:35:53PM +0000, David Matlack wrote:
> > > From: Vipin Sharma <vipinsh@...gle.com>
> > > static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args)
> > > {
> > > - return -EOPNOTSUPP;
> > > + struct vfio_pci_core_device_ser *ser;
> > > + struct vfio_device *device;
> > > + struct folio *folio;
> > > + struct file *file;
> > > + int ret;
> > > +
> > > + folio = kho_restore_folio(args->serialized_data);
> > > + if (!folio)
> > > + return -ENOENT;
> >
> > Should this be consistent with the behavior of pci_flb_retrieve() which panics
> > on failure? The short circuit failure paths which follow leak the folio,

Thanks for catching the leaked folio. I'll fix that in the next version.

> > which seems like a hygiene issue, but the practical significance is moot if
> > vfio_pci_liveupdate_retrieve() failure is catastrophic anyways?
>
> pci_flb_retrieve() is used during boot. If it fails, we risk DMA
> corrupting any memory region, so a panic makes sense. In contrast,
> this retrieval happens once we are already in userspace, allowing the
> user to decide how to handle the failure to recover the preserved
> cdev.

This is what I was thinking as well. vfio_pci_liveupdate_retrieve()
runs in the context of the ioctl LIVEUPDATE_SESSION_RETRIEVE_FD, so we
can just return an error up to userspace if anything goes wrong and
let userspace initiate the reboot to recover the device if/when it's
ready.

OTOH, pci_flb_retrieve() gets called by the kernel during early boot
to determine which devices the previous kernel preserved. If the
kernel can't determine that, then once the kernel starts preserving
I/O page tables, that could lead to corruption, so panicking is
warranted.
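
For illustration only, here is a rough user-space sketch of the shape the folio fix could take: later failure paths unwind through a single label that releases the folio, and the error is returned to the caller instead of panicking. Everything in it is a hypothetical stand-in (fake_folio, fake_restore_folio(), fake_put_folio(), fake_find_device(), sketch_retrieve()), not the kernel API or the actual patch. It also assumes the serialized data is no longer needed once retrieval finishes, which may not match the real code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
struct fake_folio {
	int magic;
};

/* Counts folios that were restored but not yet released. */
static int live_folios;

/* Stand-in for restoring preserved data; returns NULL if nothing is there. */
static struct fake_folio *fake_restore_folio(int present)
{
	struct fake_folio *folio;

	if (!present)
		return NULL;

	folio = malloc(sizeof(*folio));
	if (!folio)
		return NULL;

	folio->magic = 0x5646494f;
	live_folios++;
	return folio;
}

/* Stand-in for dropping the reference taken by the restore step. */
static void fake_put_folio(struct fake_folio *folio)
{
	assert(folio);
	live_folios--;
	free(folio);
}

/* Stand-in for a later step that can fail, e.g. looking up the device. */
static int fake_find_device(int device_ok)
{
	return device_ok ? 0 : -ENODEV;
}

/*
 * Sketch of a retrieve path: every exit taken after the folio was
 * restored goes through out_put_folio, so no failure path leaks it.
 * ASSUMPTION: the folio is also released on success.
 */
static int sketch_retrieve(int data_present, int device_ok)
{
	struct fake_folio *folio;
	int ret;

	folio = fake_restore_folio(data_present);
	if (!folio)
		return -ENOENT;

	ret = fake_find_device(device_ok);
	if (ret)
		goto out_put_folio;

	/* Success: the real code would hand the file back to the caller. */
	ret = 0;

out_put_folio:
	fake_put_folio(folio);
	return ret;
}
```

In this sketch, sketch_retrieve(1, 0) returns -ENODEV and leaves live_folios at 0, which is the property the fix is after: the caller sees an error it can act on, and nothing is leaked along the way.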