linux-kernel - Re: [PATCH 06/21] vfio/pci: Retrieve preserved device files after Live Update

Message-ID: <aTFirYPI5vlIhvCK@devgpu015.cco6.facebook.com>
Date: Thu, 4 Dec 2025 02:30:05 -0800
From: Alex Mastro <amastro@...com>
To: David Matlack <dmatlack@...gle.com>
CC: Pasha Tatashin <pasha.tatashin@...een.com>,
        Alex Williamson <alex@...zbot.org>,
        Adithya Jayachandran <ajayachandra@...dia.com>,
        Alistair Popple <apopple@...dia.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>, Chris Li <chrisl@...nel.org>,
        David Rientjes <rientjes@...gle.com>,
        Jacob Pan <jacob.pan@...ux.microsoft.com>,
        Jason Gunthorpe <jgg@...dia.com>, Jason Gunthorpe <jgg@...pe.ca>,
        Josh Hilke <jrhilke@...gle.com>, Kevin Tian <kevin.tian@...el.com>,
        <kvm@...r.kernel.org>, Leon Romanovsky <leonro@...dia.com>,
        <linux-kernel@...r.kernel.org>, <linux-kselftest@...r.kernel.org>,
        <linux-pci@...r.kernel.org>, Lukas Wunner <lukas@...ner.de>,
        Mike Rapoport <rppt@...nel.org>, Parav Pandit <parav@...dia.com>,
        Philipp Stanner <pstanner@...hat.com>,
        Pratyush Yadav <pratyush@...nel.org>,
        Saeed Mahameed <saeedm@...dia.com>,
        Samiullah Khawaja <skhawaja@...gle.com>,
        Shuah Khan <shuah@...nel.org>, Tomita Moeko <tomitamoeko@...il.com>,
        Vipin Sharma <vipinsh@...gle.com>, William Tu <witu@...dia.com>,
        Yi Liu <yi.l.liu@...el.com>, Yunxiang Li <Yunxiang.Li@....com>,
        Zhu Yanjun <yanjun.zhu@...ux.dev>
Subject: Re: [PATCH 06/21] vfio/pci: Retrieve preserved device files after Live Update

On Wed, Dec 03, 2025 at 09:29:27AM -0800, David Matlack wrote:
> On Wed, Dec 3, 2025 at 7:46 AM Pasha Tatashin <pasha.tatashin@...een.com> wrote:
> >
> > On Wed, Dec 3, 2025 at 7:55 AM Alex Mastro <amastro@...com> wrote:
> > >
> > > On Wed, Nov 26, 2025 at 07:35:53PM +0000, David Matlack wrote:
> > > > From: Vipin Sharma <vipinsh@...gle.com>
> > > >  static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args)
> > > >  {
> > > > -     return -EOPNOTSUPP;
> > > > +     struct vfio_pci_core_device_ser *ser;
> > > > +     struct vfio_device *device;
> > > > +     struct folio *folio;
> > > > +     struct file *file;
> > > > +     int ret;
> > > > +
> > > > +     folio = kho_restore_folio(args->serialized_data);
> > > > +     if (!folio)
> > > > +             return -ENOENT;
> > >
> > > Should this be consistent with the behavior of pci_flb_retrieve() which panics
> > > on failure? The short circuit failure paths which follow leak the folio,
> 
> Thanks for catching the leaked folio. I'll fix that in the next version.
> 
> > > which seems like a hygiene issue, but the practical significance is moot if
> > > vfio_pci_liveupdate_retrieve() failure is catastrophic anyways?
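For illustration, the usual kernel idiom for not leaking a resource on early-return paths is a goto-based unwind: every failure after the acquisition jumps to one label that releases it. The sketch below is plain userspace C, not the patch; restore_blob() and release_blob() are hypothetical stand-ins for kho_restore_folio() and whatever release the real fix uses, the two failing steps are invented, and releasing on the success path is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the restored folio. */
struct blob {
	int payload;
};

/* Blobs restored but not yet released; nonzero after a call means a leak. */
static int live_blobs;

/* Hypothetical stand-in for kho_restore_folio(): NULL on failure. */
static struct blob *restore_blob(int present)
{
	struct blob *b;

	if (!present)
		return NULL;
	b = malloc(sizeof(*b));
	if (b)
		live_blobs++;
	return b;
}

/* Hypothetical release helper. */
static void release_blob(struct blob *b)
{
	assert(b != NULL);
	free(b);
	live_blobs--;
}

/*
 * fail_step picks which (invented) step after the restore fails:
 * 0 = none, 1 = validation, 2 = device lookup.
 */
static int retrieve(int present, int fail_step)
{
	struct blob *b;
	int ret = 0;

	b = restore_blob(present);
	if (!b)
		return -ENOENT;	/* nothing acquired yet, a plain return is safe */

	if (fail_step == 1) {
		ret = -EINVAL;
		goto out_release;
	}
	if (fail_step == 2) {
		ret = -ENODEV;
		goto out_release;
	}

	/* ... consume b->payload here ... */

out_release:
	/* Single exit: every path past the restore releases the blob. */
	release_blob(b);
	return ret;
}
```

Whichever step fails, live_blobs returns to zero, which is the property the short-circuit returns after kho_restore_folio() are missing.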
> >
> > pci_flb_retrieve() is used during boot. If it fails, we risk DMA
> > corrupting any memory region, so a panic makes sense. In contrast,
> > this retrieval happens once we are already in userspace, allowing the
> > user to decide how to handle the failure to recover the preserved
> > cdev.
> 
> This is what I was thinking as well. vfio_pci_liveupdate_retrieve()
> runs in the context of the ioctl LIVEUPDATE_SESSION_RETRIEVE_FD, so we
> can just return an error up to userspace if anything goes wrong and
> let userspace initiate the reboot to recover the device if/when it's
> ready.
> 
> OTOH, pci_flb_retrieve() gets called by the kernel during early boot
> to determine which devices the previous kernel preserved. Once the
> kernel starts preserving I/O page tables, failing to determine that
> could lead to memory corruption, so panicking is warranted.
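By contrast, the early-boot path has no caller to hand an error to, and continuing blind could let still-active devices DMA into memory the new kernel reuses. A minimal sketch of that policy, assuming a hypothetical struct handover for the data recovered from the previous kernel and a replaceable fatal hook standing in for panic() so the failure path can be exercised without aborting:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical view of what early boot recovered from the previous kernel. */
struct handover {
	int valid;			/* 0 if the data is missing or corrupt */
	unsigned int nr_preserved;	/* devices the previous kernel preserved */
};

/* Default fatal handler: stands in for panic(). */
static void die(const char *msg)
{
	fprintf(stderr, "panic: %s\n", msg);
	abort();
}

static void (*fatal)(const char *msg) = die;

/* Non-aborting hook that only counts calls, for exercising the failure path. */
static int fatal_calls;

static void record_fatal(const char *msg)
{
	assert(msg != NULL);
	fatal_calls++;
}

/*
 * Boot path: not knowing which devices may still be doing DMA is not a
 * recoverable error here, so there is no errno to return.
 */
static unsigned int boot_nr_preserved(const struct handover *h)
{
	if (!h || !h->valid) {
		fatal("cannot determine preserved devices");
		return 0;	/* reached only if the fatal hook returns */
	}
	return h->nr_preserved;
}
```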

Makes sense, thanks for elaborating, David and Pasha.