Message-ID: <CALzav=es=RKMsRUdpX03m+2Eq4SVxPZSZuy1fLXV+dv=rhDhaw@mail.gmail.com>
Date: Tue, 2 Dec 2025 13:29:34 -0800
From: David Matlack <dmatlack@...gle.com>
To: Pratyush Yadav <pratyush@...nel.org>
Cc: Pasha Tatashin <pasha.tatashin@...een.com>, Alex Williamson <alex@...zbot.org>, 
	Adithya Jayachandran <ajayachandra@...dia.com>, Alex Mastro <amastro@...com>, 
	Alistair Popple <apopple@...dia.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	Bjorn Helgaas <bhelgaas@...gle.com>, Chris Li <chrisl@...nel.org>, 
	David Rientjes <rientjes@...gle.com>, Jacob Pan <jacob.pan@...ux.microsoft.com>, 
	Jason Gunthorpe <jgg@...dia.com>, Jason Gunthorpe <jgg@...pe.ca>, Josh Hilke <jrhilke@...gle.com>, 
	Kevin Tian <kevin.tian@...el.com>, kvm@...r.kernel.org, 
	Leon Romanovsky <leonro@...dia.com>, linux-kernel@...r.kernel.org, 
	linux-kselftest@...r.kernel.org, linux-pci@...r.kernel.org, 
	Lukas Wunner <lukas@...ner.de>, Mike Rapoport <rppt@...nel.org>, Parav Pandit <parav@...dia.com>, 
	Philipp Stanner <pstanner@...hat.com>, Saeed Mahameed <saeedm@...dia.com>, 
	Samiullah Khawaja <skhawaja@...gle.com>, Shuah Khan <shuah@...nel.org>, 
	Tomita Moeko <tomitamoeko@...il.com>, Vipin Sharma <vipinsh@...gle.com>, William Tu <witu@...dia.com>, 
	Yi Liu <yi.l.liu@...el.com>, Yunxiang Li <Yunxiang.Li@....com>, 
	Zhu Yanjun <yanjun.zhu@...ux.dev>
Subject: Re: [PATCH 00/21] vfio/pci: Base support to preserve a VFIO device
 file across Live Update

On Tue, Dec 2, 2025 at 6:10 AM Pratyush Yadav <pratyush@...nel.org> wrote:
>
> On Mon, Dec 01 2025, Pasha Tatashin wrote:
>
> > On Wed, Nov 26, 2025 at 2:36 PM David Matlack <dmatlack@...gle.com> wrote:
> [...]
> >> FLB Locking
> >>
> >>   I don't see a way to properly synchronize pci_flb_finish() with
> >>   pci_liveupdate_incoming_is_preserved() since the incoming FLB mutex is
> >>   dropped by liveupdate_flb_get_incoming() when it returns the pointer
> >>   to the object, and taking pci_flb_incoming_lock in pci_flb_finish()
> >>   could result in a deadlock due to reversing the lock ordering.
>
> My mental model for FLB is that it is a dependency for files, so it
> should always be created (aka prepare) before _any_ of the files, and
> always destroyed (aka finish) after _all_ of the files.
>
> By the time the FLB is being finished, all the files for that FLB should
> also be finished, so there should no longer be a user of the FLB.
>
> Once all of the files are finished, it should be LUO's responsibility to
> make sure liveupdate_flb_get_incoming() returns an error _before_ it
> starts doing the FLB finish. And in FLB finish you should not be needing
> to take any locks.
>
> Why do you want to use the FLB when it is being finished?

The next patch looks at the PCI FLB any time a device is probed, which
could race with the last device file getting finished, causing the FLB
to be freed.

However, it looks like I am going to drop that patch. But the PCI FLB
is still used in PATCH 08 [1] whenever userspace opens a VFIO cdev or
issues the VFIO_GROUP_GET_DEVICE_FD ioctl, to check if the underlying
PCI device was preserved. Offline, Jason suggested decoupling those
checks from the FLB, so I'll look into doing that in the next version.

[1] https://lore.kernel.org/kvm/20251126193608.2678510-9-dmatlack@google.com/

>
> >
> > I will re-introduce _lock/_unlock API to solve this issue.
> >
> >>
> >> FLB Retrieving
> >>
> >>   The first patch of this series includes a fix to prevent an FLB from
> >>   being retrieved again after it is finished. I am wondering if this is the
> >>   right approach or if subsystems are expected to stop calling
> >>   liveupdate_flb_get_incoming() after an FLB is finished.
>
> IMO once the FLB is finished, LUO should make sure it cannot be
> retrieved, mainly so subsystem code is simpler and less bug-prone.

+1, and I think Pasha is going to do that in the next version of FLB.
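
For illustration only, here is a minimal userspace sketch of the ordering
discussed above: the FLB outlives all of its files, and retrieval starts
failing before the finish work runs. This is not the LUO/FLB code. The
toy_* names, the pthread mutex, and the -ENOENT return value are all
assumptions made up for this example. It also does not address a caller
that still holds a pointer returned earlier, which is the race raised at
the top of this thread.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of this is the kernel's LUO/FLB API. */
struct toy_flb {
	pthread_mutex_t lock;
	void *obj;	/* incoming object shared by all dependent files */
	int files;	/* dependent files that have not finished yet */
	bool finished;	/* set once the last file has finished */
};

/* Return 0 and set *objp, or -ENOENT once the FLB has been finished. */
static int toy_flb_get_incoming(struct toy_flb *flb, void **objp)
{
	int ret = 0;

	pthread_mutex_lock(&flb->lock);
	if (flb->finished)
		ret = -ENOENT;
	else
		*objp = flb->obj;
	pthread_mutex_unlock(&flb->lock);

	return ret;
}

/* Finish one file. The last one blocks further lookups, then tears down. */
static void toy_file_finish(struct toy_flb *flb)
{
	bool last;

	pthread_mutex_lock(&flb->lock);
	last = (--flb->files == 0);
	if (last)
		flb->finished = true;	/* lookups fail from here on */
	pthread_mutex_unlock(&flb->lock);

	/* Stand-in for the FLB finish callback, which runs without the lock. */
	if (last)
		flb->obj = NULL;
}
```

With two files outstanding, a lookup still succeeds after the first
toy_file_finish() call and fails with -ENOENT after the second.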
