Message-ID: <CALzav=es=RKMsRUdpX03m+2Eq4SVxPZSZuy1fLXV+dv=rhDhaw@mail.gmail.com>
Date: Tue, 2 Dec 2025 13:29:34 -0800
From: David Matlack <dmatlack@...gle.com>
To: Pratyush Yadav <pratyush@...nel.org>
Cc: Pasha Tatashin <pasha.tatashin@...een.com>, Alex Williamson <alex@...zbot.org>,
Adithya Jayachandran <ajayachandra@...dia.com>, Alex Mastro <amastro@...com>,
Alistair Popple <apopple@...dia.com>, Andrew Morton <akpm@...ux-foundation.org>,
Bjorn Helgaas <bhelgaas@...gle.com>, Chris Li <chrisl@...nel.org>,
David Rientjes <rientjes@...gle.com>, Jacob Pan <jacob.pan@...ux.microsoft.com>,
Jason Gunthorpe <jgg@...dia.com>, Jason Gunthorpe <jgg@...pe.ca>, Josh Hilke <jrhilke@...gle.com>,
Kevin Tian <kevin.tian@...el.com>, kvm@...r.kernel.org,
Leon Romanovsky <leonro@...dia.com>, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org, linux-pci@...r.kernel.org,
Lukas Wunner <lukas@...ner.de>, Mike Rapoport <rppt@...nel.org>, Parav Pandit <parav@...dia.com>,
Philipp Stanner <pstanner@...hat.com>, Saeed Mahameed <saeedm@...dia.com>,
Samiullah Khawaja <skhawaja@...gle.com>, Shuah Khan <shuah@...nel.org>,
Tomita Moeko <tomitamoeko@...il.com>, Vipin Sharma <vipinsh@...gle.com>, William Tu <witu@...dia.com>,
Yi Liu <yi.l.liu@...el.com>, Yunxiang Li <Yunxiang.Li@....com>,
Zhu Yanjun <yanjun.zhu@...ux.dev>
Subject: Re: [PATCH 00/21] vfio/pci: Base support to preserve a VFIO device file across Live Update
On Tue, Dec 2, 2025 at 6:10 AM Pratyush Yadav <pratyush@...nel.org> wrote:
>
> On Mon, Dec 01 2025, Pasha Tatashin wrote:
>
> > On Wed, Nov 26, 2025 at 2:36 PM David Matlack <dmatlack@...gle.com> wrote:
> [...]
> >> FLB Locking
> >>
> >> I don't see a way to properly synchronize pci_flb_finish() with
> >> pci_liveupdate_incoming_is_preserved() since the incoming FLB mutex is
> >> dropped by liveupdate_flb_get_incoming() when it returns the pointer
> >> to the object, and taking pci_flb_incoming_lock in pci_flb_finish()
> >> could result in a deadlock due to reversing the lock ordering.
>
> My mental model for FLB is that it is a dependency for files, so it
> should always be created (aka prepare) before _any_ of the files, and
> always destroyed (aka finish) after _all_ of the files.
>
> By the time the FLB is being finished, all the files for that FLB should
> also be finished, so there should no longer be a user of the FLB.
>
> Once all of the files are finished, it should be LUO's responsibility to
> make sure liveupdate_flb_get_incoming() returns an error _before_ it
> starts doing the FLB finish. And in FLB finish you should not be needing
> to take any locks.
>
> Why do you want to use the FLB when it is being finished?

The next patch looks at the PCI FLB any time a device is probed, which
could race with the last device file getting finished, causing the FLB
to be freed.

However, it looks like I am going to drop that patch. But the PCI FLB
is still used in PATCH 08 [1] whenever userspace opens a VFIO cdev or
issues the VFIO_GROUP_GET_DEVICE_FD ioctl, to check if the underlying
PCI device was preserved. Offline, Jason suggested decoupling those
checks from the FLB, so I'll look into doing that in the next version.

[1] https://lore.kernel.org/kvm/20251126193608.2678510-9-dmatlack@google.com/
>
> >
> > I will re-introduce _lock/_unlock API to solve this issue.
> >
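To make the lock-ordering problem concrete, here is a minimal userspace model of how a _lock/_unlock pair could close the race. It assumes that LUO runs the subsystem's finish callback with the FLB mutex held, and that the subsystem then needs no lock of its own. struct flb_model, flb_init(), flb_incoming_lock(), flb_incoming_unlock(), struct pci_flb_obj, is_preserved() and flb_finish() are all hypothetical stand-ins, not the LUO or PCI API:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical subsystem object: BDFs of devices preserved by the previous kernel. */
struct pci_flb_obj {
	int nr_devs;
	unsigned int devs[8];
};

/* Hypothetical incoming FLB: one mutex guards both the state and the object. */
struct flb_model {
	pthread_mutex_t lock;
	bool finished;
	void *obj;
};

static void flb_init(struct flb_model *flb, void *obj)
{
	assert(obj != NULL);
	pthread_mutex_init(&flb->lock, NULL);
	flb->finished = false;
	flb->obj = obj;
}

/* Hypothetical _lock variant: on success, returns with flb->lock held. */
static int flb_incoming_lock(struct flb_model *flb, void **objp)
{
	pthread_mutex_lock(&flb->lock);
	if (flb->finished) {
		pthread_mutex_unlock(&flb->lock);
		return -ENOENT;
	}
	*objp = flb->obj;	/* valid until flb_incoming_unlock() */
	return 0;
}

static void flb_incoming_unlock(struct flb_model *flb)
{
	pthread_mutex_unlock(&flb->lock);
}

/* Stand-in for pci_liveupdate_incoming_is_preserved(): no PCI-private lock. */
static bool is_preserved(struct flb_model *flb, unsigned int bdf)
{
	struct pci_flb_obj *obj;
	bool found = false;
	void *p;

	if (flb_incoming_lock(flb, &p))
		return false;	/* FLB already finished: nothing preserved any more */
	obj = p;
	for (int i = 0; i < obj->nr_devs; i++) {
		if (obj->devs[i] == bdf) {
			found = true;
			break;
		}
	}
	flb_incoming_unlock(flb);
	return found;
}

/* Stand-in for LUO finishing the FLB: the callback runs under the same mutex. */
static void flb_finish(struct flb_model *flb)
{
	pthread_mutex_lock(&flb->lock);
	flb->finished = true;
	free(flb->obj);		/* subsystem finish callback, taking no other lock */
	flb->obj = NULL;
	pthread_mutex_unlock(&flb->lock);
}
```

Because is_preserved() only touches the object between flb_incoming_lock() and flb_incoming_unlock(), and the finish path runs under that same mutex, there is a single lock and therefore no ordering to invert. The cost is that callers hold the FLB mutex for the whole lookup.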
> >>
> >> FLB Retrieving
> >>
> >> The first patch of this series includes a fix to prevent an FLB from
> >> being retrieved again after it is finished. I am wondering if this is the
> >> right approach or if subsystems are expected to stop calling
> >> liveupdate_flb_get_incoming() after an FLB is finished.
>
> IMO once the FLB is finished, LUO should make sure it cannot be
> retrieved, mainly so subsystem code is simpler and less bug-prone.

+1, and I think Pasha is going to do that in the next version of FLB.
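A minimal userspace model of that lifecycle, assuming every user of the FLB is an incoming file (so the last file finish can safely finish the FLB) and that retrieval fails with -ENOENT once the FLB is finished. struct flb_state, flb_state_init(), flb_get_incoming() and flb_file_finished() are hypothetical; they are not the LUO API:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of an incoming FLB shared by several preserved files. */
struct flb_state {
	pthread_mutex_t lock;
	int nr_files;			/* incoming files still depending on the FLB */
	bool finished;
	void *obj;
	void (*finish)(void *obj);	/* subsystem callback, called with lock held */
};

static void flb_state_init(struct flb_state *flb, void *obj, int nr_files,
			   void (*finish)(void *))
{
	pthread_mutex_init(&flb->lock, NULL);
	flb->nr_files = nr_files;
	flb->finished = false;
	flb->obj = obj;
	flb->finish = finish;
}

/* Retrieval fails once the FLB is finished, so callers never get a freed object. */
static int flb_get_incoming(struct flb_state *flb, void **objp)
{
	int ret = 0;

	pthread_mutex_lock(&flb->lock);
	if (flb->finished)
		ret = -ENOENT;
	else
		*objp = flb->obj;
	pthread_mutex_unlock(&flb->lock);
	return ret;
}

/* Called when one incoming file is finished; the last one finishes the FLB. */
static void flb_file_finished(struct flb_state *flb)
{
	pthread_mutex_lock(&flb->lock);
	assert(flb->nr_files > 0);
	if (--flb->nr_files == 0) {
		flb->finished = true;	/* block retrieval before the callback runs */
		flb->finish(flb->obj);
		flb->obj = NULL;
	}
	pthread_mutex_unlock(&flb->lock);
}
```

Note that the pointer returned by flb_get_incoming() is used after the mutex is dropped. That is only safe under the assumption that the caller holds an unfinished file, which is exactly what a probe-time lookup does not have.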