[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240212202714.GA1142983@bhelgaas>
Date: Mon, 12 Feb 2024 14:27:14 -0600
From: Bjorn Helgaas <helgaas@...nel.org>
To: Leon Romanovsky <leonro@...dia.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>,
Jim Harris <jim.harris@...sung.com>,
Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org, Jason Gunthorpe <jgg@...dia.com>,
Alex Williamson <alex.williamson@...hat.com>,
Pierre Crégut <pierre.cregut@...nge.com>
Subject: Re: [PATCH v2 1/2] PCI/IOV: Revert "PCI/IOV: Serialize sysfs
sriov_numvfs reads vs writes"
On Sun, Feb 11, 2024 at 10:48:44AM +0200, Leon Romanovsky wrote:
> On Fri, Feb 09, 2024 at 07:20:28PM -0800, Kuppuswamy Sathyanarayanan wrote:
> > On 2/9/24 3:52 PM, Jim Harris wrote:
> > > If an SR-IOV enabled device is held by vfio, and the device is removed,
> > > vfio will hold device lock and notify userspace of the removal. If
> > > userspace reads the sriov_numvfs sysfs entry, that thread will be blocked
> > > since sriov_numvfs_show() also tries to acquire the device lock. If that
> > > same thread is responsible for releasing the device to vfio, it results in
> > > a deadlock.
> > >
> > > The proper way to detect a change to the num_VFs value is to listen for a
> > > sysfs event, not to add a device_lock() on the attribute _show() in the
> > > kernel.
The lock was not about detecting a change; Pierre did this:
ip monitor dev ${DEVICE} | grep --line-buffered "^${id}:" | while read line; do \
cat ${path}/device/sriov_numvfs; \
which I assume works by listening for sysfs events. The problem was
that after the event occurred, the sriov_numvfs read got a stale value
(see https://bugzilla.kernel.org/show_bug.cgi?id=202991).
So I would drop this sentence because I don't think it accurately
reflects the reason for 35ff867b7657.
> > Since you are reverting a commit that synchronizes SysFS read
> > /write, please add some comments about why it is not an
> > issue anymore.
>
> It was never an issue, the idea that sysfs read and write should be
> serialized by kernel is not correct by definition.
I think it *was* an issue. The behavior Pierre observed at was
clearly wrong, and we added 35ff867b7657 ("PCI/IOV: Serialize sysfs
sriov_numvfs reads vs writes") to resolve it.
We should try to avoid reintroducing the problem, so I think we should
probably squash these two patches and describe it as a deadlock fix
instead of dismissing 35ff867b7657 as being based on false premises.
It would be awesome if you had time to verify that these patches also
resolve the problem you saw, Pierre.
I think we should also add:
Fixes: 35ff867b7657 ("PCI/IOV: Serialize sysfs sriov_numvfs reads vs writes")
as a trigger for backporting this to kernels that include
35ff867b7657.
Bjorn
> > > This reverts commit 35ff867b76576e32f34c698ccd11343f7d616204.
> > > Revert had a small conflict, the sprintf() is now changed to sysfs_emit().
> > >
> > > Link: https://lore.kernel.org/linux-pci/ZXJI5+f8bUelVXqu@ubuntu/
> > > Suggested-by: Leon Romanovsky <leonro@...dia.com>
> > > Reviewed-by: Leon Romanovsky <leonro@...dia.com>
> > > Signed-off-by: Jim Harris <jim.harris@...sung.com>
> > > ---
> > > drivers/pci/iov.c | 8 +-------
> > > 1 file changed, 1 insertion(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > > index aaa33e8dc4c9..0ca20cd518d5 100644
> > > --- a/drivers/pci/iov.c
> > > +++ b/drivers/pci/iov.c
> > > @@ -395,14 +395,8 @@ static ssize_t sriov_numvfs_show(struct device *dev,
> > > char *buf)
> > > {
> > > struct pci_dev *pdev = to_pci_dev(dev);
> > > - u16 num_vfs;
> > > -
> > > - /* Serialize vs sriov_numvfs_store() so readers see valid num_VFs */
> > > - device_lock(&pdev->dev);
> > > - num_vfs = pdev->sriov->num_VFs;
> > > - device_unlock(&pdev->dev);
> > >
> > > - return sysfs_emit(buf, "%u\n", num_vfs);
> > > + return sysfs_emit(buf, "%u\n", pdev->sriov->num_VFs);
> > > }
> > >
> > > /*
> > >
> > --
> > Sathyanarayanan Kuppuswamy
> > Linux Kernel Developer
> >
Powered by blists - more mailing lists