[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d32d9c974b6f4922b1fcf9299b367af1@huawei.com>
Date: Sun, 21 Aug 2022 11:25:34 +0000
From: zhuangshengen <zhuangshengen@...wei.com>
To: Bjorn Helgaas <helgaas@...nel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: "Zhoujian (jay)" <jianjay.zhou@...wei.com>,
"Gonglei (Arei)" <arei.gonglei@...wei.com>,
"Wangjing(Hogan)" <hogan.wang@...wei.com>
Subject: 答复: [Question] Any plan to support enable PCI SRIOV concurrently in kernel side?
> On Wed, Aug 17, 2022 at 07:43:34AM +0000, Zhoujian (jay) wrote:
> > Hi,
> >
> > Enable SRIOV concurrently with many different PFs in userspace seems workable.
> > I'm trying to do it with 8 PFs(each one with 240+ VFs), but get some
> > warnings, here is the backtrace:
>
> This definitely seems like a problem that should be fixed. If you have a script that can reproduce it, that might help people work on it. If you can reproduce it in qemu, that would be even better.
>
I am enclosing a demo that will echo sriov_totalvfs > /sys/bus/pci/devices/$PF_BDF/sriov_numvfs concurrently, which can help reproduce the problem, please find attached.
> Some comments on the patch below.
>
> > Warning 1:
> > ---
> > sysfs: cannot create duplicate filename '/devices/pci0000:30/0000:30:02.0/pci_bus/0000:32'
> > Call Trace:
> > dump_stack+0x6f/0xab
> > sysfs_warn_dup+0x56/0x70
> > sysfs_create_dir_ns+0x80/0x90
> > kobject_add_internal+0xa0/0x2b0
> > kobject_add+0x71/0xd0
> > device_add+0x126/0x630
> > pci_add_new_bus+0x17c/0x4b0
> > pci_iov_add_virtfn+0x336/0x390
> > sriov_enable+0x26e/0x450
> > virtio_pci_sriov_configure+0x61/0xc0 [virtio_pci]
> > ---
> > The reason is that different VFs may create the same pci bus number
> > and try to add new bus concurrently in virtfn_add_bus.
> >
> > Warning 2:
> > ---
> > proc_dir_entry 'pci/33' already registered
> > WARNING: CPU: 71 PID: 893 at fs/proc/generic.c:360
> > proc_register+0xf8/0x130 Call Trace:
> > proc_mkdir_data+0x5d/0x80
> > pci_proc_attach_device+0xe9/0x120
> > pci_bus_add_device+0x33/0x90
> > pci_iov_add_virtfn+0x375/0x390
> > sriov_enable+0x26e/0x450
> > virtio_pci_sriov_configure+0x61/0xc0 [virtio_pci]
> > ---
> > The reason is that different VFs may create '/proc/bus/pci/bus_number'
> > directory using the same bus number in pci_proc_attach_device concurrently.
> >
> > Mutex lock can avoid potential conflict. With the patch below the
> > warnings above are no longer appear.
> >
> > My question is that any plan to support enable PCI SRIOV concurrently in kernel side?
> >
> > Thanks
> >
> > ---
> > drivers/pci/iov.c | 29 +++++++++++++++++++++++++++--
> > 1 file changed, 27 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index
> > 952217572113..6a8a849298c4 100644
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> > @@ -16,6 +16,12 @@
> >
> > #define VIRTFN_ID_LEN 16
> >
> > +static struct mutex add_bus_mutex;
> > +static int add_bus_mutex_initialized;
> > +
> > +static struct mutex add_device_mutex; static int
> > +add_device_mutex_initialized;
> > +
> > int pci_iov_virtfn_bus(struct pci_dev *dev, int vf_id) {
> > if (!dev->is_physfn)
> > @@ -127,13 +133,24 @@ static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
> > if (bus->number == busnr)
> > return bus;
> >
> > + if (!add_bus_mutex_initialized) {
> > + mutex_init(&add_bus_mutex);
> > + add_bus_mutex_initialized = 1;
> > + }
>
> I assume this patch works around the warning. I see the intent here, but I think would need some rework before merging it. These locks protect pci_add_new_bus() and pci_bus_add_device(), but only for the callers in iov.c. These interfaces are both called from places other than iov.c, and any mutual exclusion should cover all of them.
>
> I'm actually not sure how the other callers are protected. I assume we're holding a device_lock for some device farther up the chain. Or, I see that acpi_pci_root_add() and rescan_store() hold pci_rescan_remove_lock while calling these. We don't seem to hold that uniformly though, which bothers me, because I think there are many other paths, e.g., pci_host_probe() and its callers.
>
> > + mutex_lock(&add_bus_mutex);
> > +
> > child = pci_find_bus(pci_domain_nr(bus), busnr);
> > - if (child)
> > + if (child) {
> > + mutex_unlock(&add_bus_mutex);
> > return child;
> > + }
> >
> > child = pci_add_new_bus(bus, NULL, busnr);
> > - if (!child)
> > + if (!child) {
> > + mutex_unlock(&add_bus_mutex);
> > return NULL;
> > + }
> > + mutex_unlock(&add_bus_mutex);
> >
> > pci_bus_insert_busn_res(child, busnr, busnr);
> >
> > @@ -339,8 +356,16 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id)
> > if (rc)
> > goto failed1;
> >
> > + if (!add_device_mutex_initialized) {
> > + mutex_init(&add_device_mutex);
> > + add_device_mutex_initialized = 1;
> > + }
> > + mutex_lock(&add_device_mutex);
> > +
> > pci_bus_add_device(virtfn);
> >
> > + mutex_unlock(&add_device_mutex);
> > +
> > return 0;
> >
> > failed1:
> > ---
I write a new patch can fix the problem above. I add a new function pci_add_new_bus_locked which will do find bus and add new bus with mutex locked. at most case in virtfn_add_bus, vf will find exiting bus and return, this is a fast path and no need mutex protect; if bus is not exiting, and different vf in a race will add the same bus, they should call pci_add_new_bus_locked, this is the slower patch but safe;
I alse add a device_lock in pci_proc_attach_device when create bus->procdir, this will fix the conflict when different VFs may create '/proc/bus/pci/bus_number' directory using the same bus number concurrently
---
drivers/pci/iov.c | 7 ++++++-
drivers/pci/probe.c | 26 ++++++++++++++++++++++++++
drivers/pci/proc.c | 6 +++++-
include/linux/pci.h | 2 ++
4 files changed, 39 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 952217572113..cde0155749c5 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -127,11 +127,16 @@ static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
if (bus->number == busnr)
return bus;
+ /*
+ * here vf will find existing bus at most case; if not existing, it should
+ * go through slow path to create new bus with locked to support enable SRIOV
+ * concurrently with many different PFs in userspace.
+ */
child = pci_find_bus(pci_domain_nr(bus), busnr);
if (child)
return child;
- child = pci_add_new_bus(bus, NULL, busnr);
+ child = pci_add_new_bus_locked(bus, NULL, busnr);
if (!child)
return NULL;
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c5286b027f00..5dc2a6774fa9 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -25,6 +25,8 @@
#define CARDBUS_LATENCY_TIMER 176 /* secondary latency timer */
#define CARDBUS_RESERVE_BUSNR 3
+DEFINE_MUTEX(add_bus_mutex);
+
static struct resource busn_resource = {
.name = "PCI busn",
.start = 0,
@@ -1170,6 +1172,30 @@ struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
}
EXPORT_SYMBOL(pci_add_new_bus);
+struct pci_bus *pci_add_new_bus_locked(struct pci_bus *parent, struct pci_dev *dev,
+ int busnr)
+{
+ struct pci_bus *child;
+
+ mutex_lock(&add_bus_mutex);
+ child = pci_find_bus(pci_domain_nr(parent), busnr);
+ if (child) {
+ mutex_unlock(&add_bus_mutex);
+ return child;
+ }
+
+ child = pci_alloc_child_bus(parent, dev, busnr);
+ if (child) {
+ down_write(&pci_bus_sem);
+ list_add_tail(&child->node, &parent->children);
+ up_write(&pci_bus_sem);
+ }
+ mutex_unlock(&add_bus_mutex);
+
+ return child;
+}
+EXPORT_SYMBOL(pci_add_new_bus_locked);
+
static void pci_enable_crs(struct pci_dev *pdev)
{
u16 root_cap = 0;
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index f967709082d6..f927263c2fe0 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -421,6 +421,7 @@ int pci_proc_attach_device(struct pci_dev *dev)
if (!proc_initialized)
return -EACCES;
+ device_lock(&bus->dev);
if (!bus->procdir) {
if (pci_proc_domain(bus)) {
sprintf(name, "%04x:%02x", pci_domain_nr(bus),
@@ -429,9 +430,12 @@ int pci_proc_attach_device(struct pci_dev *dev)
sprintf(name, "%02x", bus->number);
}
bus->procdir = proc_mkdir(name, proc_bus_pci_dir);
- if (!bus->procdir)
+ if (!bus->procdir) {
+ device_unlock(&bus->dev);
return -ENOMEM;
+ }
}
+ device_unlock(&bus->dev);
sprintf(name, "%02x.%x", PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn));
e = proc_create_data(name, S_IFREG | S_IRUGO | S_IWUSR, bus->procdir,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 060af91bafcd..ec5b68b4c63d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1112,6 +1112,8 @@ struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
int pci_scan_root_bus_bridge(struct pci_host_bridge *bridge);
struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
int busnr);
+struct pci_bus *pci_add_new_bus_locked(struct pci_bus *parent, struct pci_dev *dev,
++ int busnr);
struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
const char *name,
struct hotplug_slot *hotplug);
--
View attachment "enable_sriov_demo.c" of type "text/plain" (4386 bytes)
Powered by blists - more mailing lists