Message-ID: <PUZP153MB0749F39A34DEC9FABE17C615BE889@PUZP153MB0749.APCP153.PROD.OUTLOOK.COM>
Date: Tue, 28 Mar 2023 05:29:07 +0000
From: Saurabh Singh Sengar <ssengar@...rosoft.com>
To: Dexuan Cui <decui@...rosoft.com>,
	"bhelgaas@...gle.com" <bhelgaas@...gle.com>,
	"davem@...emloft.net" <davem@...emloft.net>,
	Dexuan Cui <decui@...rosoft.com>,
	"edumazet@...gle.com" <edumazet@...gle.com>,
	Haiyang Zhang <haiyangz@...rosoft.com>,
	Jake Oshins <jakeo@...rosoft.com>,
	"kuba@...nel.org" <kuba@...nel.org>,
	"kw@...ux.com" <kw@...ux.com>,
	KY Srinivasan <kys@...rosoft.com>,
	"leon@...nel.org" <leon@...nel.org>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	"lpieralisi@...nel.org" <lpieralisi@...nel.org>,
	"Michael Kelley (LINUX)" <mikelley@...rosoft.com>,
	"pabeni@...hat.com" <pabeni@...hat.com>,
	"robh@...nel.org" <robh@...nel.org>,
	"saeedm@...dia.com" <saeedm@...dia.com>,
	"wei.liu@...nel.org" <wei.liu@...nel.org>,
	Long Li <longli@...rosoft.com>,
	"boqun.feng@...il.com" <boqun.feng@...il.com>
CC: "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [EXTERNAL] [PATCH 1/6] PCI: hv: fix a race condition bug in hv_pci_query_relations()

> -----Original Message-----
> From: Dexuan Cui <decui@...rosoft.com>
> Sent: Tuesday, March 28, 2023 10:21 AM
> To: bhelgaas@...gle.com; davem@...emloft.net; Dexuan Cui
> <decui@...rosoft.com>; edumazet@...gle.com; Haiyang Zhang
> <haiyangz@...rosoft.com>; Jake Oshins <jakeo@...rosoft.com>;
> kuba@...nel.org; kw@...ux.com; KY Srinivasan <kys@...rosoft.com>;
> leon@...nel.org; linux-pci@...r.kernel.org; lpieralisi@...nel.org; Michael
> Kelley (LINUX) <mikelley@...rosoft.com>; pabeni@...hat.com;
> robh@...nel.org; saeedm@...dia.com; wei.liu@...nel.org; Long Li
> <longli@...rosoft.com>; boqun.feng@...il.com
> Cc: linux-hyperv@...r.kernel.org; linux-kernel@...r.kernel.org; linux-
> rdma@...r.kernel.org; netdev@...r.kernel.org
> Subject: [EXTERNAL] [PATCH 1/6] PCI: hv: fix a race condition bug in
> hv_pci_query_relations()
>
> Fix the longstanding race between hv_pci_query_relations() and
> survey_child_resources() by flushing the workqueue before we exit from
> hv_pci_query_relations().
>
> Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft
> Hyper-V VMs")
> Signed-off-by: Dexuan Cui <decui@...rosoft.com>
>
> ---
>  drivers/pci/controller/pci-hyperv.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> With the below debug code:
>
> @@ -2103,6 +2103,8 @@ static void survey_child_resources(struct
> hv_pcibus_device *hbus)
>  	}
>
>  	spin_unlock_irqrestore(&hbus->device_list_lock, flags);
> +	ssleep(15);
> +	printk("%s: completing %px\n", __func__, event);
>  	complete(event);
>  }
>
> @@ -3305,8 +3307,12 @@ static int hv_pci_query_relations(struct hv_device
> *hdev)
>
>  	ret = vmbus_sendpacket(hdev->channel, &message, sizeof(message),
>  			       0, VM_PKT_DATA_INBAND, 0);
> -	if (!ret)
> +	if (!ret) {
> +		ssleep(10); // unassign the PCI device on the host during the
> 10s
>  		ret = wait_for_response(hdev, &comp);
> +		printk("%s: comp=%px is becoming invalid! ret=%d\n",
> +			__func__, &comp, ret);
> +	}
>
>  	return ret;
>  }
> @@ -3635,6 +3641,8 @@ static int hv_pci_probe(struct hv_device *hdev,
>
>  retry:
>  	ret = hv_pci_query_relations(hdev);
> +	printk("hv_pci_query_relations() exited\n");

Can we use pr_* or the appropriate KERN_<LEVEL> in all the printk(s).

> +
>  	if (ret)
>  		goto free_irq_domain;
>
> I'm able to repro the below hang issue:
>
> [ 74.544744] hv_pci b92a0085-468b-407a-a88a-d33fac8edc75: PCI VMBus
> probing: Using version 0x10004
> [ 76.886944] hv_netvsc 818fe754-b912-4445-af51-1f584812e3c9 eth0: VF slot
> 1 removed
> [ 84.788266] hv_pci b92a0085-468b-407a-a88a-d33fac8edc75: The device is
> gone.
> [ 84.792586] hv_pci_query_relations: comp=ffffa7504012fb58 is becoming
> invalid! ret=-19
> [ 84.797505] hv_pci_query_relations() exited
> [ 89.652268] survey_child_resources: completing ffffa7504012fb58
> [ 150.392242] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 150.398447] rcu: 15-...0: (2 ticks this GP)
> idle=867c/1/0x4000000000000000 softirq=947/947 fqs=5234
> [ 150.405851] rcu: (detected by 14, t=15004 jiffies, g=2553, q=4833
> ncpus=16)
> [ 150.410870] Sending NMI from CPU 14 to CPUs 15:
> [ 150.414836] NMI backtrace for cpu 15
> [ 150.414840] CPU: 15 PID: 10 Comm: kworker/u32:0 Tainted: G W E
> 6.3.0-rc3-decui-dirty #34
> ...
> [ 150.414849] Workqueue: hv_pci_468b pci_devices_present_work
> [pci_hyperv]
> [ 150.414866] RIP:
> 0010:__pv_queued_spin_lock_slowpath+0x10f/0x3c0
> ...
> [ 150.414905] Call Trace:
> [ 150.414907] <TASK>
> [ 150.414911] _raw_spin_lock_irqsave+0x40/0x50
> [ 150.414917] complete+0x1d/0x60
> [ 150.414924] pci_devices_present_work+0x5dd/0x680 [pci_hyperv]
> [ 150.414946] process_one_work+0x21f/0x430
> [ 150.414952] worker_thread+0x4a/0x3c0
>
> With this patch, the hang issue goes away:
>
> [ 186.143612] hv_pci b92a0085-468b-407a-a88a-d33fac8edc75: The device is
> gone.
> [ 186.148034] hv_pci_query_relations: comp=ffffa7cfd0aa3b50 is becoming
> invalid! ret=-19
> [ 191.263611] survey_child_resources: completing ffffa7cfd0aa3b50
> [ 191.267732] hv_pci_query_relations() exited
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-
> hyperv.c
> index f33370b75628..b82c7cde19e6 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -3308,6 +3308,19 @@ static int hv_pci_query_relations(struct hv_device
> *hdev)
>  	if (!ret)
>  		ret = wait_for_response(hdev, &comp);
>
> +	/*
> +	 * In the case of fast device addition/removal, it's possible that
> +	 * vmbus_sendpacket() or wait_for_response() returns -ENODEV but
> we
> +	 * already got a PCI_BUS_RELATIONS* message from the host and the
> +	 * channel callback already scheduled a work to hbus->wq, which can
> be
> +	 * running survey_child_resources() -> complete(&hbus-
> >survey_event),
> +	 * even after hv_pci_query_relations() exits and the stack variable
> +	 * 'comp' is no longer valid. This can cause a strange hang issue
> +	 * or sometimes a page fault. Flush hbus->wq before we exit from
> +	 * hv_pci_query_relations() to avoid the issues.
> +	 */
> +	flush_workqueue(hbus->wq);
> +
>  	return ret;
>  }
>
> --
> 2.25.1
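
For readers following the thread, the lifetime problem the patch comment describes can be approximated in userspace. The sketch below is only an analogy, not the driver code: `struct completion`, `complete()`, `survey_worker()` and `query_relations_demo()` are hypothetical stand-ins built on pthreads, and `pthread_join()` plays the role that flush_workqueue(hbus->wq) plays in the patch. It assumes the waiter has already given up with -ENODEV while the worker still holds a pointer to the stack variable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Takes the lock inside the object, so the object must still be alive. */
static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Stand-in for the work item that ends in complete(event). */
static void *survey_worker(void *arg)
{
	struct completion *event = arg;
	struct timespec delay = { 0, 50 * 1000 * 1000 }; /* 50 ms, like the debug ssleep() */

	nanosleep(&delay, NULL);
	complete(event);
	return NULL;
}

/*
 * Analogue of hv_pci_query_relations(): 'comp' lives on this stack frame.
 * Sets *completed_before_exit to whether the worker finished before return.
 */
int query_relations_demo(bool *completed_before_exit)
{
	struct completion comp;
	pthread_t worker;
	int ret;

	assert(completed_before_exit != NULL);
	init_completion(&comp);
	if (pthread_create(&worker, NULL, survey_worker, &comp))
		return -EAGAIN;

	/* Assumption: the device vanished, so the wait ended early with -ENODEV. */
	ret = -ENODEV;

	/*
	 * Analogue of flush_workqueue(): wait for the pending work before
	 * 'comp' goes out of scope. Without this, the worker would call
	 * complete() on a dead stack frame.
	 */
	pthread_join(worker, NULL);

	*completed_before_exit = comp.done;
	pthread_mutex_destroy(&comp.lock);
	pthread_cond_destroy(&comp.cond);
	return ret;
}
```

Calling query_relations_demo() returns -ENODEV only after the worker has run, which mirrors the log ordering seen with the patch applied (the "completing" line appears before the "exited" line). Removing the join would reproduce the use-after-return pattern and is undefined behavior in C.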