[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20250219181556.1020029-1-ralph.siemsen@linaro.org>
Date: Wed, 19 Feb 2025 13:15:52 -0500
From: Ralph Siemsen <ralph.siemsen@...aro.org>
To: linux-rt-devel@...ts.linux.dev,
linux-usb@...r.kernel.org,
linux-kernel@...r.kernel.org,
Pawel Laszczak <pawell@...ence.com>,
Frank Li <Frank.Li@....com>,
Ferry Toth <ftoth@...londelft.nl>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Thomas Gleixner <tglx@...utronix.de>,
Steven Rostedt <rostedt@...dmis.org>,
Ralph Siemsen <ralph.siemsen@...aro.org>
Subject: [RFC PATCH] usb: gadget: u_ether: prevent deadlock under RT
A deadlock can be readily triggered when using NCM gadget with the
Cadence cdns3 USB driver, under heavy traffic from "iperf3 --bidir".
It has been observed under 6.1, 6.6 and 6.12 kernel versions, but
only on PREEMPT_RT. Once deadlocked, even magicsysrq has no effect.
However, there is periodic output from the rcu stall detector:
[ 71.105941] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 71.105966] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P125/4:b..l
[ 71.105992] rcu: (detected by 1, t=5252 jiffies, g=-515, q=7 ncpus=2)
[ 71.106003] task:irq/507-s-f4000 state:D stack:0 pid:125 tgid:125 ppid:2 flags:0x00000008
[ 71.106018] Call trace:
[ 71.106022] __switch_to+0xf4/0x158
[ 71.106046] __schedule+0x2b4/0x920
[ 71.106055] schedule_rtlock+0x24/0x50
[ 71.106064] rtlock_slowlock_locked+0x348/0xcb8
[ 71.106077] rt_spin_lock+0x88/0xb8
[ 71.106086] eth_start_xmit+0x30/0x1490 [u_ether] /*****/
[ 71.106112] ncm_tx_timeout+0x2c/0x50 [usb_f_ncm]
[ 71.106131] __hrtimer_run_queues+0x180/0x378
[ 71.106143] hrtimer_run_softirq+0x90/0x100
[ 71.106151] handle_softirqs.isra.0+0x14c/0x360
[ 71.106165] __local_bh_enable_ip+0x104/0x118
[ 71.106177] __netdev_alloc_skb+0x1e0/0x210
[ 71.106192] ncm_unwrap_ntb+0x1ec/0x528 [usb_f_ncm]
[ 71.106206] rx_complete+0x120/0x288 [u_ether] /*****/
[ 71.106221] usb_gadget_giveback_request+0x34/0xf8
[ 71.106236] cdns3_gadget_giveback+0xe4/0x2d0 [cdns3]
[ 71.106286] cdns3_transfer_completed+0x3b0/0x630 [cdns3]
[ 71.106320] cdns3_device_thread_irq_handler+0x8b8/0xd18 [cdns3]
[ 71.106353] irq_thread_fn+0x34/0xb8
[ 71.106364] irq_thread+0x180/0x2f0
[ 71.106374] kthread+0x104/0x118
[ 71.106384] ret_from_fork+0x10/0x20
The deadlock occurs because eth_start_xmit() and rx_complete() both
acquire the same spinlock in the same instance of struct eth_dev.
The nested call occurs because rx_complete() calls __netdev_alloc_skb()
which performs a brief local_bh_disable/enable() sequence.
Prevent the deadlock by disabling softirq in rx_complete(), with the
same scope as the spinlock. With this fix, no deadlocks are observed
over many hours of continuous testing at USB 2.0 speed (480 Mbit/s).
Signed-off-by: Ralph Siemsen <ralph.siemsen@...aro.org>
---
Discussion items:
0) This is somewhat similar to the recent discussion
https://lore.kernel.org/linux-rt-devel/20250212123302.0f620f23@gandalf.local.home/
and my solution is to "sprinkle local_bh_disable() around",
which is clearly not optimal. Hence the RFC on this patch.
1) There are several more places using the same spinlock, for example
the gether_suspend() and gether_resume() functions. I'm not using power
management, but I wonder if there might be more deadlocks lurking?
2) By keeping softirq disabled for a longer duration, this patch could
potentially increase the RT latency. I tried to quantify this using
cyclictest, but I cannot get a baseline for comparison, since it
deadlocks almost immediately when the USB is active. Other suggestions?
3) The fix is in generic u_ether.c code, to match the scope of the
spinlock. Alternatively, it could be done in cdns3 specific code,
such as in cdns3_device_thread_irq_handler(). I'm not sure if the
same problem exists in other USB drivers?
---
drivers/usb/gadget/function/u_ether.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
index 09e2838917e2..dc511c01b741 100644
--- a/drivers/usb/gadget/function/u_ether.c
+++ b/drivers/usb/gadget/function/u_ether.c
@@ -233,6 +233,7 @@ static void rx_complete(struct usb_ep *ep, struct usb_request *req)
if (dev->unwrap) {
unsigned long flags;
+ local_bh_disable();
spin_lock_irqsave(&dev->lock, flags);
if (dev->port_usb) {
status = dev->unwrap(dev->port_usb,
@@ -243,6 +244,7 @@ static void rx_complete(struct usb_ep *ep, struct usb_request *req)
status = -ENOTCONN;
}
spin_unlock_irqrestore(&dev->lock, flags);
+ local_bh_enable();
} else {
skb_queue_tail(&dev->rx_frames, skb);
}
--
2.45.2.121.gc2b3f2b3cd
Powered by blists - more mailing lists