[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1be4dce841c6e5de076864102aca2e131d1aafe1.camel@gmail.com>
Date: Thu, 26 Sep 2024 21:39:07 +0200
From: Hubert Wiśniewski <hubert.wisniewski.25632@...il.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Ferry Toth
<ftoth@...londelft.nl>, Hardik Gajjar <hgajjar@...adit-jv.com>, Kees Cook
<kees@...nel.org>, Justin Stitt <justinstitt@...gle.com>, Richard Acayan
<mailingradian@...il.com>, Jeff Johnson <quic_jjohnson@...cinc.com>,
"Ricardo B. Marliere" <ricardo@...liere.net>, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Toke
Høiland-Jørgensen <toke@...hat.com>,
linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] usb: gadget: u_ether: Use __netif_rx() in
rx_callback()
On Tue, 2024-09-17 at 16:36 +0200, Sebastian Andrzej Siewior wrote:
> On 2024-09-17 15:56:33 [+0200], Hubert Wiśniewski wrote:
> > netif_rx() now disables bottom halves, which causes the USB gadget to be
> > unable to receive frames if the interface is not brought up quickly enough
> > after being created by the driver (a bug confirmed on AM3352 SoC).
> >
> > Replacing netif_rx() with __netif_rx() restores the old behavior and fixes
> > the bug. This can be done since rx_callback() is called from the interrupt
> > context.
> >
> > Fixes: baebdf48c360 ("net: dev: Makes sure netif_rx() can be invoked in any context.")
> > Cc: stable@...r.kernel.org
> > Signed-off-by: Hubert Wiśniewski <hubert.wisniewski.25632@...il.com>
>
> Now that I see v3, my v2 question stands.
> This supposed only to disable BH if invoked from from non-interrupt.
> hardirq and softirq should be good. A backtrace would be nice and further
> explanation how this becomes a problem. Also lockdep should complain at
> some point.
>
> Sebastian
This lockup seems to occur only on AM335x SoCs. If the interface is brought
up immediately after it is created:
# modprobe g_ether; ip link set usb0 up
it works fine. But if there is some delay:
# modprobe g_ether; sleep 5; ip link set usb0 up
the interface is unable to receive any frames from the USB host (but the
outgoing frames are sent correctly).
This becomes a problem when the g_ether module is loaded early in the boot
process or it is built in, and the interface is configured later (e.g. by
ifupdown).
The same thing happens when using configfs to create and configure an
Ethernet gadget.
I have built the kernel with CONFIG_PROVE_LOCKING=y and got the following
messages upon bringing the interface up:
Backtrace (timestamps stripped):
WARNING: CPU: 0 PID: 86 at kernel/softirq.c:362 __local_bh_enable_ip+0x118/0x198
CPU: 0 UID: 0 PID: 86 Comm: ip Not tainted 6.11.0 #3
Hardware name: Generic AM33XX (Flattened Device Tree)
Call trace:
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x68/0x88
dump_stack_lvl from __warn+0x70/0x1ac
__warn from warn_slowpath_fmt+0x12c/0x1c4
warn_slowpath_fmt from __local_bh_enable_ip+0x118/0x198
__local_bh_enable_ip from netif_rx+0x118/0x208
netif_rx from rx_complete+0x140/0x27c
rx_complete from musb_g_giveback+0xf0/0x1d8
musb_g_giveback from musb_ep_restart_resume_work+0x8/0x10
musb_ep_restart_resume_work from musb_gadget_queue+0x1c8/0x498
musb_gadget_queue from usb_ep_queue+0x38/0x134
usb_ep_queue from rx_submit+0xdc/0x1cc
rx_submit from rx_fill+0x7c/0xa0
rx_fill from eth_start+0x30/0x54
eth_start from eth_open+0x40/0x80
eth_open from __dev_open+0xe0/0x174
__dev_open from __dev_change_flags+0x160/0x1d0
__dev_change_flags from dev_change_flags+0x1c/0x58
dev_change_flags from devinet_ioctl+0x6d0/0x884
devinet_ioctl from inet_ioctl+0x1ac/0x2bc
inet_ioctl from sock_ioctl+0x2c0/0x3b8
sock_ioctl from sys_ioctl+0x194/0xf88
sys_ioctl from ret_fast_syscall+0x0/0x1c
Exception stack(0xe1485fa8 to 0xe1485ff0)
5fa0: 000b3630 00000001 00000003 00008914 be802cd8 be802c90
5fc0: 000b3630 00000001 00000003 00000036 ffffffff ffffffff ffffffff be802ed4
5fe0: 000c74cc be802c80 00028000 b6e7d14c
irq event stamp: 2581
hardirqs last enabled at (2579): [<c13e10cc>] _raw_spin_unlock_irqrestore+0x44/0x48
hardirqs last disabled at (2580): [<c13e0e94>] _raw_spin_lock_irqsave+0x64/0x68
softirqs last enabled at (2558): [<c0e727b0>] __dev_change_flags+0x7c/0x1d0
softirqs last disabled at (2581): [<c0e68dd8>] netif_rx+0xa4/0x208
Lockdep:
================================
WARNING: inconsistent lock state
6.11.0 #3 Tainted: G W
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
ip/86 [HC0[0]:SC0[0]:HE1:SE1] takes:
c3e54050 (&musb->lock){?.-.}-{2:2}, at: musb_g_giveback+0xf8/0x1d8
{IN-HARDIRQ-W} state was registered at:
lock_acquire+0x108/0x358
_raw_spin_lock_irqsave+0x4c/0x68
dsps_interrupt+0x28/0x278
__handle_irq_event_percpu+0xa0/0x2f4
handle_irq_event_percpu+0xc/0x40
handle_irq_event+0x38/0xcc
handle_level_irq+0xb4/0x13c
handle_irq_desc+0x1c/0x2c
generic_handle_arch_irq+0x2c/0x64
call_with_stack+0x18/0x20
__irq_svc+0x9c/0xbc
console_flush_all+0x25c/0x5c0
console_unlock+0x80/0x114
vprintk_emit+0x248/0x354
dev_vprintk_emit+0x110/0x144
dev_printk_emit+0x28/0x50
__dev_printk+0x74/0x90
_dev_warn+0x3c/0x68
_regulator_get+0x1f8/0x31c
sdhci_omap_regulator_get_caps+0x8/0x84
sdhci_omap_probe+0x208/0x788
platform_probe+0x58/0xb8
really_probe+0xc4/0x28c
__driver_probe_device+0x84/0x194
driver_probe_device+0x30/0xc8
__device_attach_driver+0xa4/0xe0
bus_for_each_drv+0x80/0xd0
__device_attach_async_helper+0xa8/0xdc
async_run_entry_fn+0x20/0xb4
process_scheduled_works+0x254/0x6e8
worker_thread+0x13c/0x340
kthread+0xf4/0x114
ret_from_fork+0x14/0x24
irq event stamp: 2649
hardirqs last enabled at (2649): [<c13e10cc>] _raw_spin_unlock_irqrestore+0x44/0x48
hardirqs last disabled at (2648): [<c13e0e94>] _raw_spin_lock_irqsave+0x64/0x68
softirqs last enabled at (2644): [<c014b410>] handle_softirqs+0x260/0x45c
softirqs last disabled at (2613): [<c1385f28>] call_with_stack+0x18/0x20
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&musb->lock);
<Interrupt>
lock(&musb->lock);
*** DEADLOCK ***
1 lock held by ip/86:
#0: c1d60ca4 (rtnl_mutex){+.+.}-{3:3}, at: devinet_ioctl+0xc4/0x884
stack backtrace:
CPU: 0 UID: 0 PID: 86 Comm: ip Tainted: G W 6.11.0 #3
Tainted: [W]=WARN
Hardware name: Generic AM33XX (Flattened Device Tree)
Call trace:
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x68/0x88
dump_stack_lvl from mark_lock.part.17+0x3cc/0x440
mark_lock.part.17 from __lock_acquire+0x44c/0x22b0
__lock_acquire from lock_acquire+0x108/0x358
lock_acquire from _raw_spin_lock+0x38/0x48
_raw_spin_lock from musb_g_giveback+0xf8/0x1d8
musb_g_giveback from musb_ep_restart_resume_work+0x8/0x10
musb_ep_restart_resume_work from musb_gadget_queue+0x1c8/0x498
musb_gadget_queue from usb_ep_queue+0x38/0x134
usb_ep_queue from rx_submit+0xdc/0x1cc
rx_submit from rx_fill+0x7c/0xa0
rx_fill from eth_start+0x30/0x54
eth_start from eth_open+0x40/0x80
eth_open from __dev_open+0xe0/0x174
__dev_open from __dev_change_flags+0x160/0x1d0
__dev_change_flags from dev_change_flags+0x1c/0x58
dev_change_flags from devinet_ioctl+0x6d0/0x884
devinet_ioctl from inet_ioctl+0x1ac/0x2bc
inet_ioctl from sock_ioctl+0x2c0/0x3b8
sock_ioctl from sys_ioctl+0x194/0xf88
sys_ioctl from ret_fast_syscall+0x0/0x1c
Exception stack(0xe1485fa8 to 0xe1485ff0)
5fa0: 000b3630 00000001 00000003 00008914 be802cd8 be802c90
5fc0: 000b3630 00000001 00000003 00000036 ffffffff ffffffff ffffffff be802ed4
5fe0: 000c74cc be802c80 00028000 b6e7d14c
Surprisingly, with the patch I suggested, there is still a backtrace
generated, but without lockdep warning:
WARNING: CPU: 0 PID: 87 at net/core/dev.c:5207 __netif_rx+0xd4/0x210
CPU: 0 UID: 0 PID: 87 Comm: ip Not tainted 6.11.0-dirty #4
Hardware name: Generic AM33XX (Flattened Device Tree)
Call trace:
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x68/0x88
dump_stack_lvl from __warn+0x70/0x1ac
__warn from warn_slowpath_fmt+0x12c/0x1c4
warn_slowpath_fmt from __netif_rx+0xd4/0x210
__netif_rx from rx_complete+0x140/0x27c
rx_complete from musb_g_giveback+0xf0/0x1d8
musb_g_giveback from musb_ep_restart_resume_work+0x8/0x10
musb_ep_restart_resume_work from musb_gadget_queue+0x1c8/0x498
musb_gadget_queue from usb_ep_queue+0x38/0x134
usb_ep_queue from rx_submit+0xdc/0x1cc
rx_submit from rx_fill+0x7c/0xa0
rx_fill from eth_start+0x30/0x54
eth_start from eth_open+0x40/0x80
eth_open from __dev_open+0xe0/0x174
__dev_open from __dev_change_flags+0x160/0x1d0
__dev_change_flags from dev_change_flags+0x1c/0x58
dev_change_flags from devinet_ioctl+0x6d0/0x884
devinet_ioctl from inet_ioctl+0x1ac/0x2bc
inet_ioctl from sock_ioctl+0x2c0/0x3b8
sock_ioctl from sys_ioctl+0x194/0xf88
sys_ioctl from ret_fast_syscall+0x0/0x1c
Exception stack(0xe148dfa8 to 0xe148dff0)
dfa0: 000b3630 00000001 00000003 00008914 be8c8cd8 be8c8c90
dfc0: 000b3630 00000001 00000003 00000036 ffffffff ffffffff ffffffff be8c8ed4
dfe0: 000c74cc be8c8c80 00028000 b6ed614c
irq event stamp: 2542
hardirqs last enabled at (2541): [<c13e10cc>] _raw_spin_unlock_irqrestore+0x44/0x48
hardirqs last disabled at (2542): [<c13e0e94>] _raw_spin_lock_irqsave+0x64/0x68
softirqs last enabled at (2520): [<c0e727b0>] __dev_change_flags+0x7c/0x1d0
softirqs last disabled at (2518): [<c0e7236c>] dev_set_rx_mode+0x0/0x40
I'm a bit at loss here. The deadlock seems to be unrelated to netif_rx()
(which is not being called in the interrupt context after all), yet
replacing it with __netif_rx() fixes the lockup (though a warning is still
generated, which suggests that the patch does not completely fix the
issue).
--
Hubert Wiśniewski <hubert.wisniewski.25632@...il.com>
Powered by blists - more mailing lists