[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190308023401.GA121759@google.com>
Date: Thu, 7 Mar 2019 18:34:03 -0800
From: Brian Norris <briannorris@...omium.org>
To: Ganapathi Bhat <gbhat@...vell.com>
Cc: linux-kernel@...r.kernel.org,
Amitkumar Karwar <amitkarwar@...il.com>,
Nishant Sarmukadam <nishants@...vell.com>,
Ganapathi Bhat <gbhat@...vell.com>,
Xinming Hu <huxinming820@...il.com>,
linux-wireless@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [4.20 PATCH] Revert "mwifiex: restructure rx_reorder_tbl_lock
usage"
Hi again Ganapathi,
By the way, I was a little curious about what went wrong here, so I dug
in a little further:
On Fri, Nov 30, 2018 at 09:59:57AM -0800, Brian Norris wrote:
> This reverts commit 5188d5453bc9380ccd4ae1086138dd485d13aef2, because it
> introduced lock recursion:
>
> BUG: spinlock recursion on CPU#2, kworker/u13:1/395
> lock: 0xffffffc0e28a47f0, .magic: dead4ead, .owner: kworker/u13:1/395, .owner_cpu: 2
> CPU: 2 PID: 395 Comm: kworker/u13:1 Not tainted 4.20.0-rc4+ #2
> Hardware name: Google Kevin (DT)
> Workqueue: MWIFIEX_RX_WORK_QUEUE mwifiex_rx_work_queue [mwifiex]
> Call trace:
> dump_backtrace+0x0/0x140
> show_stack+0x20/0x28
> dump_stack+0x84/0xa4
> spin_bug+0x98/0xa4
> do_raw_spin_lock+0x5c/0xdc
> _raw_spin_lock_irqsave+0x38/0x48
> mwifiex_flush_data+0x2c/0xa4 [mwifiex]
> call_timer_fn+0xcc/0x1c4
> run_timer_softirq+0x264/0x4f0
> __do_softirq+0x1a8/0x35c
> do_softirq+0x54/0x64
> netif_rx_ni+0xe8/0x120
> mwifiex_recv_packet+0xfc/0x10c [mwifiex]
> mwifiex_process_rx_packet+0x1d4/0x238 [mwifiex]
> mwifiex_11n_dispatch_pkt+0x190/0x1ac [mwifiex]
> mwifiex_11n_rx_reorder_pkt+0x28c/0x354 [mwifiex]
TL;DR: the problem was right here ^^^
where you started running mwifiex_11n_dispatch_pkt() (via
mwifiex_11n_scan_and_dispatch()) while holding a spinlock.
When you do that, you eventually call netif_rx_ni(), which specifically
defers to softirq contexts. Then, if you happen to have your flush timer
expiring just before that, you end up in mwifiex_flush_data(), which
also needs that spinlock.
There are a few possible ways to handle this:
(a) prevent processing softirqs in that context; e.g., with
local_bh_disable(). This seems somewhat of a hack.
(Side note: I think most of the locks in this driver really could be
spin_lock_bh(), not spin_lock_irqsave() -- we don't really care
about hardirq context for 99% of these locks.)
(b) restructure so that packet processing (e.g., netif_rx_ni()) is done
outside of the spinlock.
It's actually not that hard to do (b). You can just queue your skb's up
in a temporary sk_buff_head list and process them all at once after
you've finished processing the reorder table. I have a local patch to do
this, and I might send it your way if I can give it a bit more testing.
Brian
> mwifiex_process_sta_rx_packet+0x204/0x26c [mwifiex]
> mwifiex_handle_rx_packet+0x15c/0x16c [mwifiex]
> mwifiex_rx_work_queue+0x104/0x134 [mwifiex]
> worker_thread+0x4cc/0x72c
> kthread+0x134/0x13c
> ret_from_fork+0x10/0x18
>
> This was clearly not tested well at all. I simply performed 'wget' in a
> loop and it fell over within a few seconds.
>
> Fixes: 5188d5453bc9 ("mwifiex: restructure rx_reorder_tbl_lock usage")
> Cc: <stable@...r.kernel.org>
> Cc: Ganapathi Bhat <gbhat@...vell.com>
> Signed-off-by: Brian Norris <briannorris@...omium.org>
Powered by blists - more mailing lists