lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 22 Jan 2021 13:11:07 +0100
From:   Marek Szyprowski <m.szyprowski@...sung.com>
To:     Johannes Berg <johannes@...solutions.net>,
        linux-wireless@...r.kernel.org
Cc:     netdev@...r.kernel.org, Oliver Neukum <oneukum@...e.com>,
        Johannes Berg <johannes.berg@...el.com>
Subject: Re: [PATCH v2] cfg80211: avoid holding the RTNL when calling the
 driver

Hi Johannes,

On 19.01.2021 10:21, Johannes Berg wrote:
> From: Johannes Berg <johannes.berg@...el.com>
>
> Currently, _everything_ in cfg80211 holds the RTNL, and if you
> have a slow USB device (or a few) you can get some bad lock
> contention on that.
>
> Fix that by re-adding a mutex to each wiphy/rdev as we had at
> some point, so we have locking for the wireless_dev lists and
> all the other things in there, and also so that drivers still
> don't have to worry too much about it (they still won't get
> parallel calls for a single device).
>
> Then, we can restrict the RTNL to a few cases where we add or
> remove interfaces and really need the added protection. Some
> of the global list management still also uses the RTNL, since
> we need to have it anyway for netdev management, but we only
> hold the RTNL for very short periods of time here.
>
> Signed-off-by: Johannes Berg <johannes.berg@...el.com>

This patch landed in today's (20210122) linux-next as commit 
791daf8fc49a ("cfg80211: avoid holding the RTNL when calling the 
driver"). Sadly, it causes deadlock with mwifiex driver. I think that 
lockdep report describes it enough:

Bluetooth: vendor=0x2df, device=0x912e, class=255, fn=2
cfg80211: Loading compiled-in X.509 certificates for regulatory database
cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
Bluetooth: FW download over, size 800344 bytes
btmrvl_sdio mmc2:0001:2: sdio device tree data not available
mwifiex_sdio mmc2:0001:1: WLAN is not the winner! Skip FW dnld
mwifiex_sdio mmc2:0001:1: WLAN FW is active
mwifiex_sdio mmc2:0001:1: CMD_RESP: cmd 0x242 error, result=0x2
mwifiex_sdio mmc2:0001:1: mwifiex_process_cmdresp: cmd 0x242 failed 
during       initialization

============================================
WARNING: possible recursive locking detected
5.11.0-rc4-00535-g791daf8fc49a #2336 Not tainted
--------------------------------------------
kworker/2:3/108 is trying to acquire lock:
c4f62b38 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: _mwifiex_fw_dpc+0x2c0/0x49c 
[mwifiex]

but task is already holding lock:
c4f62b38 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: _mwifiex_fw_dpc+0x248/0x49c 
[mwifiex]

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&rdev->wiphy.mtx);
   lock(&rdev->wiphy.mtx);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

4 locks held by kworker/2:3/108:
  #0: c1c066a8 ((wq_completion)events){+.+.}-{0:0}, at: 
process_one_work+0x24c/0x888
  #1: deccbf10 ((work_completion)(&fw_work->work)){+.+.}-{0:0}, at: 
process_one_work+0x24c/0x888
  #2: c13202dc (rtnl_mutex){+.+.}-{3:3}, at: _mwifiex_fw_dpc+0x23c/0x49c 
[mwifiex]
  #3: c4f62b38 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: 
_mwifiex_fw_dpc+0x248/0x49c [mwifiex]

stack backtrace:
CPU: 2 PID: 108 Comm: kworker/2:3 Not tainted 
5.11.0-rc4-00535-g791daf8fc49a #2336
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events request_firmware_work_func
[<c01116e8>] (unwind_backtrace) from [<c010cf58>] (show_stack+0x10/0x14)
[<c010cf58>] (show_stack) from [<c0b3ad3c>] (dump_stack+0xa4/0xc4)
[<c0b3ad3c>] (dump_stack) from [<c0195fd8>] (__lock_acquire+0xc20/0x31cc)
[<c0195fd8>] (__lock_acquire) from [<c019923c>] (lock_acquire+0x2e4/0x5dc)
[<c019923c>] (lock_acquire) from [<c0b4217c>] (__mutex_lock+0xa4/0xb60)
[<c0b4217c>] (__mutex_lock) from [<c0b42c54>] (mutex_lock_nested+0x1c/0x24)
[<c0b42c54>] (mutex_lock_nested) from [<bf1c87f8>] 
(_mwifiex_fw_dpc+0x2c0/0x49c [mwifiex])
[<bf1c87f8>] (_mwifiex_fw_dpc [mwifiex]) from [<c06bfd18>] 
(request_firmware_work_func+0x58/0x94)
[<c06bfd18>] (request_firmware_work_func) from [<c0149d48>] 
(process_one_work+0x30c/0x888)
[<c0149d48>] (process_one_work) from [<c014a31c>] (worker_thread+0x58/0x594)
[<c014a31c>] (worker_thread) from [<c0151284>] (kthread+0x154/0x19c)
[<c0151284>] (kthread) from [<c010011c>] (ret_from_fork+0x14/0x38)
Exception stack(0xdeccbfb0 to 0xdeccbff8)
...

 > ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ