lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4c841575-1e02-32f2-b63d-52bc0c063c82@lwfinger.net>
Date:   Wed, 22 Mar 2023 20:41:18 -0500
From:   Larry Finger <Larry.Finger@...inger.net>
To:     Ping-Ke Shih <pkshih@...ltek.com>,
        Jonas Gorski <jonas.gorski@...il.com>
Cc:     Hyeonggon Yoo <42.hyeyoo@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-wireless@...r.kernel.org" <linux-wireless@...r.kernel.org>
Subject: Re: [BUG v6.2.7] Hitting BUG_ON() on rtw89 wireless driver startup

On 3/22/23 19:59, Ping-Ke Shih wrote:
> 
> 
>> -----Original Message-----
>> From: Jonas Gorski <jonas.gorski@...il.com>
>> Sent: Thursday, March 23, 2023 4:52 AM
>> To: Larry Finger <Larry.Finger@...inger.net>
>> Cc: Hyeonggon Yoo <42.hyeyoo@...il.com>; netdev@...r.kernel.org; linux-wireless@...r.kernel.org; Ping-Ke
>> Shih <pkshih@...ltek.com>
>> Subject: Re: [BUG v6.2.7] Hitting BUG_ON() on rtw89 wireless driver startup
>>
>> On Wed, 22 Mar 2023 at 18:03, Larry Finger <Larry.Finger@...inger.net> wrote:
>>>
>>> On 3/22/23 10:54, Hyeonggon Yoo wrote:
>>>>
>>>> Hello folks,
>>>> I've just encountered weird bug when booting Linux v6.2.7
>>>>
>>>> config: attached
>>>> dmesg: attached
>>>>
>>>> I'm not sure exactly how to trigger this issue yet because it's not
>>>> stably reproducible. (just have encountered randomly when logging in)
>>>>
>>>> At quick look it seems to be related to rtw89 wireless driver or network subsystem.
>>>
>>> Your bug is weird indeed, and it does come from rtw89_8852be. My distro has not
>>> yet released kernel 6.2.7, but I have not seen this problem with mainline
>>> kernels throughout the 6.2 or 6.3 development series.
>>
>> Looking at the rtw89 driver's probe function, the bug is probably a
>> simple race condition:
>>
>> int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>> {
>>      ...
>>      ret = rtw89_core_register(rtwdev); <- calls ieee80211_register_hw();
>>      ...
>>      rtw89_core_napi_init(rtwdev);
>>      ...
>> }
>>
>> so it registers the wifi device first, making it visible to userspace,
>> and then initializes napi.
>>
>> So there is a window where a fast userspace may already try to
>> interact with the device before the driver got around to initializing
>> the napi parts, and then it explodes. At least that is my theory for
>> the issue.
>>
>> Switching the order of these two functions should avoid it in theory,
>> as long as rtw89_core_napi_init() doesn't depend on anything
>> rtw89_core_register() does.
>>
>> FWIW, registering the irq handler only after registering the device
>> also seems suspect, and should probably also happen before that.
> 
> Adding a 10 seconds sleep between rtw89_core_register() and
> rtw89_core_napi_init(), and I can reproduce this issue:
> 
> int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> {
>      ...
>      ret = rtw89_core_register(rtwdev);
>      ...
> 	msleep(10 * 100);
> 	...
>      rtw89_core_napi_init(rtwdev);
>      ...
> }
> 
> And, as your suggestion, I move the rtw89_core_register() to the last step
> of PCI probe. Then, it looks more reasonable that we prepare NAPI and
> interrupt handlers before registering netdev. Could you give it a try with
> below fix?
> 
> diff --git a/pci.c b/pci.c
> index fe6c0efc..087de2e0 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -3879,25 +3879,26 @@ int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>          rtw89_pci_link_cfg(rtwdev);
>          rtw89_pci_l1ss_cfg(rtwdev);
> 
> -       ret = rtw89_core_register(rtwdev);
> -       if (ret) {
> -               rtw89_err(rtwdev, "failed to register core\n");
> -               goto err_clear_resource;
> -       }
> -
>          rtw89_core_napi_init(rtwdev);
> 
>          ret = rtw89_pci_request_irq(rtwdev, pdev);
>          if (ret) {
>                  rtw89_err(rtwdev, "failed to request pci irq\n");
> -               goto err_unregister;
> +               goto err_deinit_napi;
> +       }
> +
> +       ret = rtw89_core_register(rtwdev);
> +       if (ret) {
> +               rtw89_err(rtwdev, "failed to register core\n");
> +               goto err_free_irq;
>          }
> 
>          return 0;
> 
> -err_unregister:
> +err_free_irq:
> +       rtw89_pci_free_irq(rtwdev, pdev);
> +err_deinit_napi:
>          rtw89_core_napi_deinit(rtwdev);
> -       rtw89_core_unregister(rtwdev);
>   err_clear_resource:
>          rtw89_pci_clear_resource(rtwdev, pdev);
>   err_declaim_pci:

Ping-Ke,

The patch works fine here, but I did not have the problem.

When you submit it, add a Tested-by: Larry Finger<Larry.Finger@...inger.net> and 
a Reviewed-by for the same address.

@Jonas: Good catch.

Larry


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ