[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4c841575-1e02-32f2-b63d-52bc0c063c82@lwfinger.net>
Date: Wed, 22 Mar 2023 20:41:18 -0500
From: Larry Finger <Larry.Finger@...inger.net>
To: Ping-Ke Shih <pkshih@...ltek.com>,
Jonas Gorski <jonas.gorski@...il.com>
Cc: Hyeonggon Yoo <42.hyeyoo@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-wireless@...r.kernel.org" <linux-wireless@...r.kernel.org>
Subject: Re: [BUG v6.2.7] Hitting BUG_ON() on rtw89 wireless driver startup
On 3/22/23 19:59, Ping-Ke Shih wrote:
>
>
>> -----Original Message-----
>> From: Jonas Gorski <jonas.gorski@...il.com>
>> Sent: Thursday, March 23, 2023 4:52 AM
>> To: Larry Finger <Larry.Finger@...inger.net>
>> Cc: Hyeonggon Yoo <42.hyeyoo@...il.com>; netdev@...r.kernel.org; linux-wireless@...r.kernel.org; Ping-Ke
>> Shih <pkshih@...ltek.com>
>> Subject: Re: [BUG v6.2.7] Hitting BUG_ON() on rtw89 wireless driver startup
>>
>> On Wed, 22 Mar 2023 at 18:03, Larry Finger <Larry.Finger@...inger.net> wrote:
>>>
>>> On 3/22/23 10:54, Hyeonggon Yoo wrote:
>>>>
>>>> Hello folks,
>>>> I've just encountered weird bug when booting Linux v6.2.7
>>>>
>>>> config: attached
>>>> dmesg: attached
>>>>
>>>> I'm not sure exactly how to trigger this issue yet because it's not
>>>> stably reproducible. (just have encountered randomly when logging in)
>>>>
>>>> At quick look it seems to be related to rtw89 wireless driver or network subsystem.
>>>
>>> Your bug is weird indeed, and it does come from rtw89_8852be. My distro has not
>>> yet released kernel 6.2.7, but I have not seen this problem with mainline
>>> kernels throughout the 6.2 or 6.3 development series.
>>
>> Looking at the rtw89 driver's probe function, the bug is probably a
>> simple race condition:
>>
>> int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>> {
>> ...
>> ret = rtw89_core_register(rtwdev); <- calls ieee80211_register_hw();
>> ...
>> rtw89_core_napi_init(rtwdev);
>> ...
>> }
>>
>> so it registers the wifi device first, making it visible to userspace,
>> and then initializes napi.
>>
>> So there is a window where a fast userspace may already try to
>> interact with the device before the driver got around to initializing
>> the napi parts, and then it explodes. At least that is my theory for
>> the issue.
>>
>> Switching the order of these two functions should avoid it in theory,
>> as long as rtw89_core_napi_init() doesn't depend on anything
>> rtw89_core_register() does.
>>
>> FWIW, registering the irq handler only after registering the device
>> also seems suspect, and should probably also happen before that.
>
> Adding a 10 seconds sleep between rtw89_core_register() and
> rtw89_core_napi_init(), and I can reproduce this issue:
>
> int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> {
> ...
> ret = rtw89_core_register(rtwdev);
> ...
> msleep(10 * 100);
> ...
> rtw89_core_napi_init(rtwdev);
> ...
> }
>
> And, as your suggestion, I move the rtw89_core_register() to the last step
> of PCI probe. Then, it looks more reasonable that we prepare NAPI and
> interrupt handlers before registering netdev. Could you give it a try with
> below fix?
>
> diff --git a/pci.c b/pci.c
> index fe6c0efc..087de2e0 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -3879,25 +3879,26 @@ int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> rtw89_pci_link_cfg(rtwdev);
> rtw89_pci_l1ss_cfg(rtwdev);
>
> - ret = rtw89_core_register(rtwdev);
> - if (ret) {
> - rtw89_err(rtwdev, "failed to register core\n");
> - goto err_clear_resource;
> - }
> -
> rtw89_core_napi_init(rtwdev);
>
> ret = rtw89_pci_request_irq(rtwdev, pdev);
> if (ret) {
> rtw89_err(rtwdev, "failed to request pci irq\n");
> - goto err_unregister;
> + goto err_deinit_napi;
> + }
> +
> + ret = rtw89_core_register(rtwdev);
> + if (ret) {
> + rtw89_err(rtwdev, "failed to register core\n");
> + goto err_free_irq;
> }
>
> return 0;
>
> -err_unregister:
> +err_free_irq:
> + rtw89_pci_free_irq(rtwdev, pdev);
> +err_deinit_napi:
> rtw89_core_napi_deinit(rtwdev);
> - rtw89_core_unregister(rtwdev);
> err_clear_resource:
> rtw89_pci_clear_resource(rtwdev, pdev);
> err_declaim_pci:
Ping-Ke,
The patch works fine here, but I did not have the problem.
When you submit it, add a Tested-by: Larry Finger<Larry.Finger@...inger.net> and
a Reviewed-by for the same address.
@Jonas: Good catch.
Larry
Powered by blists - more mailing lists