[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e4f8e55f843041978098f57ecb7e558b@realtek.com>
Date: Thu, 23 Mar 2023 00:59:36 +0000
From: Ping-Ke Shih <pkshih@...ltek.com>
To: Jonas Gorski <jonas.gorski@...il.com>,
Larry Finger <Larry.Finger@...inger.net>
CC: Hyeonggon Yoo <42.hyeyoo@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-wireless@...r.kernel.org" <linux-wireless@...r.kernel.org>
Subject: RE: [BUG v6.2.7] Hitting BUG_ON() on rtw89 wireless driver startup
> -----Original Message-----
> From: Jonas Gorski <jonas.gorski@...il.com>
> Sent: Thursday, March 23, 2023 4:52 AM
> To: Larry Finger <Larry.Finger@...inger.net>
> Cc: Hyeonggon Yoo <42.hyeyoo@...il.com>; netdev@...r.kernel.org; linux-wireless@...r.kernel.org; Ping-Ke
> Shih <pkshih@...ltek.com>
> Subject: Re: [BUG v6.2.7] Hitting BUG_ON() on rtw89 wireless driver startup
>
> On Wed, 22 Mar 2023 at 18:03, Larry Finger <Larry.Finger@...inger.net> wrote:
> >
> > On 3/22/23 10:54, Hyeonggon Yoo wrote:
> > >
> > > Hello folks,
> > > I've just encountered weird bug when booting Linux v6.2.7
> > >
> > > config: attached
> > > dmesg: attached
> > >
> > > I'm not sure exactly how to trigger this issue yet because it's not
> > > stably reproducible. (just have encountered randomly when logging in)
> > >
> > > At quick look it seems to be related to rtw89 wireless driver or network subsystem.
> >
> > Your bug is weird indeed, and it does come from rtw89_8852be. My distro has not
> > yet released kernel 6.2.7, but I have not seen this problem with mainline
> > kernels throughout the 6.2 or 6.3 development series.
>
> Looking at the rtw89 driver's probe function, the bug is probably a
> simple race condition:
>
> int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> {
> ...
> ret = rtw89_core_register(rtwdev); <- calls ieee80211_register_hw();
> ...
> rtw89_core_napi_init(rtwdev);
> ...
> }
>
> so it registers the wifi device first, making it visible to userspace,
> and then initializes napi.
>
> So there is a window where a fast userspace may already try to
> interact with the device before the driver got around to initializing
> the napi parts, and then it explodes. At least that is my theory for
> the issue.
>
> Switching the order of these two functions should avoid it in theory,
> as long as rtw89_core_napi_init() doesn't depend on anything
> rtw89_core_register() does.
>
> FWIW, registering the irq handler only after registering the device
> also seems suspect, and should probably also happen before that.
Adding a 10 seconds sleep between rtw89_core_register() and
rtw89_core_napi_init(), and I can reproduce this issue:
int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
...
ret = rtw89_core_register(rtwdev);
...
msleep(10 * 100);
...
rtw89_core_napi_init(rtwdev);
...
}
And, as your suggestion, I move the rtw89_core_register() to the last step
of PCI probe. Then, it looks more reasonable that we prepare NAPI and
interrupt handlers before registering netdev. Could you give it a try with
below fix?
diff --git a/pci.c b/pci.c
index fe6c0efc..087de2e0 100644
--- a/pci.c
+++ b/pci.c
@@ -3879,25 +3879,26 @@ int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
rtw89_pci_link_cfg(rtwdev);
rtw89_pci_l1ss_cfg(rtwdev);
- ret = rtw89_core_register(rtwdev);
- if (ret) {
- rtw89_err(rtwdev, "failed to register core\n");
- goto err_clear_resource;
- }
-
rtw89_core_napi_init(rtwdev);
ret = rtw89_pci_request_irq(rtwdev, pdev);
if (ret) {
rtw89_err(rtwdev, "failed to request pci irq\n");
- goto err_unregister;
+ goto err_deinit_napi;
+ }
+
+ ret = rtw89_core_register(rtwdev);
+ if (ret) {
+ rtw89_err(rtwdev, "failed to register core\n");
+ goto err_free_irq;
}
return 0;
-err_unregister:
+err_free_irq:
+ rtw89_pci_free_irq(rtwdev, pdev);
+err_deinit_napi:
rtw89_core_napi_deinit(rtwdev);
- rtw89_core_unregister(rtwdev);
err_clear_resource:
rtw89_pci_clear_resource(rtwdev, pdev);
err_declaim_pci:
--
Ping-Ke
Powered by blists - more mailing lists