lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e4f8e55f843041978098f57ecb7e558b@realtek.com>
Date:   Thu, 23 Mar 2023 00:59:36 +0000
From:   Ping-Ke Shih <pkshih@...ltek.com>
To:     Jonas Gorski <jonas.gorski@...il.com>,
        Larry Finger <Larry.Finger@...inger.net>
CC:     Hyeonggon Yoo <42.hyeyoo@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-wireless@...r.kernel.org" <linux-wireless@...r.kernel.org>
Subject: RE: [BUG v6.2.7] Hitting BUG_ON() on rtw89 wireless driver startup



> -----Original Message-----
> From: Jonas Gorski <jonas.gorski@...il.com>
> Sent: Thursday, March 23, 2023 4:52 AM
> To: Larry Finger <Larry.Finger@...inger.net>
> Cc: Hyeonggon Yoo <42.hyeyoo@...il.com>; netdev@...r.kernel.org; linux-wireless@...r.kernel.org; Ping-Ke
> Shih <pkshih@...ltek.com>
> Subject: Re: [BUG v6.2.7] Hitting BUG_ON() on rtw89 wireless driver startup
> 
> On Wed, 22 Mar 2023 at 18:03, Larry Finger <Larry.Finger@...inger.net> wrote:
> >
> > On 3/22/23 10:54, Hyeonggon Yoo wrote:
> > >
> > > Hello folks,
> > > I've just encountered weird bug when booting Linux v6.2.7
> > >
> > > config: attached
> > > dmesg: attached
> > >
> > > I'm not sure exactly how to trigger this issue yet because it's not
> > > stably reproducible. (just have encountered randomly when logging in)
> > >
> > > At quick look it seems to be related to rtw89 wireless driver or network subsystem.
> >
> > Your bug is weird indeed, and it does come from rtw89_8852be. My distro has not
> > yet released kernel 6.2.7, but I have not seen this problem with mainline
> > kernels throughout the 6.2 or 6.3 development series.
> 
> Looking at the rtw89 driver's probe function, the bug is probably a
> simple race condition:
> 
> int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> {
>     ...
>     ret = rtw89_core_register(rtwdev); <- calls ieee80211_register_hw();
>     ...
>     rtw89_core_napi_init(rtwdev);
>     ...
> }
> 
> so it registers the wifi device first, making it visible to userspace,
> and then initializes napi.
> 
> So there is a window where a fast userspace may already try to
> interact with the device before the driver got around to initializing
> the napi parts, and then it explodes. At least that is my theory for
> the issue.
> 
> Switching the order of these two functions should avoid it in theory,
> as long as rtw89_core_napi_init() doesn't depend on anything
> rtw89_core_register() does.
> 
> FWIW, registering the irq handler only after registering the device
> also seems suspect, and should probably also happen before that.

Adding a 10 seconds sleep between rtw89_core_register() and
rtw89_core_napi_init(), and I can reproduce this issue:

int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    ...
    ret = rtw89_core_register(rtwdev);
    ...
	msleep(10 * 100);
	...
    rtw89_core_napi_init(rtwdev);
    ...
}

And, as your suggestion, I move the rtw89_core_register() to the last step
of PCI probe. Then, it looks more reasonable that we prepare NAPI and
interrupt handlers before registering netdev. Could you give it a try with
below fix?

diff --git a/pci.c b/pci.c
index fe6c0efc..087de2e0 100644
--- a/pci.c
+++ b/pci.c
@@ -3879,25 +3879,26 @@ int rtw89_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
        rtw89_pci_link_cfg(rtwdev);
        rtw89_pci_l1ss_cfg(rtwdev);

-       ret = rtw89_core_register(rtwdev);
-       if (ret) {
-               rtw89_err(rtwdev, "failed to register core\n");
-               goto err_clear_resource;
-       }
-
        rtw89_core_napi_init(rtwdev);

        ret = rtw89_pci_request_irq(rtwdev, pdev);
        if (ret) {
                rtw89_err(rtwdev, "failed to request pci irq\n");
-               goto err_unregister;
+               goto err_deinit_napi;
+       }
+
+       ret = rtw89_core_register(rtwdev);
+       if (ret) {
+               rtw89_err(rtwdev, "failed to register core\n");
+               goto err_free_irq;
        }

        return 0;

-err_unregister:
+err_free_irq:
+       rtw89_pci_free_irq(rtwdev, pdev);
+err_deinit_napi:
        rtw89_core_napi_deinit(rtwdev);
-       rtw89_core_unregister(rtwdev);
 err_clear_resource:
        rtw89_pci_clear_resource(rtwdev, pdev);
 err_declaim_pci:

--
Ping-Ke


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ