lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <3aaea04ef3ec4a7287bfd5962ce57cc2@realtek.com>
Date: Thu, 11 Sep 2025 02:46:19 +0000
From: Ping-Ke Shih <pkshih@...ltek.com>
To: Ondřej Jirman <megi@....cz>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "open
 list:REALTEK WIRELESS DRIVER (rtw89)" <linux-wireless@...r.kernel.org>
Subject: RE: [PATCH] net: wireless: rtw89: Sleep while waiting for firmware init

Ondřej Jirman <megi@....cz> wrote:
> On Wed, Sep 10, 2025 at 12:37:54AM +0000, Ping-Ke Shih wrote:
> > Ondřej Jirman <megi@....cz> wrote:
> > > From: Ondrej Jirman <megi@....cz>
> > >
> > > This avoids RCU stalls caused by waiting up to 400ms for firmware init.
> > >
> > > Signed-off-by: Ondrej Jirman <megi@....cz>
> > > ---
> > >  drivers/net/wireless/realtek/rtw89/fw.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/net/wireless/realtek/rtw89/fw.c b/drivers/net/wireless/realtek/rtw89/fw.c
> > > index 16e59a4a486e..2c034b764a0a 100644
> > > --- a/drivers/net/wireless/realtek/rtw89/fw.c
> > > +++ b/drivers/net/wireless/realtek/rtw89/fw.c
> > > @@ -109,9 +109,9 @@ int rtw89_fw_check_rdy(struct rtw89_dev *rtwdev, enum rtw89_fwdl_check_type type
> > >         u8 val;
> > >         int ret;
> > >
> > > -       ret = read_poll_timeout_atomic(mac->fwdl_get_status, val,
> > > -                                      val == RTW89_FWDL_WCPU_FW_INIT_RDY,
> > > -                                      1, FWDL_WAIT_CNT, false, rtwdev, type);
> > > +       ret = read_poll_timeout(mac->fwdl_get_status, val,
> > > +                               val == RTW89_FWDL_WCPU_FW_INIT_RDY,
> > > +                               1, FWDL_WAIT_CNT, false, rtwdev, type);
> >
> > As I know, sleeping while RCU lock is not allowed. Please share kernel log
> > about the RCU stall and your perspective.
> 
> My perspective is that busy looping via read_poll_timeout_atomic() on a CPU for
> up to 400ms (FWDL_WAIT_CNT is 400_000) is not optimal

The polling time just comes from vendor driver, but actually it spends about
5ms in my PC. I think maximum polling time 400ms will not happen for users (
possibly happen during development), so it is fine.

> and that rtw driver is for
> some reason a major user of *_atomic version of read_poll function, which is
> I guess fine if you just need to wait for a few microseconds, and want
> a predictably timed exit from polling. But firmware load ready check doesn't
> seem to be this case.
> 
> Does the driver really need exit from polling the moment firmware signals ready
> status here (and similarly in many other places)?
> 
> More than 50% of uses of this function across all of kernel comes from rtw
> driver:
> 
>   https://elixir.bootlin.com/linux/v6.16.5/C/ident/read_poll_timeout_atomic

Just because Realtek WiFi does IO in driver, so need polling hardware ready. 
Using atomic version is because small delay interval (1us) is adopted in this
case.

> 
> My understanding is that in non-preemtible kernel this will prevent anything
> from executing on said CPU for up to that amount of time, which is what the
> kernel complains about (6.16.6):

It seems like a thread should call might_sleep functions to schedule CPU to
prevent that RCU detected CPU stall? If so, this patch seems to be a random
choice with higher polling time.

More, does it means the warning of RCU stall disappears if you turn on 
preemtible kernel?

> 
> [    8.028997] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 1-.... } 6 jiffies s: 709
> root: 0x2/.
> [    8.029819] rcu: blocking rcu_node structures (internal RCU debug):
> [    8.030261] Sending NMI from CPU 3 to CPUs 1:
> [    8.030280] NMI backtrace for cpu 1
> [    8.030295] CPU: 1 UID: 0 PID: 366 Comm: hostapd Not tainted 6.16.6-00531-gefcc09919cb9 #17 VOLUNTARY
> [    8.030304] Hardware name: Pine64 Quartz64 Model A (DT)
> [    8.030307] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    8.030313] pc : arch_counter_get_cntpct+0x8/0x20
> [    8.030326] lr : __delay+0x90/0xe8
> [    8.030332] sp : ffff8000859c34a0
> [    8.030334] x29: ffff8000859c34a0 x28: ffff0001074de180 x27: ffff000100117080
> [    8.030342] x26: ffff0001042003c0 x25: ffff0001042024e8 x24: 0000000000000000
> [    8.030347] x23: 0000000000000000 x22: ffffffffec859f81 x21: 00000000137a69df
> [    8.030352] x20: 0000000000000018 x19: ffff800082af7158 x18: 00000000000007f4
> [    8.030357] x17: 0000000800000000 x16: 41003d382021b000 x15: ffff000100f68088
> [    8.030362] x14: 0000000000000000 x13: 0000000000000000 x12: 00000000000003d0
> [    8.030367] x11: 0000000000000001 x10: ffff000100f680c8 x9 : 000000013c624a80
> [    8.030372] x8 : ffff00013c624a90 x7 : 0000000000000000 x6 : 000000013c624a80
> [    8.030377] x5 : ffff800080ddeae0 x4 : ffff00010420dff4 x3 : 00000000ffff41e0
> [    8.030381] x2 : 0000000000000001 x1 : ffff8000859c34a0 x0 : 00000000137a69ed
> [    8.030388] Call trace:
> [    8.030390]  arch_counter_get_cntpct+0x8/0x20 (P)
> [    8.030397]  __const_udelay+0x28/0x38
> [    8.030402]  rtw89_fw_check_rdy+0x3c/0x134
> [    8.030412]  rtw89_fw_download+0x1b4/0x1d0
> [    8.030417]  rtw89_mac_partial_init+0xcc/0x158
> [    8.030424]  rtw89_mac_init+0x40/0x16c
> [    8.030429]  rtw89_core_start+0x14/0x268
> [    8.030433]  rtw89_leave_ips+0x24/0xe8
> [    8.030440]  rtw89_ops_add_interface+0x1c4/0x244
> [    8.030445]  rtw89_ops_change_interface+0x84/0x120
> [    8.030449]  drv_change_interface+0x58/0xd0
> [    8.030457]  ieee80211_if_change_type+0x198/0x3c8
> [    8.030465]  ieee80211_change_iface+0x38/0x20c
> [    8.030471]  cfg80211_change_iface+0x170/0x2bc
> [    8.030477]  nl80211_set_interface+0x248/0x274
> [    8.030484]  genl_family_rcv_msg_doit+0xb8/0x110
> [    8.030494]  genl_rcv_msg+0x1ac/0x244
> [    8.030499]  netlink_rcv_skb+0x44/0xf8
> [    8.030504]  genl_rcv+0x30/0x44
> [    8.030509]  netlink_unicast+0x2d8/0x34c
> [    8.030514]  netlink_sendmsg+0x170/0x3d8
> [    8.030518]  __sock_sendmsg+0x48/0x54
> [    8.030526]  ____sys_sendmsg+0x24c/0x264
> [    8.030531]  ___sys_sendmsg+0x6c/0xb0
> [    8.030536]  __sys_sendmsg+0x6c/0xb8
> [    8.030540]  __arm64_sys_sendmsg+0x1c/0x24
> [    8.030545]  invoke_syscall.constprop.0+0x3c/0xe4
> [    8.030553]  el0_svc_common.constprop.0+0x34/0xcc
> [    8.030559]  do_el0_svc+0x18/0x20
> [    8.030564]  el0_svc+0x20/0x5c
> [    8.030570]  el0t_64_sync_handler+0x104/0x130
> [    8.030574]  el0t_64_sync+0x154/0x158
> 
> The kernel build has CONFIG_HZ_250=y so 6 jiffies is 24 ms. Not a lot, not
> a small amount either.
> 
> I also get this on 6.17-rc5:
> 
> Seems to come from
> https://elixir.bootlin.com/linux/v6.16.5/source/drivers/net/wireless/realtek/rtw89/efuse.c#L174
> 
> Which is another place that seemingly doesn't need udelay() based polling,
> because the function is only called from driver probe.

Yes, it looks like that.

My concern is that the meaning of polling timeout between 
read_poll_timeout_atomic() and read_poll_timeout() is difference. The
non-atomic version is wall clock time, but atomic version is total
polling time excluding IO time. However, USB devices spend much more
time than PCIE devices, so the change might cause USB devices regression.

Another, non-atomic version can't work in RCU/spinlock critical section.
We also need to consider this carefully.

> 
> This time it stalled for combined time of 26+6 jiffies, which is 128ms. That's
> a lot (probably a combined time of looping over all efuse addresses, while busy
> polling for each access, without allowing the CPU to schedule other tasks).
> 

[...]

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ