lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+ASDXPXiyga6mKLBacupCXa0wsBbXCrmq20RFo7T2eSF8kbzQ@mail.gmail.com>
Date: Tue, 3 Dec 2024 15:04:56 -0800
From: Brian Norris <briannorris@...omium.org>
To: Doug Anderson <dianders@...omium.org>
Cc: Pin-yen Lin <treapking@...omium.org>, Francesco Dolcini <francesco@...cini.it>, 
	Kalle Valo <kvalo@...nel.org>, David Lin <yu-hao.lin@....com>, linux-kernel@...r.kernel.org, 
	linux-wireless@...r.kernel.org
Subject: Re: [PATCH] wifi: mwifiex: decrease timeout waiting for host sleep
 from 10s to 5s

Hi,

On Wed, Nov 27, 2024 at 7:44 AM Doug Anderson <dianders@...omium.org> wrote:
> On Wed, Nov 27, 2024 at 2:58 AM Pin-yen Lin <treapking@...omium.org> wrote:
> > In commit 52250cbee7f6 ("mwifiex: use timeout variant for
> > wait_event_interruptible") it was noted that sometimes we seemed
> > to miss the signal that our host sleep settings took effect. A
> > 10 second timeout was added to the code to make sure we didn't
> > hang forever waiting. It appears that this problem still exists
> > and we hit the timeout sometimes for Chromebooks in the field.
> >
> > Recently on ChromeOS we've started setting the DPM watchdog
> > to trip if full system suspend takes over 10 seconds. Given
> > the timeout in the original patch, obviously we're hitting
> > the DPM watchdog before mwifiex gets a chance to timeout.

The Kconfig default is 120 seconds, and it's hidden under
CONFIG_EXPERT. What makes you think 10 is a good value? (It sounds
pretty small for triggering panics.) The smallest I can see outside of
ChromeOS is 12 seconds, although 60 seconds is much more common. I
also happen to see other WiFi drivers have hit similar problems, but
they settled on 20 seconds (assuming a 60s timeout on other distros):
https://lore.kernel.org/linux-wireless/20230329162038.8637-1-kvalo@kernel.org/
https://git.kernel.org/linus/cf5fa3ca0552f1b7ba8490de40700bbfb6979b17

Technically, this Kconfig lets you set a value as small as 1 second. I
don't think we should work at reducing all timeouts to fit that
target.

I could maybe approve lowering this one, but I'd also recommend
reconsidering your timeout value. And more questions below.

> > While we could increase the DPM watchdog in ChromeOS to avoid
> > this problem, it's probably better to simply decrease the
> > timeout. Any time we're waiting several seconds for the
> > firmware to respond it's likely that the firmware won't ever
> > respond. With that in mind, decrease the timeout in mwifiex
> > from 10 seconds to 5 seconds.
> >
> > Suggested-by: Doug Anderson <dianders@...omium.org>
> > Signed-off-by: Pin-yen Lin <treapking@...omium.org>
> > ---
> >
> >  drivers/net/wireless/marvell/mwifiex/sta_ioctl.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
>
> I believe Brian Norris is still anointed as the personally nominally
> in charge of mwifiex upstream (get_maintainer labels him as "odd"
> fixer, which seems awfully judgemental), so he should be CCed on
> fixes. Added him.

I tend to see mwifiex patches even if I don't get CC'd, but thanks. I
wonder why get_maintainer(?) picked up Francesco properly but not me.
*shrug*

> > diff --git a/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c b/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c
> > index e06a0622973e..f79589cafe57 100644
> > --- a/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c
> > +++ b/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c
> > @@ -545,7 +545,7 @@ int mwifiex_enable_hs(struct mwifiex_adapter *adapter)
> >
> >         if (wait_event_interruptible_timeout(adapter->hs_activate_wait_q,
> >                                              adapter->hs_activate_wait_q_woken,
> > -                                            (10 * HZ)) <= 0) {
> > +                                            (5 * HZ)) <= 0) {
>
> Given that I suggested this fix, it should be no surprise:
>
> Reviewed-by: Douglas Anderson <dianders@...omium.org>
>
> Upon sleeping on the idea, the only slight concern I have here is
> whether we'll trigger this timeout if we try to suspend while WiFi
> scanning is in progress. I don't have any actual evidence supporting
> this concern, but I remember many years ago when I used to deal with
> the WiFi drivers more often there were cases where suspend could be
> delayed if it happened while a scan was in progress. Maybe/hopefully
> that's fixed now, but I figured I'd at least bring it up in case it
> tickled anything in someone's mind...

Scans should essentially have been canceled by now, but IIUC, the
driver doesn't really force the card to stop if it's in the middle of
a scan (I'm not 100% sure if it's possible). So it's possible that
pending scans could delay this step.

I wonder what the distribution of (successful) timing is today. I'd
assume this typically take on the order of milliseconds, but do we
commonly see outliers? What if a system is fully loaded while
suspending? Can you try testing (and gather timing numbers) when
suspending soon after initiating scans? It's hard to judge what the
lower limit of this timeout should really be without any numbers, just
like it's hard to judge whether your 10 second watchdog is reasonable.

Also, for the record, since we might have to field regression reports
for other systems: what hardware (chip variant, FW version) are you
seeing problems on?

Brian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ