[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEXTbpe6D_pwLP6+0Kv-n7KMRoOn-_CWdZygMz0yK1fFuC-2Bg@mail.gmail.com>
Date: Fri, 6 Dec 2024 19:21:40 +0800
From: Pin-yen Lin <treapking@...omium.org>
To: Doug Anderson <dianders@...omium.org>
Cc: Brian Norris <briannorris@...omium.org>, Francesco Dolcini <francesco@...cini.it>,
Kalle Valo <kvalo@...nel.org>, David Lin <yu-hao.lin@....com>, linux-kernel@...r.kernel.org,
linux-wireless@...r.kernel.org
Subject: Re: [PATCH] wifi: mwifiex: decrease timeout waiting for host sleep
from 10s to 5s
Hi Doug,
On Fri, Dec 6, 2024 at 1:13 AM Doug Anderson <dianders@...omium.org> wrote:
>
> Hi,
>
> On Thu, Dec 5, 2024 at 5:46 AM Pin-yen Lin <treapking@...omium.org> wrote:
> >
> > Hi Doug,
> >
> > On Thu, Dec 5, 2024 at 2:11 AM Doug Anderson <dianders@...omium.org> wrote:
> > >
> > > Hi,
> > >
> > > On Wed, Dec 4, 2024 at 5:45 AM Pin-yen Lin <treapking@...omium.org> wrote:
> > > >
> > > > > > > Can you try testing (and gather timing numbers) when
> > > > > > > suspending soon after initiating scans? It's hard to judge what the
> > > > > > > lower limit of this timeout should really be without any numbers, just
> > > > > > > like it's hard to judge whether your 10 second watchdog is reasonable.
> > > > > >
> > > > > > Pin-yen: is this something you could gather?
> > > >
> > > > I tried entering suspend right after wifi scans, and the time spent in
> > > > mwifiex_enable_hs() is always around 100ms. It seems initiating
> > > > suspend does not increase the execution time for mwifiex_enable_hs(),
> > > > so I think the driver is capable of interrupting a scan.
> > > > > >
> > > > > >
> > > > > > > Also, for the record, since we might have to field regression reports
> > > > > > > for other systems: what hardware (chip variant, FW version) are you
> > > > > > > seeing problems on?
> > > > > >
> > > > > > Pin-yen: I'm assuming you'll provide this.
> > > >
> > > > From the debugfs entry:
> > > >
> > > > driver_name = "mwifiex"
> > > > driver_version = mwifiex 1.0 (15.68.19.p54)
> > > > verext = w8897o-B0, RF87XX, FP68, 15.68.19.p54
> > > >
> > > > The compatible string of the DT is "marvell,sd8897".
> > > > >
> > > > > I'll leave it up to y'all (Doug and Pin-Yen) whether you want to provide
> > > > > the above to provide a little more confidence, or if you want to
> > > > > reconsider your use of CONFIG_DPM_WATCHDOG_TIMEOUT.
> > > >
> > > > It looks okay to me to decrease the timeout here given that scanning
> > > > doesn't seem to affect the suspend time. What's your thought, Doug?
> > >
> > > I think Brian is right and that we should change how we're using the
> > > DPM watchdog, but IMO that doesn't mean that we couldn't also change
> > > this timeout.
> > >
> > > If nothing else, you'd want to post a v2 of your patch containing a
> > > summary of the info you've gathered so it's recorded with the patch.
> > >
> > > Maybe what makes the most sense, though, is to put our money where our
> > > mouth is and land some sort of patch in the ChromeOS tree and then
> > > report back to upstream in a month when we have data about it. If
> > > things look good in the field then presumably that would give some
> > > confidence for upstream to be willing to land the patch?
> > >
> > > Probably about the best data we could gather would be to break the
> > > existing timeout into two halves. You could wait half the time, then
> > > print a warning, and then wait the other half the time. We could even
> > > land that change _without_ changing the timeout to 5 seconds. Then if
> > > we ever see the warning print but then the suspend succeeds we know
> > > that there are cases where waiting longer would have helped. If we
> > > made that a WARN_ON() our existing infrastructure would help us gather
> > > that info...
> >
> > I just realized that mwifiex_wait_queue_complete() has another 12s
> > timeout[1], though they are not directly involved in suspend/resume.
> >
> > Should we also add a warning to that and see if we can make it half?
> > This starts to make me think how many timeouts we want to consider.
> > Does it make sense to only focus on the one in mwifiex_enable_hs()? Or
> > should we check other timeouts in mwifiex or even other drivers?
> >
> > [1]: https://elixir.bootlin.com/linux/v6.12.1/source/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c#L51
>
> IMO any of these "arbitrary and really long timeout to make sure we
> don't hang forever" type things probably should have a warning so we
> know if we're getting close to the timeout. It wouldn't be very hard
> to make a wrapper around wait_event_interruptible_timeout() to handle
> this. I suppose that wrapper could just be local to mwifiex, though if
> other drivers found it useful it would make sense to put it somewhere
> more common.
>
> That being said, if we aren't actually hitting these other timeouts I
> don't know that we need to do an extensive audit right now to find
> every one of them.
>
> Of course, Brian also said he'd be OK with just setting the timeout to
> 5 seconds based on the research you've already done. In that case I
> don't know if we'd want a WARNing after 2.5 seconds. Maybe? 2.5
> seconds is still pretty long, but I don't know if it's WARN-worthy. It
> could at least be dev_warn() worthy...
>
Sounds like we will start to binary search the timeout...
Then, I would like to just send out a v2 for this with an updated
commit message. It sounds like it will be endless binary searches on
all those timeouts in the kernel if we really want to create a wrapper
around wait_event_interruptible_timeout(), and all the efforts only
make us a few more seconds faster on the system.
>
>
> -Doug
Regards,
Pin-yen
Powered by blists - more mailing lists