Message-ID: <ZFvpJb9Dh0FCkLQA@google.com>
Date: Wed, 10 May 2023 11:57:41 -0700
From: Brian Norris <briannorris@...omium.org>
To: Tejun Heo <tj@...nel.org>
Cc: jiangshanlai@...il.com, linux-kernel@...r.kernel.org,
kernel-team@...a.com, Amitkumar Karwar <amitkarwar@...il.com>,
Ganapathi Bhat <ganapathi017@...il.com>,
Sharvari Harisangam <sharvari.harisangam@....com>,
Xinming Hu <huxinming820@...il.com>,
Kalle Valo <kvalo@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
linux-wireless@...r.kernel.org, netdev@...r.kernel.org,
Pin-yen Lin <treapking@...omium.org>
Subject: Re: [PATCH 02/13] wifi: mwifiex: Use default @max_active for
workqueues
Hi,
On Wed, May 10, 2023 at 08:16:00AM -1000, Tejun Heo wrote:
> > While I'm here: we're still debugging what's affecting WiFi performance
> > on some of our WiFi systems, but it's possible I'll be turning some of
> > these into struct kthread_worker instead. We can cross that bridge
> > (including potential conflicts) if/when we come to it though.
>
> Can you elaborate the performance problem you're seeing? I'm working on a
> major update for workqueue to improve its locality behavior, so if you're
> experiencing issues on CPUs w/ multiple L3 caches, it'd be a good test case.
Sure!
Test case: iperf TCP RX (i.e., hits "MWIFIEX_RX_WORK_QUEUE" a lot) at
some of the higher (VHT 80 MHz) data rates.
Hardware: Mediatek MT8173 2xA53 (little) + 2xA72 (big) CPU
(I'm not familiar with its cache details)
+
Marvell SD8897 SDIO WiFi (mwifiex_sdio)
We're looking at a major regression from our 4.19 kernel to a 5.15
kernel (yeah, that's downstream reality). So far, we've found that
performance is:
(1) much better (nearly the same as 4.19) if we add WQ_SYSFS and pin the
work queue to one CPU (doesn't really matter which CPU, as long as it's
not the one loaded with IRQ(?) work)
(2) moderately better if we pin the CPU frequency (e.g., "performance"
cpufreq governor instead of "schedutil")
(3) moderately better (not quite as good as (2)) if we switch to a
kthread_worker and don't pin anything.
We tried (2) because we saw a lot more CPU migration on kernel 5.15
(work moves across all 4 CPUs throughout the run; on kernel 4.19 it
mostly switched between 2 CPUs).
We tried (3) suspecting some kind of EAS issue: with a kthread_worker,
instead of distributing our workload onto 4 different kworkers, our work
(and therefore our load calculation) is mostly confined to a single
kernel thread. But it still seems like our issues are more than "just"
EAS / cpufreq issues, since (2) and (3) aren't as good as (1).
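For reference, the pinning in experiment (1) can be sketched from
userspace roughly as below. This is only an illustration under stated
assumptions, not the exact procedure we used: it assumes the workqueue
was allocated with WQ_SYSFS (a local patch), that it is an unbound
workqueue (only those expose a "cpumask" attribute), and that the CPU
number is below 32. The helper names single_cpu_mask() and
pin_workqueue() are made up for this example.

```c
#include <stdio.h>
#include <stddef.h>

/* Documented sysfs location for workqueues created with WQ_SYSFS. */
#define WQ_SYSFS_DIR "/sys/bus/workqueue/devices"

/*
 * Format a hex cpumask string with only `cpu` set (e.g. CPU 2 -> "4").
 * Hypothetical helper; handles CPUs 0..31 only.
 * Returns 0 on success, -1 on bad input or a too-small buffer.
 */
int single_cpu_mask(unsigned int cpu, char *buf, size_t len)
{
	int n;

	if (cpu >= 32)
		return -1;
	n = snprintf(buf, len, "%x", 1u << cpu);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}

/*
 * Write a single-CPU mask to <WQ_SYSFS_DIR>/<wq_name>/cpumask.
 * Hypothetical helper; needs root and a WQ_SYSFS workqueue.
 * Returns 0 on success, -1 on any failure.
 */
int pin_workqueue(const char *wq_name, unsigned int cpu)
{
	char path[256], mask[16];
	FILE *f;
	int n, ret = 0;

	if (single_cpu_mask(cpu, mask, sizeof(mask)))
		return -1;
	n = snprintf(path, sizeof(path), "%s/%s/cpumask",
		     WQ_SYSFS_DIR, wq_name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%s\n", mask) < 0)
		ret = -1;
	/* sysfs write errors may only surface when the stream is flushed. */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

With those assumptions, pin_workqueue("MWIFIEX_RX_WORK_QUEUE", 2) would
restrict the RX work queue to CPU 2.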
NB: there weren't many relevant mwifiex or MTK-SDIO changes in this
range.
So we're still investigating a few other areas, but it does seem like
"locality" (in some sense of the word) is relevant. We'd probably be
open to testing any patches you have, although it's likely we'd have the
easiest time if we can port those to 5.15. We're constantly working on
getting good upstream support for Chromebook chips, but ARM SoC reality
is that it still varies a lot as to how much works upstream on any given
system.
Thanks,
Brian