Message-ID: <ZFvuOK_dpGTE4UVS@slm.duckdns.org>
Date: Wed, 10 May 2023 09:19:20 -1000
From: Tejun Heo <tj@...nel.org>
To: Brian Norris <briannorris@...omium.org>
Cc: jiangshanlai@...il.com, linux-kernel@...r.kernel.org,
	kernel-team@...a.com, Amitkumar Karwar <amitkarwar@...il.com>,
	Ganapathi Bhat <ganapathi017@...il.com>,
	Sharvari Harisangam <sharvari.harisangam@....com>,
	Xinming Hu <huxinming820@...il.com>, Kalle Valo <kvalo@...nel.org>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	linux-wireless@...r.kernel.org, netdev@...r.kernel.org,
	Pin-yen Lin <treapking@...omium.org>
Subject: Re: [PATCH 02/13] wifi: mwifiex: Use default @max_active for
 workqueues

Hello,

On Wed, May 10, 2023 at 11:57:41AM -0700, Brian Norris wrote:
> Test case: iperf TCP RX (i.e., hits "MWIFIEX_RX_WORK_QUEUE" a lot) at
> some of the higher (VHT 80 MHz) data rates.
> 
> Hardware: Mediatek MT8173 2xA53 (little) + 2xA72 (big) CPU
> (I'm not familiar with its cache details)
> +
> Marvell SD8897 SDIO WiFi (mwifiex_sdio)

Yeah, we've had multiple similar cases on what I think are similar
configurations, which is why I'm working on improving workqueue locality.

> We're looking at a major regression from our 4.19 kernel to a 5.15
> kernel (yeah, that's downstream reality). So far, we've found that
> performance is:

That's curious. 4.19 is old but I scanned the history and there's nothing
which can cause that kind of perf regression for unbound workqueues between
4.19 and 5.15.

> (1) much better (nearly the same as 4.19) if we add WQ_SYSFS and pin the
> work queue to one CPU (doesn't really matter which CPU, as long as it's
> not the one loaded with IRQ(?) work)
> 
> (2) moderately better if we pin the CPU frequency (e.g., "performance"
> cpufreq governor instead of "schedutil")
> 
> (3) moderately better (not quite as good as (2)) if we switch to a
> kthread_worker and don't pin anything.

Hmm... so it's not just workqueue.
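
For reference, experiment (1) above (WQ_SYSFS plus pinning to one CPU) can
be sketched from userspace roughly as follows. This is only an illustration,
not what was actually run: it assumes the workqueue's attributes appear under
/sys/devices/virtual/workqueue/<name>/, and the helper names cpumask_hex()
and pin_workqueue() are made up for this example.

```python
from pathlib import Path

# Assumed location of WQ_SYSFS workqueue attributes on a typical system.
WQ_SYSFS_ROOT = Path("/sys/devices/virtual/workqueue")


def cpumask_hex(cpus):
    """Build the hex cpumask string for a set of CPU indices (0..31)."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            # Larger masks need comma-separated 32-bit groups; not handled here.
            raise ValueError(f"unsupported CPU index: {cpu}")
        mask |= 1 << cpu
    return format(mask, "x")


def pin_workqueue(name, cpus, root=WQ_SYSFS_ROOT):
    """Write the cpumask for workqueue `name`; returns the file written."""
    path = Path(root) / name / "cpumask"
    path.write_text(cpumask_hex(cpus) + "\n")
    return path
```

Run as root, something like pin_workqueue("MWIFIEX_RX_WORK_QUEUE", [2])
would confine the work to CPU 2, assuming the sysfs directory name matches
the workqueue name quoted in the test case.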

> We tried (2) because we saw a lot more CPU migration on kernel 5.15
> (work moves across all 4 CPUs throughout the run; on kernel 4.19 it
> mostly switched between 2 CPUs).

Workqueue can contribute to this but it seems more likely that scheduling
changes are also part of the story.

> We tried (3) suspecting some kind of EAS issue (instead of distributing
> our workload onto 4 different kworkers, our work (and therefore our load
> calculation) is mostly confined to a single kernel thread). But it still
> seems like our issues are more than "just" EAS / cpufreq issues, since
> (2) and (3) aren't as good as (1).
> 
> NB: there weren't many relevant mwifiex or MTK-SDIO changes in this
> range.
> 
> So we're still investigating a few other areas, but it does seem like
> "locality" (in some sense of the word) is relevant. We'd probably be
> open to testing any patches you have, although it's likely we'd have the
> easiest time if we can port those to 5.15. We're constantly working on
> getting good upstream support for Chromebook chips, but ARM SoC reality
> is that it still varies a lot as to how much works upstream on any given
> system.

I should be able to post the patchset later today or tomorrow. It comes with
sysfs knobs to control affinity scopes and strictness, so hopefully you'll be
able to find a configuration that works without too much difficulty.
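
As a rough sketch of what driving such knobs could look like: the snippet
below assumes the attribute names affinity_scope and affinity_strict and the
scope values that later appeared in mainline workqueue documentation. The
patchset referred to here, or a 5.15 backport of it, may differ, and the
helper set_affinity() is hypothetical.

```python
from pathlib import Path

# Assumed location of WQ_SYSFS workqueue attributes.
WQ_SYSFS_ROOT = Path("/sys/devices/virtual/workqueue")

# Scope names as documented in later mainline kernels (assumption here).
SCOPES = {"default", "cpu", "smt", "cache", "numa", "system"}


def set_affinity(name, scope, strict=False, root=WQ_SYSFS_ROOT):
    """Set the affinity scope and strictness for workqueue `name`."""
    if scope not in SCOPES:
        raise ValueError(f"unknown affinity scope: {scope}")
    wq = Path(root) / name
    # Scope selects how widely workers may spread from the issuing CPU.
    (wq / "affinity_scope").write_text(scope + "\n")
    # Strict (1) keeps workers inside the scope; 0 treats it as a preference.
    (wq / "affinity_strict").write_text(("1" if strict else "0") + "\n")
    return wq
```

Trying, say, set_affinity("MWIFIEX_RX_WORK_QUEUE", "cache", strict=True)
and then rerunning the iperf test would be one way to compare settings.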

Thanks.

-- 
tejun
