Message-ID: <CH3PR11MB7894190E3C849233A444E5CCF172A@CH3PR11MB7894.namprd11.prod.outlook.com>
Date: Wed, 18 Jun 2025 14:31:53 +0000
From: "Wlodarczyk, Bertrand" <bertrand.wlodarczyk@...el.com>
To: Michal Koutný <mkoutny@...e.com>
CC: "tj@...nel.org" <tj@...nel.org>, "hannes@...xchg.org"
	<hannes@...xchg.org>, "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, JP Kobryn
	<inwardvessel@...il.com>
Subject: RE: [PATCH] cgroup/rstat: change cgroup_base_stat to atomic

Thank you for your time and for your answer.

>> The kernel currently faces scalability issues when multiple userspace 
>> programs attempt to read cgroup statistics concurrently.

> Does "currently" mean that you didn't observe this before per-subsys split?

It refers to the current mainline, commit e04c78d86a9699d1 (Linux 6.16-rc2). Our customer observed the issue this past winter, and it is still present.
I believe it has been present for as long as the lock has existed.

>> The primary bottleneck is the css_cgroup_lock in cgroup_rstat_flush, 
>> which prevents access and updates to the statistics of the css from 
>> multiple CPUs in parallel.

> I think this description needs some refresh on top of the current mainline (at least after the commit 748922dcfabdd ("cgroup: use subsystem-specific rstat locks to avoid contention")) to be clear which lock (and locking functions) is apparently contentious.

The main culprit is css_cgroup_lock in cgroup_rstat_flush. It locks the css even though the main algorithm operates mostly on per-CPU data.
Only the propagation to the parent needs to be locked, and only if the data isn't atomic.
The benchmark results were gathered on top of commit e04c78d86a9699d1 (Linux 6.16-rc2), with commit 748922dcfabdd already included.
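To illustrate the point that only the propagation step needs serialization when the data is not atomic, here is a minimal userspace sketch, assuming C11 <stdatomic.h> atomics stand in for the kernel's per-CPU and atomic primitives. Every name in it (struct demo_cgroup, demo_account, demo_flush, demo_read, NR_CPUS_DEMO) is hypothetical; it is not the kernel's rstat API and not the code from the patch. Each CPU writes only its own pending slot, the flush drains each slot with an atomic exchange, and the totals are updated with atomic adds, so no flush-wide lock is needed. With plain uint64_t fields, the total update would have to be serialized by a lock instead.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_DEMO 4 /* hypothetical CPU count for this sketch */

/* Hypothetical stand-in for a cgroup node; NOT the kernel's struct cgroup. */
struct demo_cgroup {
    struct demo_cgroup *parent;
    /* Per-CPU pending deltas: each CPU only ever adds to its own slot. */
    _Atomic uint64_t pending[NR_CPUS_DEMO];
    /* Aggregated total seen by readers (own usage plus flushed descendants). */
    _Atomic uint64_t total;
};

static void demo_init(struct demo_cgroup *cg, struct demo_cgroup *parent)
{
    cg->parent = parent;
    for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
        atomic_init(&cg->pending[cpu], 0);
    atomic_init(&cg->total, 0);
}

/* Hot path: account a delta on one CPU, touching only that CPU's slot. */
static void demo_account(struct demo_cgroup *cg, int cpu, uint64_t delta)
{
    atomic_fetch_add_explicit(&cg->pending[cpu], delta, memory_order_relaxed);
}

/*
 * Flush a single cgroup: drain every per-CPU slot and add the delta to this
 * cgroup and all of its ancestors. atomic_exchange takes the pending value
 * and resets it in one step, so concurrent demo_account() calls are not lost,
 * and the atomic adds on the totals need no lock. Simplification: the kernel
 * propagates through the parent's pending delta and flushes whole subtrees;
 * this sketch adds directly to every ancestor and flushes one node only.
 */
static void demo_flush(struct demo_cgroup *cg)
{
    for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++) {
        uint64_t delta = atomic_exchange_explicit(&cg->pending[cpu], 0,
                                                  memory_order_relaxed);
        if (delta == 0)
            continue;
        for (struct demo_cgroup *p = cg; p != NULL; p = p->parent)
            atomic_fetch_add_explicit(&p->total, delta, memory_order_relaxed);
    }
}

static uint64_t demo_read(struct demo_cgroup *cg)
{
    return atomic_load_explicit(&cg->total, memory_order_relaxed);
}
```

The sketch tracks a single counter. Per-field atomics do not give readers a consistent snapshot across several related counters (for example user and system time read together), which is a separate trade-off from removing the lock.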

>> Notably, performance for memory and I/O rstats remains unchanged, as 
>> these are managed in separate submodules.
 

>> Additionally, this patch addresses a race condition detectable in the 
>> current mainline by KCSAN in __cgroup_account_cputime, which occurs 
>> when attempting to read a single hierarchy from multiple CPUs.

> Could you please extract this fix and send it separately?

Unfortunately, I don't have a separate fix. My primary objective was to resolve the performance bottleneck in rstat access for our customer. I found the race condition by accident when running my benchmark (provided in the gist) with KCSAN enabled. I didn't investigate the race on its own.
 
Thanks,
Bertrand
