Message-ID: <CH3PR11MB7894D2570AEA9AB67DBF706DF146A@CH3PR11MB7894.namprd11.prod.outlook.com>
Date: Mon, 30 Jun 2025 14:25:27 +0000
From: "Wlodarczyk, Bertrand" <bertrand.wlodarczyk@...el.com>
To: "tj@...nel.org" <tj@...nel.org>
CC: Shakeel Butt <shakeel.butt@...ux.dev>, "hannes@...xchg.org"
	<hannes@...xchg.org>, "mkoutny@...e.com" <mkoutny@...e.com>,
	"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"inwardvessel@...il.com" <inwardvessel@...il.com>
Subject: RE: [PATCH v2] cgroup/rstat: change cgroup_base_stat to atomic



> > > Also the response to the tearing issue explained by JP is not satisfying.
> > 
> > In other words, the claim is: "it's better to stall other cpus in
> > spinlock plus disable IRQ every time in order to serve an outdated snapshot instead of providing the user with the freshest statistics much, much faster".
> > In terms of statistics, the freshest data served fast to the user is, in my opinion, better behavior.

> This is a false choice, I think. e.g. We can easily use seqlock to remove strict synchronization only from user side, right?

Yes, that's a second possibility for solving the problem.
I chose the atomics approach because, in my opinion, incremental statistics are a fairly natural use case for atomics.
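
For context, a minimal userspace sketch of the seqcount pattern being suggested (C11 atomics, made-up field names; the kernel side would of course use <linux/seqlock.h> rather than hand-rolled fences). The point it illustrates: the writer never waits for readers, and a reader simply retries if it raced with an update, so it cannot return a torn snapshot even though it takes no lock and disables nothing.

/*
 * Sketch only, not the patch: even seq = stable, odd = write in progress.
 * The stat fields are hypothetical.
 */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static _Atomic uint64_t seq;
static _Atomic uint64_t stat_cputime;
static _Atomic uint64_t stat_nr_forks;

static void writer_update(uint64_t cputime_delta)
{
	atomic_fetch_add_explicit(&seq, 1, memory_order_relaxed);   /* -> odd */
	atomic_thread_fence(memory_order_release);

	atomic_fetch_add_explicit(&stat_cputime, cputime_delta,
				  memory_order_relaxed);
	atomic_fetch_add_explicit(&stat_nr_forks, 1, memory_order_relaxed);

	atomic_thread_fence(memory_order_release);
	atomic_fetch_add_explicit(&seq, 1, memory_order_relaxed);   /* -> even */
}

static void reader_snapshot(uint64_t *cputime, uint64_t *nr_forks)
{
	uint64_t s1, s2;

	do {
		s1 = atomic_load_explicit(&seq, memory_order_acquire);
		*cputime  = atomic_load_explicit(&stat_cputime,
						 memory_order_relaxed);
		*nr_forks = atomic_load_explicit(&stat_nr_forks,
						 memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&seq, memory_order_relaxed);
	} while (s1 != s2 || (s1 & 1));   /* retry on a concurrent write */
}

int main(void)
{
	uint64_t cputime, nr_forks;

	writer_update(42);
	reader_snapshot(&cputime, &nr_forks);
	printf("cputime=%llu nr_forks=%llu\n",
	       (unsigned long long)cputime, (unsigned long long)nr_forks);
	return 0;
}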

> > I wouldn't be addressing this issue if there were no customers
> > affected by rstat latency in multi-container multi-cpu scenarios.

> Out of curiosity, can you explain the case that you observed in more detail?
> What was the customer doing?

Single hierarchy, hundreds of containers on one server, multiple independent owners.
Some of them want current stats available in their web GUI,
so they are hammering the stats files for their cgroups.
The server experiences inefficiencies; perf shows a visible percentage of CPU cycles spent in cgroup_rstat_flush.
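
Roughly, the access pattern looks like the sketch below (illustration only, not the actual customer code and not the benchmark linked further down; the cgroup path is made up). Each tenant's dashboard keeps re-reading cpu.stat for its own cgroup, and every such read ends up flushing rstat on the kernel side.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];

	for (;;) {
		/* Hypothetical per-tenant cgroup path. */
		int fd = open("/sys/fs/cgroup/tenant-42/cpu.stat", O_RDONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		/* Reading the stat file triggers an rstat flush in the kernel. */
		if (read(fd, buf, sizeof(buf)) < 0)
			perror("read");
		close(fd);
		usleep(10 * 1000);	/* dashboard-style polling of "current" stats */
	}
}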

I prepared a benchmark which illustrates the issue faced by the customer:
https://gist.github.com/bwlodarcz/21bbc24813bced8e6ffc9e5ca3150fcc

qemu vm:
+--------------+---------+---------+
| mean (s)     |8dcb0ed8 | patched |
+--------------+---------+---------+
|cpu, KCSAN on |16.13*   |3.75     |
+--------------+---------+---------+
|cpu, KCSAN off|4.45     |0.81     |
+--------------+---------+---------+
*race condition still present

It's not hammering the lock as hard as the previous stressor, so the results are better for the for-6.17 branch.
The customer's scale is much bigger than the 4 cgroups in the benchmark.
There are workarounds implemented, so it's not that hot for them right now.
Anyway, I think it's worth trying to improve the scalability situation,
especially since, as far as I can see, there are no downsides.
 
There are also reports about similar problems in the memory rstats, but I haven't looked into them yet.

