Message-ID: <aGKxvQdAZ-vSd48D@slm.duckdns.org>
Date: Mon, 30 Jun 2025 05:48:13 -1000
From: "tj@...nel.org" <tj@...nel.org>
To: "Wlodarczyk, Bertrand" <bertrand.wlodarczyk@...el.com>
Cc: Shakeel Butt <shakeel.butt@...ux.dev>,
"hannes@...xchg.org" <hannes@...xchg.org>,
"mkoutny@...e.com" <mkoutny@...e.com>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"inwardvessel@...il.com" <inwardvessel@...il.com>
Subject: Re: [PATCH v2] cgroup/rstat: change cgroup_base_stat to atomic
Hello,
On Mon, Jun 30, 2025 at 02:25:27PM +0000, Wlodarczyk, Bertrand wrote:
> > > Also the response to the tearing issue explained by JP is not satisfying.
> >
> > In other words, the claim is: "it's better to stall other CPUs in a
> > spinlock and disable IRQs every time in order to serve an outdated snapshot, instead of giving the user the freshest statistics much, much faster".
> > In terms of statistics, serving the freshest data quickly to the user is, in my opinion, the better behavior.
>
> > This is a false choice, I think. e.g. We can easily use a seqlock to remove strict synchronization only from the user side, right?
>
> Yes, that's a second possible way to solve the problem.
> I chose the atomics approach because, in my opinion, incremental statistics are a somewhat natural use case for them.
They're good for individual counters but I'm not sure they're a natural fit
for a group of stats. A series of atomic ops can be significantly more
expensive than locked updates and it also comes with problems like split
updates as discussed in this thread. I think most of the resistance is from
the use of atomics. Can you please try a different approach?
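
For illustration only, here is a minimal userspace sketch of the seqlock idea
mentioned above, written with C11 atomics. It is not the kernel's seqlock API
and not the rstat code. The names struct base_stat_seq, struct
base_stat_snapshot, stat_update() and stat_read() are hypothetical, the two
fields stand in for a larger group of stats, and writers are assumed to be
serialized already by some existing lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stat group; NOT the kernel's struct cgroup_base_stat. */
struct base_stat_seq {
	atomic_uint seq;		/* even: stable, odd: update in progress */
	_Atomic uint64_t utime;
	_Atomic uint64_t stime;
};

struct base_stat_snapshot {
	uint64_t utime;
	uint64_t stime;
};

/* Assumption: callers are already serialized, so there is one writer at a time. */
static void stat_update(struct base_stat_seq *s, uint64_t d_utime, uint64_t d_stime)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	assert((seq & 1) == 0);		/* no other update may be in flight */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	/* Relaxed load + store per field; no atomic read-modify-write is used. */
	atomic_store_explicit(&s->utime,
		atomic_load_explicit(&s->utime, memory_order_relaxed) + d_utime,
		memory_order_relaxed);
	atomic_store_explicit(&s->stime,
		atomic_load_explicit(&s->stime, memory_order_relaxed) + d_stime,
		memory_order_relaxed);

	/* Publish: the sequence becomes even again. */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Readers take no lock; they retry if an update overlapped the copy. */
static void stat_read(struct base_stat_seq *s, struct base_stat_snapshot *out)
{
	unsigned int seq;

	do {
		seq = atomic_load_explicit(&s->seq, memory_order_acquire);
		out->utime = atomic_load_explicit(&s->utime, memory_order_relaxed);
		out->stime = atomic_load_explicit(&s->stime, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((seq & 1) ||
		 seq != atomic_load_explicit(&s->seq, memory_order_relaxed));
}
```

In this sketch a reader never sees a split update: either both fields come
from the same completed update, or the sequence check fails and the copy is
retried. The update side stays as cheap as a locked update plus two counter
stores, and readers never block writers.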
> > I wouldn't be addressing this issue if there were no customers
> > affected by rstat latency in multi-container multi-cpu scenarios.
>
> > Out of curiosity, can you explain the case that you observed in more detail?
> > What were the customer doing?
>
> Single hierarchy, hundreds of containers on one server, multiple independent owners.
> Some of them want to have current stats available in their web GUI.
> They are hammering the stats for their cgroups.
> The server experiences inefficiencies; perf shows a visible percentage of CPU cycles spent in cgroup_rstat_flush.
>
> I prepared a benchmark which can serve as an example of the issue faced by the customer:
> https://gist.github.com/bwlodarcz/21bbc24813bced8e6ffc9e5ca3150fcc
>
> qemu vm:
> +--------------+---------+---------+
> | mean (s)     |8dcb0ed8 | patched |
> +--------------+---------+---------+
> |cpu, KCSAN on |16.13*   |3.75     |
> +--------------+---------+---------+
> |cpu, KCSAN off|4.45     |0.81     |
> +--------------+---------+---------+
> *race condition still present
>
> It's not hammering the lock as much as the previous stressor, so the results are better for the for-6.17 branch.
> The customer has a much bigger scale than the 4 cgroups in the benchmark.
> There are workarounds implemented so it's not that hot now (for them).
> Anyway, I think it's worth trying to improve the scalability situation,
> especially since, as far as I can see, there are no downsides.
>
> There are also reports about similar problems in memory rstats but I haven't looked at them yet.
Yeah, I saw the benchmark but I was more curious what actual use case would
lead to behaviors like that because you'd have to hammer on those stats
really hard for this to be a problem. In most use cases that I'm aware of,
the polling intervals for these stats are >= 1 sec. I guess the users in
your use case were banging on them way harder, at least previously.
I don't think switching to atomics is a good idea, but improving the read
scalability would definitely be nice.
Thanks.
--
tejun