Message-ID: <z6twrrv53dzqfpytydpdgjg23dvbxyod6zi2fyeebzduyf4pzi@jqyatn34wi2a>
Date: Wed, 18 Jun 2025 10:17:09 -0700
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: "Wlodarczyk, Bertrand" <bertrand.wlodarczyk@...el.com>
Cc: JP Kobryn <inwardvessel@...il.com>, "tj@...nel.org" <tj@...nel.org>,
"hannes@...xchg.org" <hannes@...xchg.org>, "mkoutny@...e.com" <mkoutny@...e.com>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] cgroup/rstat: change cgroup_base_stat to atomic
On Wed, Jun 18, 2025 at 04:05:45PM +0000, Wlodarczyk, Bertrand wrote:
> Thank you for your time and review.
>
> > The kernel currently faces scalability issues when multiple userspace
> > programs attempt to read cgroup statistics concurrently.
> >
> > The primary bottleneck is the css_cgroup_lock in cgroup_rstat_flush,
> > which prevents access and updates to the statistics of the css from
> > multiple CPUs in parallel.
> >
> > Given that rstat operates on a per-CPU basis and only aggregates
> > statistics in the parent cgroup, there is no compelling reason why
> > these statistics cannot be atomic.
>
> > Have you considered the "tearing" that will occur when writes and reads are happening in parallel?
> > The existing state is more of a snapshot approach. Changing the fields involved to atomic and lockless reading/writing can result in inconsistent values, i.e. fieldA might be more current than fieldB.
>
> Yes, I considered it. In our scenario, "tearing" means that we have newer data than the last "snapshot" and the output doesn't add up in
> sum_exec_runtime. The "snapshot" approach suggests that these stats don't need to be super precise, because we're already accepting outdated state.
> I don't consider the "tearing" issue to be very bad here.
> Additionally, I'm sure that "tearing" can happen, but I didn't observe it during the benchmark runs.
> That suggests it's either a rare occurrence or I didn't trigger the right conditions to expose the issue.
>
> When multiple CPUs try to access rstat, the slowdown caused by css_base_lock is so massive that the atomic change is preferable.
> It's better to get even a "torn" result than to spin CPUs and disable IRQs for such a long time.
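>
> For illustration, a rough sketch of what the accepted inconsistency
> looks like (hypothetical names, not the actual patch code):
>
> 	/* flusher propagating one CPU's deltas into the parent */
> 	atomic64_add(delta_utime, &parent_bstat->utime);
> 	/* a reader sampling here already sees the new utime ... */
> 	atomic64_add(delta_stime, &parent_bstat->stime);
> 	/* ... but still the old stime, so utime + stime can briefly
> 	 * disagree with sum_exec_runtime */
>
> With the spinlock held across the whole flush, readers see the fields
> move together; with per-field atomics they can briefly be out of step,
> and that's the trade-off I'm accepting.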
>
>
> > By eliminating the lock, each CPU can traverse its rstat hierarchy
> > independently, without blocking. Synchronization is achieved during
> > parent propagation through atomic operations.
>
> > Even if the tearing scenario mentioned above is acceptable, removing the lock will break synchronization of flushing non-base stat subsystems.
>
> > -static spinlock_t *ss_rstat_lock(struct cgroup_subsys *ss)
> > -{
> > -	if (ss)
> > -		return &ss->rstat_ss_lock;
>
> > This was needed for non-base stat subsystems like memory and io.
>
> From what I could find in the code, a flush taking css and cpu arguments is implemented in only two subsystems: memory and io.
> Both memory and io have their own locks for this.
> I tested the benchmark (provided in the gist) with the memory.stat and io.stat hierarchies.
> In both cases KCSAN doesn't report any issues, and performance is unchanged compared to commit
> e04c78d86a9699d1 (Linux 6.16-rc2).
> For cpu stats, performance is much better after the patch.
>
> I can't find a scenario where the lack of these locks triggers any kind of issue.
> Maybe you can help me with that?
Please look at *_local and *_pending fields in
mem_cgroup_css_rstat_flush().
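
Those are plain (non-atomic) counters that get read-modify-written and
propagated to the parent during the flush; today the rstat lock is what
serializes concurrent flushers of the same css. Roughly this pattern
(simplified, not the exact memcontrol.c code):

	/* currently serialized by the rstat lock */
	delta = memcg->vmstats->state_pending[i];
	if (delta)
		memcg->vmstats->state_pending[i] = 0;
	/* ... fold in this CPU's delta ... */
	memcg->vmstats->state[i] += delta;
	if (delta && parent)
		parent->vmstats->state_pending[i] += delta;

Two CPUs flushing the same css concurrently can interleave these plain
read-modify-writes and lose or double-count deltas.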