Message-ID: <5f055416-9c49-42c0-9ba0-e45f6aaeac04@gmail.com>
Date: Tue, 17 Jun 2025 14:05:37 -0700
From: JP Kobryn <inwardvessel@...il.com>
To: Bertrand Wlodarczyk <bertrand.wlodarczyk@...el.com>, tj@...nel.org,
hannes@...xchg.org, mkoutny@...e.com, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cgroup/rstat: change cgroup_base_stat to atomic
On 6/17/25 3:26 AM, Bertrand Wlodarczyk wrote:
> The kernel currently faces scalability issues when multiple userspace
> programs attempt to read cgroup statistics concurrently.
>
> The primary bottleneck is the css_cgroup_lock in cgroup_rstat_flush,
> which serializes reads and updates of the css's statistics
> across CPUs.
>
> Given that rstat operates on a per-CPU basis and only aggregates
> statistics in the parent cgroup, there is no compelling reason
> why these statistics cannot be atomic.
Have you considered the "tearing" that can occur when writes and reads
happen in parallel?
The existing code takes a snapshot approach: the lock ensures all fields
are read as a consistent set. Changing the fields to atomics with
lockless reads/writes can produce inconsistent values, i.e. fieldA might
be more current than fieldB.
> By eliminating the lock, each CPU can traverse its rstat hierarchy
> independently, without blocking. Synchronization is achieved during
> parent propagation through atomic operations.
Even if the tearing scenario mentioned above is acceptable, removing
the lock will break synchronization of flushing non-base stat
subsystems.
>
> This change significantly enhances performance in scenarios
> where multiple CPUs access CPU rstat within a single cgroup hierarchy,
> yielding a performance improvement of around 50 times compared
> to the mainline version.
> Notably, performance for memory and I/O rstats remains unchanged,
> as these are managed in separate submodules.
>
> Additionally, this patch addresses a race condition detectable
> in the current mainline by KCSAN in __cgroup_account_cputime,
> which occurs when attempting to read a single hierarchy
> from multiple CPUs.
>
> Signed-off-by: Bertrand Wlodarczyk <bertrand.wlodarczyk@...el.com>
> ---
> Benchmark code: https://gist.github.com/bwlodarcz/c955b36b5667f0167dffcff23953d1da
>
> Tested on Intel(R) Xeon(R) Platinum 8468V, 2s 48c 2tpc, 377GiB RAM, Fedora 41:
> +--------+-------+
> |Mainline|Patched|
> +--------+-------+
> |369.95s |6.52s |
> +--------+-------+
>
[..]
> @@ -820,7 +813,6 @@ struct cgroup_subsys {
> */
> unsigned int depends_on;
>
> - spinlock_t rstat_ss_lock;
This lock is not used with base stats. The base stats are not a formal
subsystem.
[..]
> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
> index cbeaa499a96a..36af2b883440 100644
> --- a/kernel/cgroup/rstat.c
> +++ b/kernel/cgroup/rstat.c
> @@ -9,7 +9,6 @@
>
> #include <trace/events/cgroup.h>
>
> -static DEFINE_SPINLOCK(rstat_base_lock);
> static DEFINE_PER_CPU(raw_spinlock_t, rstat_base_cpu_lock);
>
> static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu);
> @@ -37,14 +36,6 @@ static struct cgroup_rstat_base_cpu *cgroup_rstat_base_cpu(
> return per_cpu_ptr(cgrp->rstat_base_cpu, cpu);
> }
>
> -static spinlock_t *ss_rstat_lock(struct cgroup_subsys *ss)
> -{
> - if (ss)
> - return &ss->rstat_ss_lock;
This was needed for non-base stat subsystems like memory and io.
> -
> - return &rstat_base_lock;
> -}