netdev - Re: [PATCH v1 3/3] cgroup/rstat: introduce ratelimited rstat flushing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJD7tkbZAj3UQSHbu3kj1NG4QDowXWrohG4XM=7cX_a=QL-Shg@mail.gmail.com>
Date: Wed, 17 Apr 2024 19:21:33 -0700
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>
Cc: tj@...nel.org, hannes@...xchg.org, lizefan.x@...edance.com, 
	cgroups@...r.kernel.org, longman@...hat.com, netdev@...r.kernel.org, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org, shakeel.butt@...ux.dev, 
	kernel-team@...udflare.com, Arnaldo Carvalho de Melo <acme@...nel.org>, 
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>, mhocko@...nel.org
Subject: Re: [PATCH v1 3/3] cgroup/rstat: introduce ratelimited rstat flushing

On Tue, Apr 16, 2024 at 10:51 AM Jesper Dangaard Brouer <hawk@...nel.org> wrote:
>
> This patch aims to reduce userspace-triggered pressure on the global
> cgroup_rstat_lock by introducing a mechanism to limit how often reading
> stat files causes cgroup rstat flushing.
>
> In the memory cgroup subsystem, memcg_vmstats_needs_flush() combined with
> mem_cgroup_flush_stats_ratelimited() already limits pressure on the
> global lock (cgroup_rstat_lock). As a result, reading memory-related stat
> files (such as memory.stat, memory.numa_stat, zswap.current) is already
> a less userspace-triggerable issue.
>
> However, other userspace users of cgroup_rstat_flush(), such as when
> reading io.stat (blk-cgroup.c) and cpu.stat, lack a similar system to
> limit pressure on the global lock. Furthermore, userspace can easily
> trigger this issue by reading those stat files.
>
> Typically, normal userspace stats tools (e.g., cadvisor, nomad, systemd)
> spawn threads that read io.stat, cpu.stat, and memory.stat (even from the
> same cgroup) without realizing that on the kernel side, they share the
> same global lock. This limitation also helps prevent malicious userspace
> applications from harming the kernel by reading these stat files in a
> tight loop.
>
> To address this, the patch introduces cgroup_rstat_flush_ratelimited(),
> similar to memcg's mem_cgroup_flush_stats_ratelimited().
>
> Flushing occurs per cgroup (even though the lock remains global) a
> variable named rstat_flush_last_time is introduced to track when a given
> cgroup was last flushed. This variable, which contains the jiffies of the
> flush, shares properties and a cache line with rstat_flush_next and is
> updated simultaneously.
>
> For cpu.stat, we need to acquire the lock (via cgroup_rstat_flush_hold)
> because other data is read under the lock, but we skip the expensive
> flushing if it occurred recently.
>
> Regarding io.stat, there is an opportunity outside the lock to skip the
> flush, but inside the lock, we must recheck to handle races.
>
> Signed-off-by: Jesper Dangaard Brouer <hawk@...nel.org>

As I mentioned in another thread, I really don't like time-based
rate-limiting [1]. Would it be possible to generalize the
magnitude-based rate-limiting instead? Have something like
memcg_vmstats_needs_flush() in the core rstat code?

Also, why do we keep the memcg time rate-limiting with this patch? Is
it because we use a much larger window there (2s)? Having two layers
of time-based rate-limiting is not ideal imo.

[1]https://lore.kernel.org/lkml/CAJD7tkYnSRwJTpXxSnGgo-i3-OdD7cdT-e3_S_yf7dSknPoRKw@mail.gmail.com/