netdev - Re: Advice on cgroup rstat lock

Message-ID: <CAJD7tkYnSRwJTpXxSnGgo-i3-OdD7cdT-e3_S_yf7dSknPoRKw@mail.gmail.com>
Date: Wed, 17 Apr 2024 19:04:50 -0700
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Jesper Dangaard Brouer <hawk@...nel.org>, Waiman Long <longman@...hat.com>, 
	Johannes Weiner <hannes@...xchg.org>, Tejun Heo <tj@...nel.org>, 
	Jesper Dangaard Brouer <jesper@...udflare.com>, "David S. Miller" <davem@...emloft.net>, 
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Shakeel Butt <shakeelb@...gle.com>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, Daniel Bristot de Oliveira <bristot@...hat.com>, 
	kernel-team <kernel-team@...udflare.com>, cgroups@...r.kernel.org, 
	Linux-MM <linux-mm@...ck.org>, Netdev <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>, 
	LKML <linux-kernel@...r.kernel.org>, Ivan Babrou <ivan@...udflare.com>
Subject: Re: Advice on cgroup rstat lock

[..]

> > > I personally don't like mem_cgroup_flush_stats_ratelimited() very
> > > much, because it is time-based (unlike memcg_vmstats_needs_flush()),
> > > and a lot of changes can happen in a very short amount of time.
> > > However, it seems like for some workloads it's a necessary evil :/
> > >
>
> Other than obj_cgroup_may_zswap(), there is no other place which really
> need very very accurate stats. IMO we should actually make ratelimited
> version the default one for all the places. Stats will always be out of
> sync for some time window even with non-ratelimited flush and I don't
> see any place where 2 second old stat would be any issue.

We disagreed about this before, and I am not trying to get you to
debate this with me again :)

I just prefer that we avoid this if possible. We have seen cases where
the 2 sec window caused issues, not because 2 sec is a long time, but
because userspace reads the stats right after an event occurs (e.g.
proactive reclaim) and still gets stats from before the event.
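
To make the ordering problem concrete, here is a minimal user-space toy
model, not the kernel code. The class StatsModel, its methods, and the
page counts are all hypothetical names and numbers picked for
illustration; only the 2 sec window comes from the discussion above.

```python
FLUSH_WINDOW = 2.0  # seconds; the 2 sec window discussed above


class StatsModel:
    """Toy model (hypothetical): updates sit in `pending` until a flush
    folds them into `flushed`, which is what readers see."""

    def __init__(self):
        self.flushed = 0
        self.pending = 0
        self.last_flush = 0.0

    def update(self, delta):
        # An event (e.g. reclaim) changes the counter, but only in the
        # not-yet-flushed part.
        self.pending += delta

    def flush(self, now):
        self.flushed += self.pending
        self.pending = 0
        self.last_flush = now

    def read_ratelimited(self, now):
        # Time-based: skip the flush if one happened within the window,
        # no matter how much has changed since then.
        if now - self.last_flush >= FLUSH_WINDOW:
            self.flush(now)
        return self.flushed

    def read_flushed(self, now):
        # Always flush before reading.
        self.flush(now)
        return self.flushed


def scenario(read_name):
    m = StatsModel()
    m.update(1000)       # 1000 pages charged
    m.flush(10.0)        # a flush happens at t=10.0
    m.update(-100)       # proactive reclaim frees 100 pages at t=10.5
    return getattr(m, read_name)(10.6)  # userspace reads at t=10.6


print(scenario("read_ratelimited"))  # 1000: value from before the reclaim
print(scenario("read_flushed"))      # 900: reflects the reclaim
```

In this model the ratelimited read at t=10.6 returns the pre-reclaim
value because the last flush was only 0.6 sec earlier, even though the
reader ran strictly after the event.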

[..]
>
> >
> >
> > With a mutex lock contention will be less obvious, as converting this to
> > a mutex avoids multiple CPUs spinning while waiting for the lock, but
> > it doesn't remove the lock contention.
> >
>
> I don't like global sleepable locks as those are source of priority
> inversion issues on highly utilized multi-tenant systems but I still
> need to see how you are handling that.

For context, this was discussed before as well in [1].

[1] https://lore.kernel.org/lkml/CALvZod441xBoXzhqLWTZ+xnqDOFkHmvrzspr9NAr+nybqXgS-A@mail.gmail.com/