Message-ID: <CAJD7tkYV3iwk-ZJcr_==V4e24yH-1NaCYFUL7wDaQEi8ZXqfqQ@mail.gmail.com>
Date: Tue, 16 Jul 2024 17:35:05 -0700
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>
Cc: tj@...nel.org, cgroups@...r.kernel.org, shakeel.butt@...ux.dev, 
	hannes@...xchg.org, lizefan.x@...edance.com, longman@...hat.com, 
	kernel-team@...udflare.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V7 1/2] cgroup/rstat: Avoid thundering herd problem by
 kswapd across NUMA nodes

[..]
>
>
> This is a clean (meaning no cadvisor interference) example of kswapd
> starting simultaneously on many NUMA nodes, which in 27 out of 98 cases
> hit the race (handled in V6 and V7).
>
> The BPF "cnt" maps are cleared every second, so this approximates
> per-second numbers. This patch reduces pressure on the lock, but we
> are still seeing (kfunc:vmlinux:cgroup_rstat_flush_locked) approximately
> 37 full flushes per second (one every 27 ms). On the positive side, the
> ongoing_flusher mitigation stopped 98 of these per second.
>
> In this clean kswapd case the patch removes the lock contention issue
> for kswapd. The 27 lock_contended cases all seem to correspond to the
> 27 handled_race cases.
>
> The remaining high flush rate should also be addressed, and we should
> also work on approaches to limit it, like my earlier proposal[1].
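
To make sure we are talking about the same thing: IIUC, the
ongoing_flusher mitigation above behaves roughly like the userspace
model below (a pthread-based sketch with illustrative names, not the
actual patch code). Concurrent flushers piggyback on the in-flight
flush instead of serializing on the rstat lock:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t flush_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  flush_done = PTHREAD_COND_INITIALIZER;
static bool flushing;	/* a full flush is in flight */

static void do_flush(void)
{
	/* ... walk the per-CPU stats and propagate them up ... */
}

void flush_stats(void)
{
	pthread_mutex_lock(&flush_lock);
	if (flushing) {
		/* Piggyback: wait for the ongoing flusher to finish
		 * rather than doing a redundant full flush. */
		while (flushing)
			pthread_cond_wait(&flush_done, &flush_lock);
		pthread_mutex_unlock(&flush_lock);
		return;
	}
	flushing = true;
	pthread_mutex_unlock(&flush_lock);

	do_flush();

	pthread_mutex_lock(&flush_lock);
	flushing = false;
	pthread_cond_broadcast(&flush_done);
	pthread_mutex_unlock(&flush_lock);
}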

I honestly don't think a high number of flushes is a problem on its
own, as long as we are not spending too much time flushing. This is
especially true given the magnitude-based thresholding: we know there
is something to flush (although it may not be relevant to what we are
doing).
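
For concreteness, by magnitude-based thresholding I mean something like
the sketch below (userspace model; the threshold constant and names are
made up for illustration, not the actual memcg code). Updates accumulate
an error magnitude, and the flush is skipped while the drift stays small:

#include <stdatomic.h>

#define FLUSH_THRESHOLD 1024	/* illustrative batch size */

static atomic_long pending_error;	/* magnitude of unflushed updates */

void stat_updated(long delta)
{
	/* Track how far the cached totals may have drifted. */
	atomic_fetch_add(&pending_error, delta < 0 ? -delta : delta);
}

void flush_stats_if_needed(void)
{
	/* Skip the expensive flush when the drift is negligible. */
	if (atomic_load(&pending_error) <= FLUSH_THRESHOLD)
		return;
	atomic_store(&pending_error, 0);
	/* ... take the rstat lock and do the actual flush ... */
}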

If we keep observing a lot of lock contention, one thing I have
thought about is a variant of spin_lock with a timeout. That would
bound the flushing latency, instead of bounding the number of flushes
(which I believe is the wrong metric to optimize).
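
Roughly like the following (a userspace model with C11 atomics; the
names and timeout policy are illustrative, and the kernel would of
course need a proper spin_lock API variant):

#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static atomic_flag rstat_lock = ATOMIC_FLAG_INIT;

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Try to take the lock, giving up after timeout_ns. Returns true if
 * the lock was acquired. A caller that times out would skip or defer
 * its flush instead of piling up on the lock, capping flush latency. */
bool spin_lock_timeout(long long timeout_ns)
{
	long long deadline = now_ns() + timeout_ns;

	while (atomic_flag_test_and_set_explicit(&rstat_lock,
						 memory_order_acquire)) {
		if (now_ns() > deadline)
			return false;	/* timed out; skip this flush */
	}
	return true;
}

void spin_unlock_timeout(void)
{
	atomic_flag_clear_explicit(&rstat_lock, memory_order_release);
}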

It also seems to me that we are doing a flush every 27 ms (~37 per
second), while your proposed threshold was once per 50 ms (~20 per
second). That doesn't seem like a fundamental difference.

I am also wondering how many more flushes could be skipped if we
handled the case of multiple ongoing flushers (whether by using a
mutex, or by making it a per-cgroup property as I suggested earlier;
see the rough sketch below).
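
The per-cgroup variant could look roughly like this (userspace model
with illustrative names; note it deliberately shows the same
check-then-claim race that V6/V7 had to handle):

#include <stdatomic.h>
#include <stdbool.h>

struct cgroup {
	struct cgroup *parent;
	atomic_bool flush_ongoing;	/* a flush of this subtree runs */
};

/* Returns true if the caller should do the flush itself, false if an
 * ongoing flush of cgrp (or an ancestor) already covers its stats. */
bool claim_flush(struct cgroup *cgrp)
{
	struct cgroup *c;
	bool expected = false;

	/* If any ancestor is being flushed, our subtree is covered. */
	for (c = cgrp->parent; c; c = c->parent)
		if (atomic_load(&c->flush_ongoing))
			return false;

	/* Not atomic with the check above: a racing flusher can slip
	 * in between, which is the kind of race the thread discusses. */
	return atomic_compare_exchange_strong(&cgrp->flush_ongoing,
					      &expected, true);
}

void finish_flush(struct cgroup *cgrp)
{
	atomic_store(&cgrp->flush_ongoing, false);
}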
