Date: Tue, 2 Jul 2024 12:35:12 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Shakeel Butt <shakeel.butt@...ux.dev>, tj@...nel.org,
 cgroups@...r.kernel.org, hannes@...xchg.org, lizefan.x@...edance.com,
 longman@...hat.com, kernel-team@...udflare.com, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH V4 2/2] cgroup/rstat: Avoid thundering herd problem by
 kswapd across NUMA nodes



On 29/06/2024 00.15, Yosry Ahmed wrote:
> [..]
>>>> +    /* Obtained lock, record this cgrp as the ongoing flusher */
>>>> +    if (!READ_ONCE(cgrp_rstat_ongoing_flusher)) {
>>>
>>> Can the above condition ever be false?
>>>
>>
>> Yes, I think so, because I realized that cgroup_rstat_flush_locked() can
>> release/"yield" the lock.  Thus, other CPUs/threads have a chance to
>> call cgroup_rstat_flush, and try to become the "ongoing-flusher".
> 
> Right, there may actually be multiple ongoing flushers. I am now
> wondering if it would be better if we drop cgrp_rstat_ongoing_flusher
> completely, add a per-cgroup under_flush boolean/flag, and have the
> cgroup iterate its parents here to check if any of them is under_flush
> and wait for it instead.
> 
> Yes, we have to add parent iteration here, but I think it may be fine
> because the flush path is already expensive. This will allow us to
> detect if any ongoing flush is overlapping with us, not just the one
> that happened to update cgrp_rstat_ongoing_flusher first.
> 
> WDYT?

No, I don't think we should complicate the code to "support" multiple
ongoing flushers (there is no parallel execution of these).  The lock
yielding causes the (I assume) unintended side effect that multiple
ongoing flushers can exist.  We should work towards only having a
single ongoing flusher.

With the current kswapd rstat contention issue, yielding the lock in
the loop creates the worst possible case of cache-line thrashing, as
these kthreads run on 12 different NUMA nodes.

I'm working towards changing the rstat lock to a mutex.  When doing
so, we should not yield the lock in the loop.  This will guarantee
only having a single ongoing flusher, and reduce cache-line thrashing.

--Jesper
