Date: Tue, 2 Jul 2024 12:35:12 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Shakeel Butt <shakeel.butt@...ux.dev>, tj@...nel.org,
 cgroups@...r.kernel.org, hannes@...xchg.org, lizefan.x@...edance.com,
 longman@...hat.com, kernel-team@...udflare.com, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH V4 2/2] cgroup/rstat: Avoid thundering herd problem by
 kswapd across NUMA nodes



On 29/06/2024 00.15, Yosry Ahmed wrote:
> [..]
>>>> +    /* Obtained lock, record this cgrp as the ongoing flusher */
>>>> +    if (!READ_ONCE(cgrp_rstat_ongoing_flusher)) {
>>>
>>> Can the above condition ever be false?
>>>
>>
>> Yes, I think so, because I realized that cgroup_rstat_flush_locked() can
>> release/"yield" the lock.  Thus, other CPUs/threads have a chance to
>> call cgroup_rstat_flush, and try to become the "ongoing-flusher".
> 
> Right, there may actually be multiple ongoing flushers. I am now
> wondering if it would be better if we drop cgrp_rstat_ongoing_flusher
> completely, add a per-cgroup under_flush boolean/flag, and have the
> cgroup iterate its parents here to check if any of them is under_flush
> and wait for it instead.
> 
> Yes, we have to add parent iteration here, but I think it may be fine
> because the flush path is already expensive. This will allow us to
> detect if any ongoing flush is overlapping with us, not just the one
> that happened to update cgrp_rstat_ongoing_flusher first.
> 
> WDYT?

No, I don't think we should complicate the code to "support" multiple
ongoing flushers (there is no parallel execution of these).  The lock
yielding causes the (I assume) unintended side effect that multiple
ongoing flushers can exist.  We should work towards only having a
single ongoing flusher.

With the current kswapd rstat contention issue, yielding the lock in
the loop creates the worst possible case of cache-line thrashing, as
these kthreads run on 12 different NUMA nodes.

I'm working towards changing the rstat lock to a mutex.  When doing
so, we should not yield the lock in the loop.  This will guarantee
only having a single ongoing flusher, and reduce cache-line thrashing.

--Jesper
