Message-ID: <849e7b86-b971-47d7-8e31-7eee0918ea33@kernel.org>
Date: Tue, 2 Jul 2024 12:35:12 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Shakeel Butt <shakeel.butt@...ux.dev>, tj@...nel.org,
cgroups@...r.kernel.org, hannes@...xchg.org, lizefan.x@...edance.com,
longman@...hat.com, kernel-team@...udflare.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH V4 2/2] cgroup/rstat: Avoid thundering herd problem by
kswapd across NUMA nodes
On 29/06/2024 00.15, Yosry Ahmed wrote:
> [..]
>>>> + /* Obtained lock, record this cgrp as the ongoing flusher */
>>>> + if (!READ_ONCE(cgrp_rstat_ongoing_flusher)) {
>>>
>>> Can the above condition ever be false?
>>>
>>
>> Yes, I think so, because I realized that cgroup_rstat_flush_locked() can
>> release/"yield" the lock. Thus, other CPUs/threads get a chance to
>> call cgroup_rstat_flush() and try to become the "ongoing-flusher".
>
> Right, there may actually be multiple ongoing flushers. I am now
> wondering if it would be better if we drop cgrp_rstat_ongoing_flusher
> completely, add a per-cgroup under_flush boolean/flag, and have the
> cgroup iterate its parents here to check if any of them is under_flush
> and wait for it instead.
>
> Yes, we have to add parent iteration here, but I think it may be fine
> because the flush path is already expensive. This will allow us to
> detect if any ongoing flush is overlapping with us, not just the one
> that happened to update cgrp_rstat_ongoing_flusher first.
>
> WDYT?
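If I understand the idea, it amounts to something like the sketch
below (illustrative only, not tested; rstat_under_flush and the wait
helper are hypothetical names):

	/* Hypothetical sketch of the per-cgroup flag approach: mark a
	 * cgroup while it is being flushed, and let a new flusher wait
	 * if any ancestor is already under flush (flushing an ancestor
	 * covers this cgroup's subtree as well).
	 */
	struct cgroup *pos;

	for (pos = cgrp; pos; pos = cgroup_parent(pos)) {
		if (READ_ONCE(pos->rstat_under_flush)) {
			/* Overlapping flush: wait for it instead */
			wait_for_rstat_flush(pos); /* hypothetical helper */
			return;
		}
	}
	WRITE_ONCE(cgrp->rstat_under_flush, true);
	cgroup_rstat_flush_locked(cgrp);
	WRITE_ONCE(cgrp->rstat_under_flush, false);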
No, I don't think we should complicate the code to "support" multiple
ongoing flushers (there is no parallel execution of these). The lock
yielding causes the (I assume) unintended side-effect that multiple
ongoing flushers can exist. We should work towards only having a single
ongoing flusher.
With the current kswapd rstat contention issue, yielding the lock in
the loop creates the worst possible case of cache-line thrashing, as
these kthreads run on 12 different NUMA nodes.
I'm working towards changing the rstat lock to a mutex. When doing so,
we should not yield the lock in the loop. This will guarantee that
there is only a single ongoing flusher, and reduce cache-line
thrashing.
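To illustrate the direction (sketch only; cgroup_rstat_mutex and the
rstat_flush_done completion are placeholder names, not the actual
patch):

	/* Sketch: with a mutex and no lock yielding, at most one
	 * flusher runs at a time.  A flusher whose subtree is already
	 * covered by the ongoing flush just waits for it to finish.
	 */
	struct cgroup *ongoing = READ_ONCE(cgrp_rstat_ongoing_flusher);

	if (ongoing && cgroup_is_descendant(cgrp, ongoing)) {
		wait_for_completion(&ongoing->rstat_flush_done);
		return;
	}

	mutex_lock(&cgroup_rstat_mutex);
	reinit_completion(&cgrp->rstat_flush_done);
	WRITE_ONCE(cgrp_rstat_ongoing_flusher, cgrp);
	cgroup_rstat_flush_locked(cgrp); /* must not yield the lock */
	WRITE_ONCE(cgrp_rstat_ongoing_flusher, NULL);
	complete_all(&cgrp->rstat_flush_done);
	mutex_unlock(&cgroup_rstat_mutex);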
--Jesper