Message-ID: <CAAPL-u9Sp_uKLsvjbaKuKnVoMUFPfE=kKf2k6PNOgA8QmdgnHw@mail.gmail.com>
Date: Mon, 14 Oct 2024 16:41:03 -0700
From: Wei Xu <weixugc@...gle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Yu Zhao <yuzhao@...gle.com>, Axel Rasmussen <axelrasmussen@...gle.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] mm/mglru: only clear kswapd_failures if reclaimable
On Mon, Oct 14, 2024 at 4:25 PM Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> On Mon, 14 Oct 2024 22:12:11 +0000 Wei Xu <weixugc@...gle.com> wrote:
>
> > lru_gen_shrink_node() unconditionally clears kswapd_failures, which
> > can prevent kswapd from sleeping and cause 100% kswapd cpu usage even
> > when kswapd repeatedly fails to make progress in reclaim.
> >
> > Only clear kswapd_failures in lru_gen_shrink_node() if reclaim makes
> > some progress, similar to shrink_node().
>
> That sounds bad. What triggers this? Can you suggest why it has just
> been discovered, after 1.5 years? And should the fix be backported into
> -stable kernels?
>
I happened to run into this problem in one of my tests recently. It
requires a combination of several conditions: the allocator needs to
allocate just the right amount of pages so that it wakes up kswapd
without itself being OOM-killed; there is no memory left for kswapd to
reclaim (my test disables swap and drops the page cache first); and no
other process frees enough memory at the same time.
I think the fix is a good candidate for stable kernels.
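
For readers following the thread, here is a rough sketch (not the literal
patch hunk; the function name and surrounding context are illustrative) of
the change described above: rather than unconditionally resetting
pgdat->kswapd_failures at the end of lru_gen_shrink_node(), the counter is
cleared only when the pass actually reclaimed something, mirroring the
progress check that shrink_node() already performs.

/*
 * Illustrative sketch only -- not the applied patch.  Snapshot
 * sc->nr_reclaimed before the MGLRU reclaim pass and clear
 * pgdat->kswapd_failures only if that count grew, so kswapd can
 * still back off (and eventually sleep) when no progress is made.
 */
static void lru_gen_shrink_node_sketch(struct pglist_data *pgdat,
				       struct scan_control *sc)
{
	unsigned long reclaimed = sc->nr_reclaimed;

	/* ... MGLRU reclaim pass runs here ... */

	/* Before: pgdat->kswapd_failures = 0; was done unconditionally. */
	/* After: reset only if this pass made progress, as in shrink_node(). */
	if (sc->nr_reclaimed > reclaimed)
		pgdat->kswapd_failures = 0;
}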