Message-ID: <CAGsJ_4wLuJNe9uPE3-fBNLdiCPBpKt4a1ytuf7-+oiS5rBrg_w@mail.gmail.com>
Date: Thu, 1 May 2025 10:49:50 +1200
From: Barry Song <21cnbao@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Qun-Wei Lin <qun-wei.lin@...iatek.com>, Mike Rapoport <rppt@...nel.org>,
Matthias Brugger <matthias.bgg@...il.com>,
AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>, Nhat Pham <nphamcs@...il.com>,
Sergey Senozhatsky <senozhatsky@...omium.org>, Minchan Kim <minchan@...nel.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-mediatek@...ts.infradead.org, Casper Li <casper.li@...iatek.com>,
Chinwen Chang <chinwen.chang@...iatek.com>, Andrew Yang <andrew.yang@...iatek.com>,
James Hsu <james.hsu@...iatek.com>
Subject: Re: [PATCH] mm: Add Kcompressd for accelerated memory compression
On Thu, May 1, 2025 at 9:51 AM Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> On Wed, 30 Apr 2025 16:26:41 +0800 Qun-Wei Lin <qun-wei.lin@...iatek.com> wrote:
>
> > This patch series introduces a new mechanism called kcompressd to
> > improve the efficiency of memory reclaiming in the operating system.
> >
> > Problem:
> > In the current system, the kswapd thread is responsible for both scanning
> > the LRU pages and handling memory compression tasks (such as those
> > involving ZSWAP/ZRAM, if enabled). This combined responsibility can lead
> > to significant performance bottlenecks, especially under high memory
> > pressure. The kswapd thread becomes a single point of contention, causing
> > delays in memory reclaiming and overall system performance degradation.
> >
> > Solution:
> > Introduced kcompressd to handle asynchronous compression during memory
> > reclaim, improving efficiency by offloading compression tasks from
> > kswapd. This allows kswapd to focus on its primary task of page reclaim
> > without being burdened by the additional overhead of compression.
> >
> > In our handheld devices, we found that applying this mechanism under high
> > memory pressure scenarios can increase the rate of pgsteal_anon per second
> > by over 260% compared to the situation with only kswapd. Additionally, we
> > observed a reduction of over 50% in page allocation stall occurrences,
> > further demonstrating the effectiveness of kcompressd in alleviating memory
> > pressure and improving system responsiveness.
>
> It's a significant change and I'm thinking that broader performance
> testing across a broader range of machines is needed before we can
> confidently upstream such a change.
We ran the same test on our phones and saw the same results as Qun-Wei.
The async compression significantly reduces allocation stalls and improves
reclamation speed. However, I agree that broader testing is needed, and
we’ll also need the zswap team’s help with testing zswap cases.
>
> Also, it's presumably a small net loss on single-CPU machines (do these
> exist any more?). Is it hard to disable this feature on such machines?
A net loss is possible, but kswapd can sometimes sleep, which lets the
parallel kcompressd thread carry on compressing in the meantime. That
could actually be a win. Still, I agree that additional testing on
single-CPU machines may be necessary.

If we do find a regression on single-CPU machines, it could be disabled
with something like the following:

	if (num_online_cpus() == 1)
		return false;
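
To make the single-CPU check above concrete, here is a minimal userspace
sketch of the same gate. It is only a model, not the patch:
async_compress_allowed() and model_num_online_cpus() are hypothetical
names, and sysconf(_SC_NPROCESSORS_ONLN) stands in for the kernel's
num_online_cpus().

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical userspace stand-in for the kernel's num_online_cpus().
 * Falls back to 1 if the CPU count cannot be queried.
 */
static unsigned int model_num_online_cpus(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	return n > 0 ? (unsigned int)n : 1;
}

/*
 * Hypothetical helper: report whether handing compression to a second
 * thread is allowed. With a single online CPU the extra thread cannot
 * run in parallel, so this model keeps compression synchronous.
 */
static bool async_compress_allowed(unsigned int online_cpus)
{
	assert(online_cpus >= 1);	/* at least one CPU must be online */

	if (online_cpus == 1)
		return false;

	return true;
}
```

A caller would use it as
async_compress_allowed(model_num_online_cpus()). Taking the CPU count as
a parameter keeps the decision easy to exercise for both the one-CPU and
multi-CPU cases.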
>
> >
> > +static bool swap_sched_async_compress(struct folio *folio)
> > +{
> > +	struct swap_info_struct *sis = swp_swap_info(folio->swap);
> > +	int nid = numa_node_id();
> > +	pg_data_t *pgdat = NODE_DATA(nid);
> > +
> > +	if (unlikely(!pgdat->kcompressd))
> > +		return false;
> > +
> > +	if (!current_is_kswapd())
> > +		return false;
> > +
> > +	if (!folio_test_anon(folio))
> > +		return false;
>
> Are you sure the above three tests are really needed?
Currently, kcompressd runs as a per-node thread, mainly to accelerate
asynchronous reclamation, which in turn reduces direct reclamation. Since
direct reclamation is already on the slow path, asynchronous compression
offers limited additional benefit in that context. Moreover, it's difficult
to determine the optimal number of threads for direct reclamation, whereas
compressing synchronously in direct reclamation, as is done today, already
lets it use all CPUs.

The first condition checks whether kcompressd is present. The second
ensures that we're in kswapd asynchronous reclamation, not direct
reclamation. The third condition might be optimized or dropped, at least
for swap-backed shmem and similar cases.
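
The three conditions can be modelled in userspace as a small decision
function. This is a sketch under stated assumptions, not the kernel code:
struct reclaim_ctx, enum offload_decision and classify_swapout() are
hypothetical names, and the three booleans stand in for
pgdat->kcompressd, current_is_kswapd() and folio_test_anon().

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state the three checks look at. */
struct reclaim_ctx {
	bool node_has_kcompressd;	/* models pgdat->kcompressd != NULL */
	bool caller_is_kswapd;		/* models current_is_kswapd() */
	bool folio_is_anon;		/* models folio_test_anon(folio) */
};

/* Hypothetical result: offload, or the reason compression stays synchronous. */
enum offload_decision {
	OFFLOAD_ASYNC,
	SYNC_NO_WORKER,
	SYNC_DIRECT_RECLAIM,
	SYNC_NOT_ANON,
};

/* Apply the checks in the same order as the quoted patch. */
static enum offload_decision classify_swapout(const struct reclaim_ctx *ctx)
{
	if (!ctx->node_has_kcompressd)
		return SYNC_NO_WORKER;		/* no kcompressd on this node */

	if (!ctx->caller_is_kswapd)
		return SYNC_DIRECT_RECLAIM;	/* direct reclaim compresses inline */

	if (!ctx->folio_is_anon)
		return SYNC_NOT_ANON;		/* e.g. swap-backed shmem */

	return OFFLOAD_ASYNC;
}
```

Dropping the third check, as suggested for swap-backed shmem, would amount
to removing the SYNC_NOT_ANON branch in this model.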
Thanks
Barry