Message-ID: <bf1db02cc0e7682e8f6eea4d0d61f6f249536163.camel@mediatek.com>
Date: Fri, 2 May 2025 09:16:01 +0000
From: Qun-wei Lin (林群崴) <Qun-wei.Lin@...iatek.com>
To: "hannes@...xchg.org" <hannes@...xchg.org>
CC: Andrew Yang (楊智強) <Andrew.Yang@...iatek.com>,
	"rppt@...nel.org" <rppt@...nel.org>, "nphamcs@...il.com" <nphamcs@...il.com>,
	"21cnbao@...il.com" <21cnbao@...il.com>,
	James Hsu (徐慶薰) <James.Hsu@...iatek.com>,
	AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mediatek@...ts.infradead.org" <linux-mediatek@...ts.infradead.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Chinwen Chang (張錦文)
	<chinwen.chang@...iatek.com>, Casper Li (李中榮)
	<casper.li@...iatek.com>, "minchan@...nel.org" <minchan@...nel.org>,
	"linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "matthias.bgg@...il.com"
	<matthias.bgg@...il.com>, "senozhatsky@...omium.org"
	<senozhatsky@...omium.org>
Subject: Re: [PATCH] mm: Add Kcompressd for accelerated memory compression

On Thu, 2025-05-01 at 10:02 -0400, Johannes Weiner wrote:


> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> 
> 
> On Wed, Apr 30, 2025 at 04:26:41PM +0800, Qun-Wei Lin wrote:
> 
> > This patch series introduces a new mechanism called kcompressd to
> > improve the efficiency of memory reclaiming in the operating system.
> > 
> > Problem:
> >   In the current system, the kswapd thread is responsible for both
> >   scanning the LRU pages and handling memory compression tasks (such
> >   as those involving ZSWAP/ZRAM, if enabled). This combined
> >   responsibility can lead to significant performance bottlenecks,
> >   especially under high memory pressure. The kswapd thread becomes a
> >   single point of contention, causing delays in memory reclaiming and
> >   overall system performance degradation.
> > 
> > Solution:
> >   Introduced kcompressd to handle asynchronous compression during
> >   memory reclaim, improving efficiency by offloading compression
> >   tasks from kswapd. This allows kswapd to focus on its primary task
> >   of page reclaim without being burdened by the additional overhead
> >   of compression.
> > 
> > In our handheld devices, we found that applying this mechanism under
> > high memory pressure scenarios can increase the rate of pgsteal_anon
> > per second by over 260% compared to the situation with only kswapd.
> > Additionally, we observed a reduction of over 50% in page allocation
> > stall occurrences, further demonstrating the effectiveness of
> > kcompressd in alleviating memory pressure and improving system
> > responsiveness.
> 
> 
> Yes, I think parallelizing this work makes a lot of sense.
> 
> 
> > Co-developed-by: Barry Song <21cnbao@...il.com>
> > Signed-off-by: Barry Song <21cnbao@...il.com>
> > Signed-off-by: Qun-Wei Lin <qun-wei.lin@...iatek.com>
> > Reference: Re: [PATCH 0/2] Improve Zram by separating compression
> >            context from kswapd - Barry Song
> >            https://lore.kernel.org/lkml/20250313093005.13998-1-21cnbao@gmail.com/
> > ---
> >  include/linux/mmzone.h |  6 ++++
> >  mm/mm_init.c           |  1 +
> >  mm/page_io.c           | 71 ++++++++++++++++++++++++++++++++++++++++++
> >  mm/swap.h              |  6 ++++
> >  mm/vmscan.c            | 25 +++++++++++++++
> >  5 files changed, 109 insertions(+)
> > 
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 6ccec1bf2896..93c9195a54ae 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -23,6 +23,7 @@
> >  #include <linux/page-flags.h>
> >  #include <linux/local_lock.h>
> >  #include <linux/zswap.h>
> > +#include <linux/kfifo.h>
> >  #include <asm/page.h>
> > 
> >  /* Free memory management - zoned buddy allocator.  */
> > @@ -1398,6 +1399,11 @@ typedef struct pglist_data {
> > 
> >       int kswapd_failures;            /* Number of 'reclaimed == 0' runs */
> > 
> > +#define KCOMPRESS_FIFO_SIZE 256
> > +     wait_queue_head_t kcompressd_wait;
> > +     struct task_struct *kcompressd;
> > +     struct kfifo kcompress_fifo;
> 
> 
> The way you implemented this adds time-and-space overhead even on
> systems that don't have any sort of swap compression enabled.
>


To address the overhead concern, perhaps we can embed only a single
kcompressd pointer within pglist_data and perform lazy initialization
only when a zram device is added or zswap is enabled.
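
A minimal sketch of what that lazy setup might look like (struct
kcompressd_node, its fields and kcompressd_fn() are placeholder names,
not part of this patch; the usual <linux/kfifo.h>, <linux/kthread.h>
and <linux/slab.h> includes are assumed):

/*
 * Hypothetical sketch: allocate the per-node context only when the
 * first compression backend (zram/zswap) shows up, so nodes without
 * swap compression pay only the cost of one NULL pointer.
 */
static int kcompressd_init_node(int nid)
{
        pg_data_t *pgdat = NODE_DATA(nid);
        struct kcompressd_node *kc;

        if (pgdat->kcompressd)          /* already initialized */
                return 0;

        kc = kzalloc_node(sizeof(*kc), GFP_KERNEL, nid);
        if (!kc)
                return -ENOMEM;

        init_waitqueue_head(&kc->wait);
        if (kfifo_alloc(&kc->fifo, KCOMPRESS_FIFO_SIZE * sizeof(void *),
                        GFP_KERNEL)) {
                kfree(kc);
                return -ENOMEM;
        }

        kc->task = kthread_run(kcompressd_fn, pgdat, "kcompressd%d", nid);
        if (IS_ERR(kc->task)) {
                kfifo_free(&kc->fifo);
                kfree(kc);
                return PTR_ERR(kc->task);
        }

        /* Publish the pointer only after everything is ready. */
        smp_store_release(&pgdat->kcompressd, kc);
        return 0;
}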


> That seems unnecessary. There is an existing method for asynchronous
> writeback, and pageout() is naturally fully set up to handle this.
> 
> IMO the better way to do this is to make zswap_store() (and
> zram_bio_write()?) asynchronous. Make those functions queue the work
> and wake the compression daemon, and then have the daemon call
> folio_end_writeback() / bio_endio() when it's done with it.



Perhaps we could add an enqueue/wake-up kcompressd interface and call
it from within zswap_store() and zram_bio_write(). This would leverage
the existing obj_cgroup_may_zswap() check in zswap_store() and would
also address the issue Nhat mentioned of zswap pages being
re-compressed too soon.
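
The daemon side of that could look roughly like the sketch below
(kc_compress_folio() stands in for the actual zswap/zram compression
work, and struct kcompressd_node is the same placeholder as above; this
only illustrates the folio_end_writeback() idea, it is not tested
code):

static int kcompressd_fn(void *data)
{
        pg_data_t *pgdat = data;
        struct kcompressd_node *kc = pgdat->kcompressd;
        struct folio *folio;

        while (!kthread_should_stop()) {
                wait_event_interruptible(kc->wait,
                                !kfifo_is_empty(&kc->fifo) ||
                                kthread_should_stop());

                while (kfifo_out(&kc->fifo, &folio, sizeof(folio)) ==
                       sizeof(folio)) {
                        kc_compress_folio(folio);      /* zswap/zram work */
                        folio_end_writeback(folio);    /* complete writeback */
                }
        }
        return 0;
}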

In outline:

1. Per-node pointer in pglist_data:  

   typedef struct pglist_data {
           ...
           struct kcompressd_node *kcompressd;
           ...
   }

2. Global register/unregister hooks:  

   kcompressd_register_backend(): Register a new backend (zram/zswap).
   Initialize the kcompressd structure and kfifo if this is the first 
   call.
   
   kcompressd_unregister_backend(): Unregister a backend (zram/zswap).
   Use a per-node refcount and bitmap to track how many zswap/zram
   instances are active. If the last backend is unregistered, free   
   the kcompressd resources.

> > A net loss is possible, but kswapd can sometimes enter sleep
> > contexts,
> > allowing the parallel kcompressd thread to continue compression.
> > This could actually be a win. But I agree that additional testing
> > on single-CPU machines may be necessary.
> 
> It could be disabled by the following if we discover any regression
> on single-CPU machines?
> 
> if (num_online_cpus() == 1)
>      return false;
>

   We can add this check in the register/unregister function.

3. Enqueue API:  

   kcompressd_enqueue_folio(folio) / kcompressd_enqueue_bio(bio): Push a
   job to the kcompressd’s FIFO and wake up the kcompressd daemon.

With this approach, there is zero runtime cost on nodes where no
backend is active, and only a single allocation per node once one is
registered. A rough sketch of these hooks follows below.
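
Roughly, and only as an illustration of points 2 and 3 above (a single
global refcount is used here for brevity instead of the per-node
refcount/bitmap described earlier; kcompressd_init_node(),
kcompressd_destroy_node() and the other names are placeholders):

static atomic_t kcompressd_users = ATOMIC_INIT(0);

int kcompressd_register_backend(void)
{
        int nid;

        /* Not worth an extra thread on single-CPU machines. */
        if (num_online_cpus() == 1)
                return -ENODEV;

        /* The first backend (zram device or zswap) brings up the daemons. */
        if (atomic_inc_return(&kcompressd_users) == 1)
                for_each_online_node(nid)
                        kcompressd_init_node(nid);
        return 0;
}

void kcompressd_unregister_backend(void)
{
        int nid;

        /* The last backend going away frees the per-node resources. */
        if (atomic_dec_and_test(&kcompressd_users))
                for_each_online_node(nid)
                        kcompressd_destroy_node(nid);
}

bool kcompressd_enqueue_folio(struct folio *folio)
{
        pg_data_t *pgdat = NODE_DATA(folio_nid(folio));
        struct kcompressd_node *kc = smp_load_acquire(&pgdat->kcompressd);

        if (!kc)
                return false;   /* no backend active, caller compresses inline */

        /* Locking for concurrent producers is omitted in this sketch. */
        if (kfifo_in(&kc->fifo, &folio, sizeof(folio)) != sizeof(folio))
                return false;   /* FIFO full, fall back to synchronous path */

        wake_up_interruptible(&kc->wait);
        return true;
}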


Thank you for your feedback!  
Please let me know what you think.

Best Regards,  
Qun-wei


