Message-Id: <20250623051642.3645-1-21cnbao@gmail.com>
Date: Mon, 23 Jun 2025 17:16:42 +1200
From: Barry Song <21cnbao@...il.com>
To: nphamcs@...il.com
Cc: 21cnbao@...il.com,
akpm@...ux-foundation.org,
andrew.yang@...iatek.com,
angelogioacchino.delregno@...labora.com,
casper.li@...iatek.com,
chinwen.chang@...iatek.com,
hannes@...xchg.org,
james.hsu@...iatek.com,
linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org,
linux-mediatek@...ts.infradead.org,
linux-mm@...ck.org,
matthias.bgg@...il.com,
minchan@...nel.org,
qun-wei.lin@...iatek.com,
rppt@...nel.org,
senozhatsky@...omium.org,
sj@...nel.org
Subject: Re: [PATCH] mm: Add Kcompressd for accelerated memory compression
Hi Nhat,
On Wed, Jun 18, 2025 at 2:21 AM Nhat Pham <nphamcs@...il.com> wrote:
>
> On Sun, Jun 15, 2025 at 8:41 PM Barry Song <21cnbao@...il.com> wrote:
> > >>
> > >> That seems unnecessary. There is an existing method for asynchronous
> > >> writeback, and pageout() is naturally fully set up to handle this.
> > >>
> > >> IMO the better way to do this is to make zswap_store() (and
> > >> zram_bio_write()?) asynchronous. Make those functions queue the work
> > >> and wake the compression daemon, and then have the daemon call
> > >> folio_end_writeback() / bio_endio() when it's done with it.
> >
> > > +1.
> >
> >
> > But,
> > How could this be possible for zswap? zswap_store() is only a frontend —
> > we still need its return value to determine whether __swap_writepage()
> > is required. Waiting for the result of zswap_store() is inherently a
> > synchronous step.
>
> Hmm, I might be misunderstanding either of you, but it sounds like
> what you're describing here does not contradict what Johannes is
> proposing?
It seems contradictory: Johannes proposes that zswap could behave like zRAM
by invoking `folio_end_writeback()` or `bio_endio()`, but that doesn't match
the actual behavior: `zswap_store()` does not necessarily end `swap_writeout()`,
since on failure we may still need to fall through to `__swap_writepage()` to
complete the final steps.
Meanwhile, Qun-wei's RFC has already explored calling `folio_end_writeback()`
and `bio_endio()` at the end of `__swap_writepage()` for zRAM, though that
approach also has its own issues.
>
> >
> > My point is that folio_end_writeback() and bio_endio() can only be
> > called after the entire zswap_store() → __swap_writepage() sequence is
> > completed. That’s why both are placed in the new kcompressed.
>
> Hmm, how about:
>
> 1. Inside zswap_store(), we first obtain the obj_cgroup reference,
> check cgroup and pool limit, and grab a zswap pool reference (in
> effect, determining the slot allocator and compressor).
>
> 2. Next, we try to queue the work to kcompressd, saving the folio and
> the zswap pool (and whatever else we need for the continuation). If
> this fails, we can proceed with the old synchronous path.
>
> 3. In kcompressed daemon, we perform the continuation of
> zswap_store(): compression, slot allocation, storing, zswap's LRU
> modification, etc. If this fails, we check if the mem_cgroup enables
> writeback. If it's enabled, we can call __swap_writepage(). Ideally,
> if writeback is disabled, we should activate the page, but it might
> not be possible since shrink_folio_list() might already re-add the
> page to the inactive lru. Maybe some modification of pageout() and
> shrink_folio_list() can make this work, but I haven't thought too
> deeply about it :) If it's impossible, we can perform async
> compression only for cgroups that enable writeback for now. Once we
> fix zswap's handling of incompressible pages, we can revisit this
> decision (+ SJ).
>
> TLDR: move the work-queueing step forward a bit, into the middle of
> zswap_store().
>
> One benefit of this is we skip pages of cgroups that disable zswap, or
> when zswap pool is full.
I assume you meant something like the following:

bool try_to_sched_async_zswap_store(struct folio *folio)
{
	get_obj_cgroup_from_folio(folio);
	if (err)
		goto err;
	zswap_check_limits();
	if (err)
		goto err;
	zswap_pool_current_get();
	if (err)
		goto err;
	queue_folio_to_kcompressd(folio);
	return true;
err:
	/* drop the references taken above, etc. */
	return false;
}
If this function returns true, the compression request has been queued to
kcompressd. Then, in kcompressd():
int __zswap_store(struct folio *folio)
{
	for (i = 0; i < nr_pages; i++) {
		zswap_store_page();
		if (err)
			return err;
	}
	return 0;
}
kcompressd()
{
	while (folio_queue_is_not_empty) {
		folio = dequeue_folio();
		if (folio_queued_by_zswap(folio)) {
			if (!__zswap_store(folio))
				continue;
		}
		if ((zswap_store_page_fails &&
		     mem_cgroup_zswap_writeback_enabled()) ||
		    folio_queued_by_zram) {
			__swap_writepage();
		}
	}
}
In kswapd's path, we will need to do something like:

int swap_writeout(struct folio *folio, struct swap_iocb **swap_plug)
{
	...
	if (try_to_sched_async_zswap_store(folio))
		return 0;
	if (is_sync_comp_blkdev(swap)) {
		queue_folio_to_kcompressd(folio);
		return 0;
	}
	__swap_writepage();
}
To be honest, I'm not sure there's a flag that indicates whether a folio
was queued by zswap or by zram. If not, we may need to tag the folio
pointers in the queue between kswapd and kcompressd, since we need to
identify the zswap cases. Maybe we can reuse bit 0 of the folio pointer?
What I mean is: while queuing, if the folio comes from zswap, we do
`pointer |= BIT(0)`. Then in kcompressd, we restore the original folio
with `folio = pointer & ~BIT(0)`. It's a bit ugly, but I'm not sure
there's a better approach.
Thanks
Barry