Message-Id: <20250623051642.3645-1-21cnbao@gmail.com>
Date: Mon, 23 Jun 2025 17:16:42 +1200
From: Barry Song <21cnbao@...il.com>
To: nphamcs@...il.com
Cc: 21cnbao@...il.com,
akpm@...ux-foundation.org,
andrew.yang@...iatek.com,
angelogioacchino.delregno@...labora.com,
casper.li@...iatek.com,
chinwen.chang@...iatek.com,
hannes@...xchg.org,
james.hsu@...iatek.com,
linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org,
linux-mediatek@...ts.infradead.org,
linux-mm@...ck.org,
matthias.bgg@...il.com,
minchan@...nel.org,
qun-wei.lin@...iatek.com,
rppt@...nel.org,
senozhatsky@...omium.org,
sj@...nel.org
Subject: Re: [PATCH] mm: Add Kcompressd for accelerated memory compression
Hi Nhat,
On Wed, Jun 18, 2025 at 2:21 AM Nhat Pham <nphamcs@...il.com> wrote:
>
> On Sun, Jun 15, 2025 at 8:41 PM Barry Song <21cnbao@...il.com> wrote:
> > >>
> > >> That seems unnecessary. There is an existing method for asynchronous
> > >> writeback, and pageout() is naturally fully set up to handle this.
> > >>
> > >> IMO the better way to do this is to make zswap_store() (and
> > >> zram_bio_write()?) asynchronous. Make those functions queue the work
> > >> and wake the compression daemon, and then have the daemon call
> > >> folio_end_writeback() / bio_endio() when it's done with it.
> >
> > > +1.
> >
> >
> > But,
> > How could this be possible for zswap? zswap_store() is only a frontend —
> > we still need its return value to determine whether __swap_writepage()
> > is required. Waiting for the result of zswap_store() is inherently a
> > synchronous step.
>
> Hmm, I might be misunderstanding either of you, but it sounds like
> what you're describing here does not contradict what Johannes is
> proposing?
It seems contradictory: Johannes proposes that zswap could behave like zRAM
by invoking `folio_end_writeback()` or `bio_endio()`, but that doesn't match
the actual behavior: `zswap_store()` does not necessarily end `swap_writeout()`,
since on failure we may still need to fall through to `__swap_writepage()` to
complete the final steps.
Meanwhile, Qun-wei's RFC has already explored calling `folio_end_writeback()`
and `bio_endio()` at the end of `__swap_writepage()` for zRAM, though that
approach also has its own issues.
>
> >
> > My point is that folio_end_writeback() and bio_endio() can only be
> > called after the entire zswap_store() → __swap_writepage() sequence is
> > completed. That’s why both are placed in the new kcompressed.
>
> Hmm, how about:
>
> 1. Inside zswap_store(), we first obtain the obj_cgroup reference,
> check cgroup and pool limit, and grab a zswap pool reference (in
> effect, determining the slot allocator and compressor).
>
> 2. Next, we try to queue the work to kcompressd, saving the folio and
> the zswap pool (and whatever else we need for the continuation). If
> this fails, we can proceed with the old synchronous path.
>
> 3. In kcompressed daemon, we perform the continuation of
> zswap_store(): compression, slot allocation, storing, zswap's LRU
> modification, etc. If this fails, we check if the mem_cgroup enables
> writeback. If it's enabled, we can call __swap_writepage(). Ideally,
> if writeback is disabled, we should activate the page, but it might
> not be possible since shrink_folio_list() might already re-add the
> page to the inactive lru. Maybe some modification of pageout() and
> shrink_folio_list() can make this work, but I haven't thought too
> deeply about it :) If it's impossible, we can perform async
> compression only for cgroups that enable writeback for now. Once we
> fix zswap's handling of incompressible pages, we can revisit this
> decision (+ SJ).
>
> TLDR: move the work-queueing step forward a bit, into the middle of
> zswap_store().
>
> One benefit of this is we skip pages of cgroups that disable zswap, or
> when zswap pool is full.
I assume you meant something like the following:

bool try_to_sched_async_zswap_store(struct folio *folio)
{
	get_obj_cgroup_from_folio(folio);
	if (err)
		goto err;
	zswap_check_limits();
	if (err)
		goto err;
	zswap_pool_current_get();
	if (err)
		goto err;
	queue_folio_to_kcompressd(folio);
	return true;
err:
	/* drop the references taken above, etc. */
	return false;
}
If this function returns true, the compression request has been queued to
kcompressd. Then, in kcompressd():
int __zswap_store(struct folio *folio)
{
	for (i = 0; i < nr_pages; i++) {
		zswap_store_page();
		if (err)
			return err;
	}
	return 0;
}
kcompressd()
{
	while (folio_queue_is_not_empty) {
		folio = dequeue_folio();
		if (folio_queued_by_zswap(folio)) {
			if (!__zswap_store(folio))
				continue;
		}
		if ((zswap_store_page_fails &&
		     mem_cgroup_zswap_writeback_enabled()) ||
		    folio_queued_by_zram) {
			__swap_writepage();
		}
	}
}
In kswapd's path, we will need to do something like:

int swap_writeout(struct folio *folio, struct swap_iocb **swap_plug)
{
	...
	if (try_to_sched_async_zswap_store(folio))
		return 0;
	if (is_sync_comp_blkdev(swap)) {
		queue_folio_to_kcompressd(folio);
		return 0;
	}
	__swap_writepage();
}
To be honest, I'm not sure there's a flag that indicates whether a folio
was queued by zswap or by zram. If not, we may need to tag the folio
pointers in the queue between kswapd and kcompressd, since we need to
identify the zswap cases. Maybe we can reuse bit 0 of the folio pointer?
What I mean is: while queuing, if the folio comes from zswap, we do
`pointer |= BIT(0)`. Then in kcompressd, we restore the original folio
with `folio = pointer & ~BIT(0)`. It's a bit ugly, but I'm not sure
there's a better approach.
Thanks
Barry