[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250731170922.15509-1-sj@kernel.org>
Date: Thu, 31 Jul 2025 10:09:22 -0700
From: SeongJae Park <sj@...nel.org>
To: Johannes Weiner <hannes@...xchg.org>
Cc: SeongJae Park <sj@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Chengming Zhou <chengming.zhou@...ux.dev>,
Nhat Pham <nphamcs@...il.com>,
Takero Funaki <flintglass@...il.com>,
Yosry Ahmed <yosry.ahmed@...ux.dev>,
kernel-team@...a.com,
linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [RFC PATCH] mm/zswap: store compression failed page as-is
On Thu, 31 Jul 2025 11:27:01 -0400 Johannes Weiner <hannes@...xchg.org> wrote:
> Hi SJ,
>
> On Wed, Jul 30, 2025 at 04:40:59PM -0700, SeongJae Park wrote:
> > When zswap writeback is enabled and it fails compressing a given page,
> > zswap lets the page be swapped out to the backing swap device. This
> > behavior breaks the zswap's writeback LRU order, and hence users can
> > experience unexpected latency spikes.
>
> +1 Thanks for working on this!
My pleasure :)
[...]
> > config 0 1-1 1-2 1-3 2-1 2-2 2-3
> > perf_normalized 1.0000 0.0060 0.0230 0.0310 0.0030 0.0116 0.0003
> > perf_stdev_ratio 0.06 0.04 0.11 0.14 0.03 0.05 0.26
> > zswpin 0 0 1701702 6982188 0 2479848 815742
> > zswpout 0 0 1725260 7048517 0 2535744 931420
> > zswpwb 0 0 0 0 0 0 0
> > pswpin 0 468612 481270 0 476434 535772 0
> > pswpout 0 574634 689625 0 612683 591881 0
>
> zswpwb being zero across the board suggests the zswap shrinker was not
> enabled. Can you double check that?
You're correct, I didn't.
>
> We should only take on incompressible pages to maintain LRU order on
> their way to disk. If we don't try to move them out, then it's better
> to reject them and avoid the metadata overhead.
I agree. I will update the test to explore the writeback effect, and share the
results, by the posting of the next version of this patch.
[...]
> > +/*
> > + * If the compression is failed, try saving the content as is without
> > + * compression, to keep the LRU order. This can increase memory overhead from
> > + * metadata, but in common zswap use cases where there are sufficient amount of
> > + * compressible pages, the overhead should be not ciritical, and can be
> > + * mitigated by the writeback. Also, the decompression overhead is optimized.
> > + *
> > + * When the writeback is disabled, however, the additional overhead could be
> > + * problematic. For the case, just return the failure. swap_writeout() will
> > + * put the page back to the active LRU list in the case.
> > + */
> > +static int zswap_handle_compression_failure(int comp_ret, struct page *page,
> > + struct zswap_entry *entry)
> > +{
> > + if (!zswap_save_incompressible_pages)
> > + return comp_ret;
> > + if (!mem_cgroup_zswap_writeback_enabled(
> > + folio_memcg(page_folio(page))))
> > + return comp_ret;
> > +
> > + entry->orig_data = kmalloc_node(PAGE_SIZE, GFP_NOWAIT | __GFP_NORETRY |
> > + __GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page));
> > + if (!entry->orig_data)
> > + return -ENOMEM;
> > + memcpy_from_page(entry->orig_data, page, 0, PAGE_SIZE);
> > + entry->length = PAGE_SIZE;
> > + atomic_long_inc(&zswap_stored_uncompressed_pages);
> > + return 0;
> > +}
>
> Better to still use the backend (zsmalloc) for storage. It'll give you
> migratability, highmem handling, stats etc.
Nhat also pointed out the migratability. Thank you for further letting me know
even more benefits from zsmalloc reuse. As I also replied to Nhat, I agree
those are important benefits, and I will rework on the next version to use the
backend.
>
> So if compression fails, still do zpool_malloc(), but for PAGE_SIZE
> and copy over the uncompressed page contents.
>
> struct zswap_entry has a hole after bool referenced, so you can add a
> flag to mark those uncompressed entries at no extra cost.
>
> Then you can detect this case in zswap_decompress() and handle the
> uncompressed copy into @folio accordingly.
I think we could still use 'zswap_entry->length == PAGE_SIZE' as the indicator,
As long as we ensure that always means the content is incompressed, following
Nhat's suggestion[1].
Please let me know if I'm missing something.
[1] https://lore.kernel.org/CAKEwX=Py+yvxtR5zt-1DtskhGWWHkRP_h8kneEHSrcQ947=m9Q@mail.gmail.com
Thanks,
SJ
Powered by blists - more mailing lists