Message-ID: <CACePvbUxCN_2fHz0Ds=u52mmOvdhonvRgm6mPdcJcs5qLZj55Q@mail.gmail.com>
Date: Fri, 21 Nov 2025 17:52:01 -0800
From: Chris Li <chrisl@...nel.org>
To: Nhat Pham <nphamcs@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Kairui Song <kasong@...cent.com>,
Kemeng Shi <shikemeng@...weicloud.com>, Baoquan He <bhe@...hat.com>,
Barry Song <baohua@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
Yosry Ahmed <yosry.ahmed@...ux.dev>, Chengming Zhou <chengming.zhou@...ux.dev>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, pratmal@...gle.com, sweettea@...gle.com,
gthelen@...gle.com, weixugc@...gle.com
Subject: Re: [PATCH RFC] mm: ghost swapfile support for zswap

On Fri, Nov 21, 2025 at 2:19 AM Nhat Pham <nphamcs@...il.com> wrote:
>
> On Fri, Nov 21, 2025 at 9:32 AM Chris Li <chrisl@...nel.org> wrote:
> >
> > The current zswap requires a backing swapfile. The swap slot used
> > by zswap cannot be used by the swapfile. That wastes swapfile
> > space.
> >
> > The ghost swapfile is a swapfile that only contains the swapfile header
> > for zswap. The swapfile header indicates the size of the swapfile. There
> > is no swap data section in the ghost swapfile, and therefore no waste of
> > swapfile space. As such, any write to a ghost swapfile will fail. To
> > prevent accidental reads or writes of a ghost swapfile, the bdev of
> > swap_info_struct is set to NULL. A ghost swapfile will also set the SSD
> > flag because there is no rotational disk access when using zswap.
>
> Would this also affect the swap slot allocation algorithm?
>
> >
> > Zswap writeback is disabled if all swapfiles in the system
> > are ghost swapfiles.
>
> I don't like this design:
>
> 1. Statically sizing the compression tier will be an operational
> nightmare for users who have to support a variety of (increasingly
> large) host types. It's one of the primary motivations of
> the virtual swap line of work. We need to move towards a more dynamic
> architecture for zswap, not the other way around, in order to reduce
> both (human) operational overhead AND actual space overhead (i.e.
> only allocate (z)swap metadata on demand).

Let's do it one step at a time.

> 2. This digs us in the hole of supporting a special infrastructure for
> non-writeback cases. Now every future change to zswap's architecture
> has to take this into account. It's not easy to turn this design into
> something that can support writeback - you're stuck with either having
> to do an expensive page table walk to update the PTEs, or shoving the
> virtual swap layer inside zswap. Ugly.

What are you talking about? This patch does not have any page table
work. You are opposing something in your imagination. Please show me
the code in which I do expensive PTE walks.

> 3. And what does this even buy us? Just create a fake in-memory-only
> swapfile (heck, you can use zram), disable writeback (which you can do
> both at a cgroup and host-level), and call it a day.

Well, this gives users a choice: if they don't care about writeback,
they can use zswap with a ghost swapfile now without actually
wasting disk space.

It also does not stop zswap from using writeback with a normal SSD. If you
want writeback, you can still use a non-ghost swapfile as normal.
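
As a rough illustration only (a toy userspace model, not code from the
patch; every name below is hypothetical and none of it is the kernel's
swap_info_struct or its flags), the rule "writeback is disabled only
when every swapfile is a ghost swapfile" could be sketched as:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: hypothetical stand-in for the per-swapfile state. */
struct toy_swapfile {
    bool ghost;     /* header-only swapfile, no data section */
    bool has_bdev;  /* false models bdev == NULL */
    bool ssd;       /* models the non-rotational (SSD) flag */
};

/* A ghost swapfile as described: no backing device, marked as SSD. */
struct toy_swapfile toy_make_ghost(void)
{
    struct toy_swapfile s = { .ghost = true, .has_bdev = false, .ssd = true };
    return s;
}

/* A regular swapfile backed by a real block device. */
struct toy_swapfile toy_make_regular(bool ssd)
{
    struct toy_swapfile s = { .ghost = false, .has_bdev = true, .ssd = ssd };
    return s;
}

/* Writeback needs at least one swapfile that has real backing storage. */
bool toy_writeback_possible(const struct toy_swapfile *files, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!files[i].ghost) {
            /* In this model a non-ghost swapfile always has a device. */
            assert(files[i].has_bdev);
            return true;
        }
    }
    return false; /* only ghost swapfiles (or none): nothing to write to */
}
```

In this model, a system with one ghost and one regular swapfile still
reports writeback as possible; a system with only ghost swapfiles does not.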

It is a simple enough patch to provide value right now. It also fits
into the swap.tiers long-term roadmap to have a separate tier for
memory-based swapfiles. I believe that is a cleaner picture than the
current design, where zswap acts as a cache but also gets its hands so
deep into the swap stack that it slows down other swap tiers.

> Nacked-by: Nhat Pham <nphamcs@...il.com>

I heard you, if you don't want zswap to have anything to do
with the memory-based swap tier in the swap.tiers design. I respect your
choice.

Chris