Message-ID: <CACePvbVJEWd64r-o8ezh+0QByYbWjYVKiLgxNiBJOjfRWP__sw@mail.gmail.com>
Date: Tue, 25 Nov 2025 23:27:04 +0400
From: Chris Li <chrisl@...nel.org>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Kairui Song <kasong@...cent.com>, 
	Kemeng Shi <shikemeng@...weicloud.com>, Nhat Pham <nphamcs@...il.com>, 
	Baoquan He <bhe@...hat.com>, Barry Song <baohua@...nel.org>, Yosry Ahmed <yosry.ahmed@...ux.dev>, 
	Chengming Zhou <chengming.zhou@...ux.dev>, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	pratmal@...gle.com, sweettea@...gle.com, gthelen@...gle.com, 
	weixugc@...gle.com
Subject: Re: [PATCH RFC] mm: ghost swapfile support for zswap

On Mon, Nov 24, 2025 at 11:33 PM Johannes Weiner <hannes@...xchg.org> wrote:
> > > Do you have a link to that proposal?
> >
> > My 2024 LSF swap pony talk already has a mechanism to redirect page
> > cache swap entries to different physical locations.
> > That can also work for redirecting swap entries in different swapfiles.
> >
> > https://lore.kernel.org/linux-mm/CANeU7QnPsTouKxdK2QO8Opho6dh1qMGTox2e5kFOV8jKoEJwig@mail.gmail.com/
>
> I looked through your slides and the LWN article, but it's very hard
> for me to find answers to my questions in there.

Naturally, the slides are only intended to cover what is in the current
swap table work, maybe phase VII.
But they do take the physical location pointer into consideration.

> In your proposal, let's say you have a swp_entry_t in the page
> table. What does it describe, and what are the data structures to get
> from this key to user data in the following scenarios:

Please keep in mind that I don't have every design detail laid out. I
follow the first principle that redirecting a swap entry should
only take an additional 4 bytes per swap entry. Compare that with VS,
which blows up the swap entry size by something like 24 bytes? I am
pretty sure I am wrong about the exact value. People who are familiar
with VS, please correct me. My impression is that it is too far away
from the first principle value for me to even consider. Exceptions can
be made, but not that far.
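
To put the two per-entry figures side by side, here is a minimal
arithmetic sketch. It assumes 4 KiB pages and one swap entry per page.
The 24-byte figure is only the rough guess quoted above, not a measured
value, and swap_metadata_bytes() is a hypothetical helper, not a kernel
function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, one swap entry per page. */
#define PAGE_SIZE_BYTES 4096ULL

/*
 * Metadata bytes needed to cover swap_bytes of swap space when every
 * swap entry carries per_entry_bytes of extra metadata.
 * Hypothetical helper for illustration only.
 */
static uint64_t swap_metadata_bytes(uint64_t swap_bytes,
                                    uint64_t per_entry_bytes)
{
    assert(swap_bytes % PAGE_SIZE_BYTES == 0);
    return (swap_bytes / PAGE_SIZE_BYTES) * per_entry_bytes;
}
```

Under these assumptions, 1 GiB of swap has 262144 entries, so 4 bytes
per entry costs 1 MiB of metadata and 24 bytes per entry costs 6 MiB.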

I will try my best to answer your questions, but usually I am happier
to work with someone who is going to implement it to iron out all the
details. Right now it is a bit too far off.

> - Data is in a swapfile
Same as current.

> - Data is in zswap

I have now realized that what I want from the memory swap tier is
actually not the same as today's zswap. I don't want the current
behavior of zswap in swap.tiers. zswap sits in front of every
swapfile. zswap.writeback does not say which particular swapfile
it wants to write to. That creates problems for including zswap as it
is in the per memcg swap.tiers. I don't want zswap to use another
swapfile's swap entry and write through to it.

If data is in the memory tier swapfile, the swap entry looks up the
actual data without redirection.

> - Data is in being written from zswap to a swapfile
It will look up the swap table and find a physical pointer, which
points to the physical device and offset holding the data.

> - Data is back in memory due to a fault from another page table
In the swap cache, similar to today's swapfile.
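
The answers to the scenarios above can be read as a lookup order. The
following is only a rough sketch under my own assumptions, not the
actual design: every type, field and function name is hypothetical,
and the "physical pointer" is modeled as a plain 4-byte index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker meaning "no physical location pointer set". */
#define NO_REDIRECT UINT32_MAX

enum data_source { SRC_SWAP_CACHE, SRC_DIRECT, SRC_REDIRECTED };

/* Hypothetical front-end swap entry; not a real kernel structure. */
struct front_entry {
    void    *cached_folio; /* non-NULL while data is in the swap cache */
    uint32_t phys;         /* optional physical location, or NO_REDIRECT */
};

struct lookup_result {
    enum data_source src;
    uint32_t location;     /* own slot, or the redirected physical block */
};

/* Resolve a slot: swap cache first, then redirection, then the slot itself. */
static struct lookup_result swap_lookup(const struct front_entry *table,
                                        uint32_t slot)
{
    const struct front_entry *e = &table[slot];
    struct lookup_result r;

    if (e->cached_folio != NULL) {
        r.src = SRC_SWAP_CACHE;      /* faulted back in from another page table */
        r.location = slot;
    } else if (e->phys != NO_REDIRECT) {
        r.src = SRC_REDIRECTED;      /* data was moved to another location */
        r.location = e->phys;
    } else {
        r.src = SRC_DIRECT;          /* same as today: the entry is the location */
        r.location = slot;
    }
    return r;
}
```

In this sketch the common case (no relocation) never touches the
physical pointer, which matches the idea that the indirection only
exists if a relocation actually happened.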

> > > My understanding of swap tiers was about grouping different swapfiles
> > > and assigning them to cgroups. The issue with writeback is relocating
> > > the data that a swp_entry_t page table refers to - without having to
> > > find and update all the possible page tables. I'm not sure how
> > > swap.tiers solve this problem.
> >
> > swap.tiers is part of the picture. You are right the LPC topic mostly
> > covers the per cgroup portion. The VFS swap ops are my two slides of
> > the LPC 2023. You read from one swap file and write to another swap
> > file with a new swap entry allocated.
>
> Ok, and from what you wrote below, presumably at this point you would
> put a redirection pointer in the old location to point to the new one.

The pointer goes from the swap entry front end (which also owns the
swap cache) to a physical location.
>
> This way you only have the indirection IF such a relocation actually
> happened, correct?

Right. The more common

> But how do you store new data in the freed up old slot?
That is the front end swap entry and the physical back end split.
The front end swap entry can't be freed until all users release the swap count.
The physical back end can be freed. The physical blocks freed by
redirection will likely have a different allocator, not the cluster
based swap allocator, because those are just pure blocks.
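
To illustrate the lifetime difference between the two halves, here is
a toy sketch under my own assumptions. It models only redirected
blocks, with a trivial bitmap standing in for the separate "pure
block" allocator. None of these names exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_PHYS   UINT32_MAX  /* hypothetical "no physical block" marker */
#define NR_BLOCKS 8           /* toy back-end size */

/* Hypothetical front-end entry: a swap count plus an optional block. */
struct front_entry {
    uint32_t count;  /* users still holding this swap entry */
    uint32_t phys;   /* back-end block, or NO_PHYS */
};

static bool block_used[NR_BLOCKS];  /* toy back-end block allocator */

static uint32_t block_alloc(void)
{
    for (uint32_t i = 0; i < NR_BLOCKS; i++) {
        if (!block_used[i]) {
            block_used[i] = true;
            return i;
        }
    }
    return NO_PHYS;
}

static void block_free(uint32_t b)
{
    assert(b < NR_BLOCKS && block_used[b]);
    block_used[b] = false;
}

/* Move the data to a new block; the front-end entry keeps its identity. */
static bool entry_relocate(struct front_entry *e)
{
    uint32_t new_block = block_alloc();

    if (new_block == NO_PHYS)
        return false;
    if (e->phys != NO_PHYS)
        block_free(e->phys);  /* old back-end block is reusable right away */
    e->phys = new_block;
    return true;
}

/* Drop one user; returns true once the front-end entry itself is free. */
static bool entry_put(struct front_entry *e)
{
    assert(e->count > 0);
    if (--e->count > 0)
        return false;
    if (e->phys != NO_PHYS) {
        block_free(e->phys);
        e->phys = NO_PHYS;
    }
    return true;
}
```

The point of the sketch is that entry_relocate() can release the old
back-end block while page tables still reference the front-end entry,
and only entry_put() reaching zero releases the front-end entry.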

>
> > > As to your specific points - we use xarray lookups in the page cache
> > > fast path. It's a bold claim to say this would be too much overhead
> > > during swapins.
> >
> > Yes, we just get rid of xarray in swap cache lookup and get some
> > performance gain from it.
> > You are saying one extra xarray is no problem, can your team demo some
> > performance number of impact of the extra xarray lookup in VS? Just
> > run some swap benchmarks and share the result.
>
> Average and worst-case for all common usecases matter. There is no
> code on your side for the writeback case. (And it's exceedingly
> difficult to even get a mental model of how it would work from your
> responses and the slides you have linked).

As I said, that slide is only intended to explain swap table phase VII:
how physical redirection works with the swap cache.
swap.tiers defines tiers for swap, so how to move data between the
tiers is obviously a natural consideration. I mentioned that in two
slides of the 2023 talk.

I don't plan at that level of detail that far ahead. I try to follow
the first principles as best as I can. Many decisions will only be
made in the later phases.

> > > Two, it's not clear to me how you want to make writeback efficient
> > > *without* any sort of swap entry redirection. Walking all relevant
> > > page tables is expensive; and you have to be able to find them first.
> >
> > Swap cache can have a physical location redirection, see my 2024 LPC
> > slides. I have considered that way before the VS discussion.
> > https://lore.kernel.org/linux-mm/CANeU7QnPsTouKxdK2QO8Opho6dh1qMGTox2e5kFOV8jKoEJwig@mail.gmail.com/
>
> There are no matches for "redir" in either the email or the slides.

Yes, I use a different term in the slides. The continuous is the source
of the redirection, the non continuous is the destination of the
redirection. But in my mind I am not redirecting swap entries. The
swap entry might have an optional physical location pointer. That is
the swap entry front end and physical layer split.

Chris
