Message-ID: <CAKEwX=MjyEsoyDmMBCRr0QnBfgkTA5bfrshPbfSgNp887zaxVw@mail.gmail.com>
Date: Fri, 30 May 2025 09:52:42 -0700
From: Nhat Pham <nphamcs@...il.com>
To: YoungJun Park <youngjun.park@....com>
Cc: linux-mm@...ck.org, akpm@...ux-foundation.org, hannes@...xchg.org, 
	hughd@...gle.com, yosry.ahmed@...ux.dev, mhocko@...nel.org, 
	roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, muchun.song@...ux.dev, 
	len.brown@...el.com, chengming.zhou@...ux.dev, kasong@...cent.com, 
	chrisl@...nel.org, huang.ying.caritas@...il.com, ryan.roberts@....com, 
	viro@...iv.linux.org.uk, baohua@...nel.org, osalvador@...e.de, 
	lorenzo.stoakes@...cle.com, christophe.leroy@...roup.eu, pavel@...nel.org, 
	kernel-team@...a.com, linux-kernel@...r.kernel.org, cgroups@...r.kernel.org, 
	linux-pm@...r.kernel.org, peterx@...hat.com, gunho.lee@....com, 
	taejoon.song@....com, iamjoonsoo.kim@....com
Subject: Re: [RFC PATCH v2 00/18] Virtual Swap Space

On Thu, May 29, 2025 at 11:47 PM YoungJun Park <youngjun.park@....com> wrote:
>
> On Tue, Apr 29, 2025 at 04:38:28PM -0700, Nhat Pham wrote:
> > Changelog:
> > * v2:
> >       * Use a single atomic type (swap_refs) for reference counting
> >         purpose. This brings the size of the swap descriptor from 64 bytes
> >         down to 48 bytes (25% reduction). Suggested by Yosry Ahmed.
> >       * Zeromap bitmap is removed in the virtual swap implementation.
> >         This saves one bit per physical swapfile slot.
> >       * Rearrange the patches and the code change to make things more
> >         reviewable. Suggested by Johannes Weiner.
> >       * Update the cover letter a bit.
>
> Hi Nhat,
>
> Thank you for sharing this patch series.
> I’ve read through it with great interest.
>
> I’m part of a kernel team working on features related to multi-tier swapping,
> and this patch set appears quite relevant
> to our ongoing discussions and early-stage implementation.

May I ask - what's the use case you're thinking of here? Remote swapping?

>
> I had a couple of questions regarding the future direction.
>
> > * Multi-tier swapping (as mentioned in [5]), with transparent
> >   transferring (promotion/demotion) of pages across tiers (see [8] and
> >   [9]). Similar to swapoff, with the old design we would need to
> >   perform the expensive page table walk.
>
> Based on the discussion in [5], it seems there was some exploration
> around enabling per-cgroup selection of multiple tiers.
> Do you envision the current design evolving in a similar direction
> to those past discussions, or is there a different direction you're aiming for?

IIRC, that past design focused on the interface aspect of the problem,
but never actually touched the mechanism to implement a multi-tier
swapping solution.

The simple reason is it's impossible, or at least highly inefficient
to do it in the current design, i.e. without virtualizing swap. Storing
the physical swap location in PTEs means that changing the swap
backend requires a full page table walk to update all the PTEs that
refer to the old physical swap location. So you have to pick your
poison - either:

1. Pick your backend at swap out time, and never change it. You might
not have sufficient information to decide at that time. It prevents
you from adapting to the change in workload dynamics and working set -
the access frequency of pages might change, so their physical location
should change accordingly.

2. Reserve the space in every tier, and associate them with the same
handle. This is kinda what zswap is doing. It is space-inefficient, and
creates a lot of operational issues in production.

3. Bite the bullet and perform the page table walk. This is what
swapoff is doing, basically. Raise your hands if you're excited about
a full page table walk every time you want to evict a page from zswap
to disk swap. Booo.

This new design will give us an efficient way to perform tier transfer
- you need to figure out how to obtain the right to perform the
transfer (for now, through the swap cache - but you can perhaps
envision some sort of locks), and then you can simply make the change
at the virtual layer.
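
To make the contrast concrete, here is a toy userspace sketch in C. It is not code from the patch series: `struct vswap_desc`, `transfer_direct()`, `transfer_virtual()` and `resolve()` are hypothetical names, the "page tables" are plain arrays, and locking and the swap cache are ignored entirely. It only illustrates why a tier transfer needs a scan of every PTE when PTEs hold physical locations, and a single descriptor update when they hold virtual slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend tiers, for illustration only. */
enum backend { BACKEND_ZSWAP, BACKEND_DISK };

/* Hypothetical per-virtual-slot descriptor: where the data lives right now. */
struct vswap_desc {
	enum backend backend;
	uint64_t location;	/* backend-specific handle or slot offset */
};

/*
 * Direct mapping: every PTE stores the physical location itself, so moving
 * the data means scanning all PTEs and rewriting each one that refers to
 * the old location. Returns the number of PTEs rewritten.
 */
static size_t transfer_direct(uint64_t *ptes, size_t nr_ptes,
			      uint64_t old_loc, uint64_t new_loc)
{
	size_t rewritten = 0;

	for (size_t i = 0; i < nr_ptes; i++) {
		if (ptes[i] == old_loc) {
			ptes[i] = new_loc;
			rewritten++;
		}
	}
	return rewritten;
}

/*
 * Virtual mapping: PTEs store a virtual slot index that never changes.
 * Moving the data only updates the one descriptor behind that slot.
 */
static void transfer_virtual(struct vswap_desc *desc,
			     enum backend new_backend, uint64_t new_loc)
{
	assert(desc != NULL);
	desc->backend = new_backend;
	desc->location = new_loc;
}

/* Look up the current backing of a virtual slot stored in a PTE. */
static const struct vswap_desc *resolve(const struct vswap_desc *table,
					size_t vslot)
{
	return &table[vslot];
}
```

In this toy model, any number of PTEs holding the same virtual slot index see the new backend after one `transfer_virtual()` call, while `transfer_direct()` has to visit all `nr_ptes` entries no matter how few of them match.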

>
> >   This idea is very similar to Kairui's work to optimize the (physical)
> >   swap allocator. He is currently also working on a swap redesign (see
> >   [11]) - perhaps we can combine the two efforts to take advantage of
> >   the swap allocator's efficiency for virtual swap.
>
> I noticed that your patch appears to be aligned with the work from Kairui.
> It seems like the overall architecture may be headed toward introducing
> a virtual swap device layer.
> I'm curious if there’s already been any concrete discussion
> around this abstraction, especially regarding how it might be layered over
> multiple physical swap devices?
>
> From a naive perspective, I imagine that while today’s swap devices
> are in a 1:1 mapping with physical devices,
> this virtual layer could introduce a 1:N relationship —
> one virtual swap device mapped to multiple physical ones.
> Would this virtual device behave as a new swappable block device
> exposed via `swapon`, or is the plan to abstract it differently?

That was one of the ideas I was thinking of. Problem is this is a very
special "device", and I'm not entirely sure opting in through swapon
like that won't cause issues. Imagine the following scenario:

1. We swap on a normal swapfile.

2. Users swap things with the swapfile.

3. Sysadmin then runs swapon on a virtual swap device.

It will be quite nightmarish to manage things - we need to be extra
vigilant in handling a physical swap slot, for example, since it can
back either a PTE or a virtual swap slot. Also, swapoff becomes less efficient
again. And the physical swap allocator, even with the swap table
change, doesn't quite work out of the box for virtual swap yet (see
[1]).

I think it's better to just keep it separate, for now, and adopt
elements from Kairui's work to make virtual swap allocation more
efficient. Not a hill I will die on though.

[1]: https://lore.kernel.org/linux-mm/CAKEwX=MmD___ukRrx=hLo7d_m1J_uG_Ke+us7RQgFUV2OSg38w@mail.gmail.com/

>
> Thanks again for your work,
> and I would greatly appreciate any insights you could share.
>
> Best regards,
> YoungJun Park
>
