Message-ID: <CACePvbUBEQsgoei=ykeM6uX_GWaFywZEXY7dGt+T6Pt4zhiGsA@mail.gmail.com>
Date: Sat, 15 Nov 2025 07:13:49 -0800
From: Chris Li <chrisl@...nel.org>
To: SeongJae Park <sj@...nel.org>
Cc: Youngjun Park <youngjun.park@....com>, akpm@...ux-foundation.org, linux-mm@...ck.org, 
	cgroups@...r.kernel.org, linux-kernel@...r.kernel.org, kasong@...cent.com, 
	hannes@...xchg.org, mhocko@...nel.org, roman.gushchin@...ux.dev, 
	shakeel.butt@...ux.dev, muchun.song@...ux.dev, shikemeng@...weicloud.com, 
	nphamcs@...il.com, bhe@...hat.com, baohua@...nel.org, gunho.lee@....com, 
	taejoon.song@....com
Subject: Re: [RFC] mm/swap, memcg: Introduce swap tiers for cgroup based swap control

On Fri, Nov 14, 2025 at 5:22 PM SeongJae Park <sj@...nel.org> wrote:
>
> On Sun,  9 Nov 2025 21:49:44 +0900 Youngjun Park <youngjun.park@....com> wrote:
>
> > Hi all,
> >
> > In constrained environments, there is a need to improve workload
> > performance by controlling swap device usage on a per-process or
> > per-cgroup basis. For example, one might want to direct critical
> > processes to faster swap devices (like SSDs) while relegating
> > less critical ones to slower devices (like HDDs or Network Swap).
> >
> > Initial approach was to introduce a per-cgroup swap priority
> > mechanism [1]. However, through review and discussion, several
> > drawbacks were identified:
> >
> > a. There is a lack of concrete use cases for assigning a fine-grained,
> >    unique swap priority to each cgroup.
> > b. The implementation complexity was high relative to the desired
> >    level of control.
> > c. Differing swap priorities between cgroups could lead to LRU
> >    inversion problems.
> >
> > To address these concerns, I propose the "swap tiers" concept,
> > originally suggested by Chris Li [2] and further developed through
> > collaborative discussions. I would like to thank Chris Li and
> > He Baoquan for their invaluable contributions in refining this
> > approach, and Kairui Song, Nhat Pham, and Michal Koutný for their
> > insightful reviews of earlier RFC versions.
>
> I think the tiers concept is a nice abstraction.  I'm also interested in how
> the in-kernel control mechanism will deal with tiers management, which is not
> always simple.  I'll try to take a time to read this series thoroughly.  Thank
> you for sharing this nice work!

Thank you for your interest. Please keep in mind that this patch
series is an RFC. I suspect the current series will go through a
substantial overhaul before it gets merged. I predict that less than
half of the code in the end result will resemble what is in the
series right now.

> Nevertheless, I'm curious if there is simpler and more flexible ways to achieve
> the goal (control of swap device to use).  For example, extending existing

Simplicity is one of my primary design principles. The current design
is close to the simplest within the design constraints.

> proactive pageout features, such as memory.reclaim, MADV_PAGEOUT or
> DAMOS_PAGEOUT, to let users specify the swap device to use.  Doing such

In my mind that is a later phase. No, a per-VMA swapfile is not simpler
to use, nor is the API simpler to code. There are many more VMAs than
memcgs in the system, not even the same order of magnitude. It is a
higher burden for both user space and the kernel to maintain all the
per-VMA mappings. The VMA and mmap paths are also much more complex to
hack on. Doing it at the memcg level as the first step is the right
approach.

> extension for MADV_PAGEOUT may be challenging, but it might be doable for
> memory.reclaim and DAMOS_PAGEOUT.  Have you considered this kind of options?

Yes, as Youngjun points out, that has been considered, but for a
later phase. Borrowing the link from his email:
https://lore.kernel.org/linux-mm/CACePvbW_Q6O2ppMG35gwj7OHCdbjja3qUCF1T7GFsm9VDr2e_g@mail.gmail.com/

Chris
