Message-ID: <CAF8kJuNuNuxxTbtkCb3Opsjfy-or7E+0AwPDi7L-EgqoraQ3Qg@mail.gmail.com>
Date: Sat, 16 Aug 2025 12:15:43 -0700
From: Chris Li <chrisl@...nel.org>
To: YoungJun Park <youngjun.park@....com>
Cc: Michal Koutný <mkoutny@...e.com>, 
	akpm@...ux-foundation.org, hannes@...xchg.org, mhocko@...nel.org, 
	roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, muchun.song@...ux.dev, 
	shikemeng@...weicloud.com, kasong@...cent.com, nphamcs@...il.com, 
	bhe@...hat.com, baohua@...nel.org, cgroups@...r.kernel.org, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org, gunho.lee@....com, 
	iamjoonsoo.kim@....com, taejoon.song@....com, 
	Matthew Wilcox <willy@...radead.org>, David Hildenbrand <david@...hat.com>, Kairui Song <ryncsn@...il.com>
Subject: Re: [PATCH 1/4] mm/swap, memcg: Introduce infrastructure for
 cgroup-based swap priority

On Sat, Aug 16, 2025 at 10:21 AM YoungJun Park <youngjun.park@....com> wrote:
>
> On Fri, Aug 15, 2025 at 08:10:09AM -0700, Chris Li wrote:
> > Hi Michal and YoungJun,
>
> First of all, thank you for sharing your thoughts. I really appreciate the
> detailed feedback. I have many points I would like to think through and
> discuss as well. For now, let me give some quick feedback, and I will follow
> up with more detailed responses after I have had more time to reflect.

Please do, that is part of the community feedback and review process.

> > I am sorry for the late reply. I have briefly read through the patches
> > series the overall impression:
> > 1)  Priority is not the best way to select which swap file to use per cgroup.
> > The priority is assigned to one device, it is a per swap file local
> > change. The effect you want to see is actually a global one, how this
> > swap device compares to other devices. You actually want  a list at
> > the end result. Adjusting per swap file priority is backwards. A lot
> > of unnecessary usage complexity and code complexity come from that.
> > 2)  This series is too complicated for what it does.
>
> You mentioned that the series is overly complex and does more than what is
> really needed. I understand your concern. I have spent quite a lot of time
> thinking about this topic, and the reason I chose the priority approach is
> that it gives more flexibility and extensibility by reusing an existing
> concept.

I am not questioning whether your approach can achieve your goal. The
real question is whether this is the best approach to merge into the
mainline Linux kernel. Merging into the mainline kernel has a very
high bar. How does it compare to the alternative approaches in terms
of technical merit and complexity trade-offs?

> Where you see unnecessary functionality, I tend to view it as providing more
> degrees of freedom and flexibility. In my view, the swap tier concept can be
> expressed as a subset of the per-cgroup priority model.

Why would I trade a cleaner, less complex approach for a more complex
approach with a technical deficiency it cannot address (inverting the
swap entry LRU ordering)?

> > I have a similar idea, "swap.tiers," first mentioned earlier here:
> > https://lore.kernel.org/linux-mm/CAF8kJuNFtejEtjQHg5UBGduvFNn3AaGn4ffyoOrEnXfHpx6Ubg@mail.gmail.com/
> >
> > I will outline the line in more detail in the last part of my reply.
> >
> > BTW, YoungJun and Michal, do you have the per cgroup swap file control
> > proposal for this year's LPC? If you want to, I am happy to work with
> > you on the swap tiers topic as a secondary. I probably don't have the
> > time to do it as a primary.
>
> I have not submitted an LPC proposal. If it turns out to be necessary,
> I agree it could be a good idea, and I truly appreciate your offer to
> work together on it.

Let me clarify. LPC is not required to get your series merged. Giving
a talk at LPC is usually an honor. It does not guarantee that your
series gets merged either. It certainly helps your idea get more
exposure and discussion. You might be able to meet some maintainers in
person. For me, it is nice to meet the people with whom I have been
communicating by email. I was making the suggestion because it can be
a good topic for LPC, and just in case you might enjoy LPC. It is
totally for your benefit. It is up to you; please don't make it a
burden. It is not.

If, after your consideration, you do want to submit a proposal to LPC,
you need to hurry though. The deadline is closing soon.

> From my understanding, though, the community has
> so far received this patchset positively, so I hope the discussion can
> continue within this context and eventually be accepted there.

Let me make it very clear. As it is, it will not get my support, for
the reasons I laid out in my last email.

> > OK. I want to abandon the weight-adjustment approach. Here I outline
> > the swap tiers idea as follows. I can probably start a new thread for
> > that later.
> >
> > 1) No per cgroup swap priority adjustment. The swap file priority is
> > global to the system.
> > Per cgroup swap file ordering adjustment is bad from the LRU point of
> > view. We should make the swap file ordering matching to the swap
> > device service performance. Fast swap tier zram, zswap store hotter
> > data, slower tier hard drive store colder data.  SSD in between. It is
> > important to maintain the fast slow tier match to the hot cold LRU
> > ordering.
>
> Regarding your first point about swap tiers: I would like to study this part
> a bit more carefully.

Please do.

> If you could share some additional explanation, that
> would be very helpful for me.

Feel free to ask, I will do my best to answer.

> > More example:
> >  "- +ssd +hdd -ssd" will simplify to: "- +hdd", which means hdd only.
> >  "+ -hdd": No hdd for you! Use everything else.
> >
> > Let me know what you think about the above "swap.tiers"(name TBD) proposal.
>
> Thank you very much for the detailed description of the "swap.tiers" idea.
> As I understand it, the main idea is to separate swap devices by speed,
> assign a suitable priority range for each, and then make it easy for users to
> include or exclude tiers. I believe I have understood the concept clearly.
>
> I agree that operating with tiers is important. At the same time, as I
> mentioned earlier, I believe that managing priorities in a way that reflects
> tiers can also achieve the intended effect.

The per cgroup per swap file priorities have one Achilles heel you need
to address before you can make any further progress upstreaming them.
Putting the extra complexity aside, the per cgroup per swap file
priorities can invert the swap entry LRU order, because different
cgroups can have different views of the device ordering.
That violates the swap entry LRU order between tiers.

From the swap file's point of view, when it needs to flush some data to
the lower tiers, it is very hard, if possible at all, for the swap file
to maintain per cgroup LRU order within a swap file.
It is much easier if all the swap entries in a swap file are in the
same LRU order tier.

Inverting swap entry LRU order is a deal breaker for your per cgroup
per swap file priority approach.
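
[Editorial illustration, not part of the original message.] The
inversion can be sketched with a toy model. Everything here is an
assumption made for illustration: the integer "age" values (larger
means colder), the hot_cutoff threshold, the device names, and the
helper functions. None of it is kernel code. Each cgroup sends its
hotter entries to its first-choice device and its colder entries to
its second choice.

```python
def place(ages, device_order, hot_cutoff):
    """Toy placement: hot entries go to the first device, cold to the second."""
    placement = {dev: [] for dev in device_order}
    for age in ages:
        dev = device_order[0] if age < hot_cutoff else device_order[1]
        placement[dev].append(age)
    return placement


def merge(*placements):
    """Combine per-cgroup placements into one per-device view."""
    merged = {}
    for p in placements:
        for dev, ages in p.items():
            merged.setdefault(dev, []).extend(ages)
    return merged


def lru_order_holds(merged, fast, slow):
    """True if every entry on the fast device is hotter than every entry on the slow one."""
    return max(merged[fast]) < min(merged[slow])


cg_a = [1, 2, 8, 9]     # hypothetical ages of cgroup A's swap entries
cg_b = [3, 4, 10, 11]   # hypothetical ages of cgroup B's swap entries

# Both cgroups agree on the global order: zram first, then ssd.
global_view = merge(place(cg_a, ["zram", "ssd"], 5),
                    place(cg_b, ["zram", "ssd"], 5))

# Cgroup B inverts the order: ssd first, then zram.
inverted_view = merge(place(cg_a, ["zram", "ssd"], 5),
                      place(cg_b, ["ssd", "zram"], 5))

print(lru_order_holds(global_view, "zram", "ssd"))
print(lru_order_holds(inverted_view, "zram", "ssd"))
```

With the shared order, zram holds only entries hotter than anything on
ssd. With the inverted order, each device holds a mix of hot and cold
entries, so neither device can be treated as a single LRU tier.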

> I have also been thinking about a possible compromise. If the interface is

The swap.tiers idea is not a compromise, it is a straight win. Can you
describe what per cgroup per swap file can do while swap.tiers can
not?

> intended to make tiers visible to users in the way you describe, then mapping
> priority ranges to tiers (as you propose) makes sense. Users would still have
> the flexibility to define ordering, while internally we could maintain the

Because I don't want to violate the swap entry LRU ordering between
tiers. Within that context, what usage case do you have in mind?
Within the same tier, the swap devices can have a finer-grained
priority order between them. The part I haven't understood, please
help me understand: why do you need per cgroup per swap file ordering
rather than the tier order? Tier order is much easier from the admin's
point of view. This app needs to be fast and can't afford slow swap,
so give it the faster swap tiers.
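
[Editorial illustration, not part of the original message.] The two
"swap.tiers" examples quoted earlier in this message can be reproduced
with a small sketch. The semantics are assumed from those two examples
only: a bare "-" starts from no tiers, a bare "+" starts from all
tiers, and "+name" / "-name" add or remove one tier, applied left to
right. The function name resolve_tiers and the tier list are
hypothetical; the actual interface (name TBD) may differ.

```python
def resolve_tiers(spec, all_tiers):
    """Resolve a swap.tiers-style string to the tiers it leaves enabled.

    all_tiers lists the known tier names from fastest to slowest; the
    result keeps that order.
    """
    enabled = set()
    for tok in spec.split():
        if tok == "+":
            enabled = set(all_tiers)      # start from every tier
        elif tok == "-":
            enabled = set()               # start from no tier
        elif tok[0] in "+-" and tok[1:] in all_tiers:
            if tok[0] == "+":
                enabled.add(tok[1:])
            else:
                enabled.discard(tok[1:])
        else:
            raise ValueError(f"unrecognized token: {tok!r}")
    return [tier for tier in all_tiers if tier in enabled]


tiers = ["zram", "ssd", "hdd"]                       # hypothetical tier names
hdd_only = resolve_tiers("- +ssd +hdd -ssd", tiers)  # same as "- +hdd"
no_hdd = resolve_tiers("+ -hdd", tiers)              # everything except hdd
print(hdd_only, no_hdd)
```

Because the result is always a subset of one global tier list, every
cgroup sees the same fast-to-slow order and only chooses which tiers
it is allowed to use.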

> priority list model I suggested. I wonder what you think about such a hybrid
> approach.

It obviously will introduce new complexity. I want to understand the
reason to justify the additional complexity before I consider such an
approach.

> Thank you as always for your valuable insights.

My pleasure. Thanks for leading this per cgroup swap file effort.

Chris
