[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aH+apAbBCmkMGPlO@yjaykim-PowerEdge-T330>
Date: Tue, 22 Jul 2025 23:05:24 +0900
From: YoungJun Park <youngjun.park@....com>
To: Michal Koutný <mkoutny@...e.com>
Cc: akpm@...ux-foundation.org, hannes@...xchg.org, mhocko@...nel.org,
roman.gushchin@...ux.dev, shakeel.butt@...ux.dev,
muchun.song@...ux.dev, shikemeng@...weicloud.com,
kasong@...cent.com, nphamcs@...il.com, bhe@...hat.com,
baohua@...nel.org, chrisl@...nel.org, cgroups@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org, gunho.lee@....com,
iamjoonsoo.kim@....com, taejoon.song@....com
Subject: Re: [PATCH 1/4] mm/swap, memcg: Introduce infrastructure for
cgroup-based swap priority
On Tue, Jul 22, 2025 at 10:41:20AM +0200, Michal Koutný wrote:
> On Thu, Jul 17, 2025 at 05:20:03AM +0900, Youngjun Park <youngjun.park@....com> wrote:
> > + memory.swap.priority
> > + A read-write flat-keyed file which exists on non-root cgroups.
> > + This interface allows you to set per-swap-device priorities for the current
> > + cgroup and to define how they differ from the global swap system.
> > +
> > + To assign priorities or define specific behaviors for swap devices
> > + in the current cgroup, write one or more lines in the following
> > + formats:
> > +
> > + - <swap_device_id> <priority>
> > + - <swap_device_id> disabled
> > + - <swap_device_id> none
> > + - default none
> > + - default disabled
> > +
> > + Each <swap_device_id> refers to a unique swap device registered
> > + in the system. You can check the ID, device path, and current
> > + priority of active swap devices through the `/proc/swaps` file.
>
> Do you mean row number as the ID? Or does this depend on some other
> patches or API?
You're right to ask for clarification. The `<swap_device_id>` refers
to a unique identifier added to each swap device entry in `/proc/swaps`.
I will revise the documentation to make this clearer.
As a side note, I initially had concerns about breaking the existing ABI.
However, the additional ID column does not significantly change the
current output format and is gated behind `CONFIG_SWAP_CGROUP_PRIORITY`,
so it should be safe and intuitive to expose it through `/proc/swaps
> > + This provides a clear mapping between swap devices and the IDs
> > + used in this interface.
> > +
> > + The 'default' keyword sets the fallback priority behavior rule for
> > + this cgroup. If no specific entry matches a swap device, this default
> > + applies.
> > +
> > + * 'default none': This is the default if no configuration
> > + is explicitly written. Swap devices follow the system-wide
> > + swap priorities.
> > +
> > + * 'default disabled': All swap devices are excluded from this cgroup’s
> > + swap priority list and will not be used by this cgroup.
>
> This duplicates memory.swap.max=0. I'm not sure it's thus necessary.
> At the same time you don't accept 'default <priority>' (that's sane).
That's a valid observation. While `memory.swap.max=0` controls the overall
swap usage limit, the `default disabled` entry is intended to disable
specific swap devices within the scope of this cgroup interface. The
motivation was to offer more granular control over device selection
rather than total swap usage.
> > +
> > + The priority semantics are consistent with the global swap system:
> > +
> > + - Higher numerical values indicate higher preference.
> > + - See Documentation/admin-guide/mm/swap_numa.rst for details on
> > + swap NUMA autobinding and negative priority rules.
> > +
> > + The handling of negative priorities in this cgroup interface
> > + has specific behaviors for assignment and restoration:
> > +
> > + * Negative Priority Assignment
>
> Even in Documentation/admin-guide/mm/swap_numa.rst it's part of "Implementation details".
> I admit I'm daunted by this paragraphs. Is it important for this interface?
Thank you for pointing this out. My original philosophy was to preserve
as much of the existing swap functionality as possible, including
NUMA-aware behaviors.
However, I agree that the explanation is complex and also not be
necessary for my proposed usage. After some reflection, I believe the
implementation (and documentation) will be clearer and simpler without
supporting negative priorities here.
Unless further objections arise, I plan to drop this behavior in the next
version of the patch, as you suggested. If compelling use cases emerge in
the future, we can consider reintroducing the support at that time.
Thanks again for your helpful review!
Best regards,
Youngjun Park
Powered by blists - more mailing lists