lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aRVZbMNgvDbZDnDK@yjaykim-PowerEdge-T330>
Date: Thu, 13 Nov 2025 13:07:08 +0900
From: YoungJun Park <youngjun.park@....com>
To: Chris Li <chrisl@...nel.org>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org, cgroups@...r.kernel.org,
	linux-kernel@...r.kernel.org, kasong@...cent.com,
	hannes@...xchg.org, mhocko@...nel.org, roman.gushchin@...ux.dev,
	shakeel.butt@...ux.dev, muchun.song@...ux.dev,
	shikemeng@...weicloud.com, nphamcs@...il.com, bhe@...hat.com,
	baohua@...nel.org, gunho.lee@....com, taejoon.song@....com
Subject: Re: [PATCH 3/3] mm/swap: integrate swap tier infrastructure into
 swap subsystem

On Wed, Nov 12, 2025 at 06:44:23AM -0800, Chris Li wrote:
> It seems you should introduce the tiers first. Allow users to define tiers.
> Then the follow up patches use tiers defined here.
> 
> The patch order seems reversed to me.
> 
> See some feedback below, to be continued.
> 
> Chris

Ack. As I already mentioned, I will change it properly :D

> On Sun, Nov 9, 2025 at 4:50 AM Youngjun Park &lt;youngjun.park@....com&gt; wrote:
> >
> > Integrate the swap tier infrastructure into the existing swap subsystem
> > to enable selective swap device usage based on tier configuration.
> >
> > Signed-off-by: Youngjun Park &lt;youngjun.park@....com&gt;
> > ---
> >  mm/memcontrol.c | 69 ++++++++++++++++++++++++++++++++++++
> >  mm/page_io.c    | 21 ++++++++++-
> >  mm/swap_state.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  mm/swapfile.c   | 15 ++++++--
> >  4 files changed, 194 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index bfc986da3289..33c7cc069754 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -68,6 +68,7 @@
> >  #include <net/ip.h>
> >  #include "slab.h"
> >  #include "memcontrol-v1.h"
> > +#include "swap_tier.h"
> >
> >  #include <linux/uaccess.h>
> >
> > @@ -3730,6 +3731,7 @@ static void mem_cgroup_free(struct mem_cgroup *memcg)
> >  {
> >         lru_gen_exit_memcg(memcg);
> >         memcg_wb_domain_exit(memcg);
> > +       swap_tiers_put_mask(memcg);
> >         __mem_cgroup_free(memcg);
> >  }
> >
> > @@ -3842,6 +3844,11 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> >                 page_counter_init(&memcg->kmem, &parent->kmem, false);
> >                 page_counter_init(&memcg->tcpmem, &parent->tcpmem, false);
> >  #endif
> > +#ifdef CONFIG_SWAP_TIER
> > +               memcg->tiers_mask = 0;
> > +               memcg->tiers_onoff = 0;
> > +#endif
> > +
> >         } else {
> >                 init_memcg_stats();
> >                 init_memcg_events();
> > @@ -3850,6 +3857,10 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> >  #ifdef CONFIG_MEMCG_V1
> >                 page_counter_init(&memcg->kmem, NULL, false);
> >                 page_counter_init(&memcg->tcpmem, NULL, false);
> > +#endif
> > +#ifdef CONFIG_SWAP_TIER
> Again, don't need this config.

Ack.

> > +               memcg->tiers_mask = DEFAULT_FULL_MASK;
>
> Is this memcg->tiers_mask a cached value after evaluating the
> swap.tiers onoff list by looking up the parent?

It is updated when configured through the memcg interface.
The tiers_onoff field represents which tiers are turned on or off,
and tiers_mask is the mask that includes both on and off bits for
those tiers. This mask is used in swap_tier_collect_mask logic to
avoid recalculating it every time.

> I was thinking of starting with always evaluating the tiers_mask. Then
> we don't need to store it here. How do you indicate the tiers_mask is
> out of date?

swap_tier_collect_mask does not cache it. The effective mask is
calculated at swap I/O time. It only changes through the user
interface.

>From your mention of “evaluation,” I understand you are referring to
a dynamically computed mask that depends on the parent’s settings.
However, in my implementation, tiers_mask is simply the mask of the
selected tiers. tiers_onoff indicates on/off state, and tiers_mask
represents the full mask (both on and off bits) needed for
swap_tier_collect_mask calculation. Therefore, tiers_mask can be
derived from tiers_onoff and could potentially be removed.

> > +               memcg->tiers_onoff = DEFAULT_ON_MASK;
> >  #endif
> >                 root_mem_cgroup = memcg;
> >                 return &memcg->css;
> > @@ -5390,6 +5401,56 @@ static int swap_events_show(struct seq_file *m, void *v)
> >         return 0;
> >  }
> >
> > +#ifdef CONFIG_SWAP_TIER
> > +static int swap_tier_show(struct seq_file *m, void *v)
> > +{
> > +       struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
> > +
> > +       swap_tiers_show_memcg(m, memcg);
> > +       return 0;
> > +}
> > +
> > +static ssize_t swap_tier_write(struct kernfs_open_file *of,
> > +                               char *buf, size_t nbytes, loff_t off)
> > +{
> > +       struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> > +       struct tiers_desc desc[MAX_SWAPTIER] = {};
> > +       char *pos = buf, *token;
> > +       int nr = 0;
> > +       int ret;
> > +
> > +       while ((token = strsep(&pos, " \t\n")) != NULL) {
>
> Not allowing plain space " "? Compare pointer != NULL is redundant.

You mean just allow " "?

I will change it to:
while ((token = strsep(" "))) {

> > +               if (!*token)
> > +                       continue;
> > +
> > +               if (nr >= MAX_SWAPTIER)
> > +                       return -E2BIG;
> > +
> > +               if (token[0] != '+' && token[0] != '-')
> > +                       return -EINVAL;
> > +
> > +               desc[nr].ops = (token[0] == '+') ? TIER_ON_MASK : TIER_OFF_MASK;
> > +
> > +               if (strlen(token) <= 1) {
> > +                       strscpy(desc[nr].name, DEFAULT_TIER_NAME);
> > +                       nr++;
> > +                       continue;
> > +               }
> > +
> > +               if (strscpy(desc[nr].name, token + 1, MAX_TIERNAME) < 0)
> > +                       return -EINVAL;
> > +
> > +               nr++;
> I don't think you need this nr, you will reach to the end of the
> string any way. What if the user specifies the same tier more than
> once? It is not optimal but the kernel should take it.
>
> OK, I see what is going on now, this whole desc thing can be greatly
> simplified. You shouldn't need to maintain the desc[nr], that desc
> array is the onoff mask in my mind. You just need to keep the tier
> bits in order.
>
> Notice in the memory.swap.tiers. Except for the default tier pattern,
> which always the first one if exists. The rest of the tier + - order
> does not matter. You look up the tier name into the tier mask bit.
> Just set the onoff bits for that tier.

The desc array is currently used as an
intermediate structure before applying the bitmask in swap_tier.c.
but as you mentioned, it might be unnecessary. I will review this
part to see if it can be simplified.

Best Regards,
Youngjun Park

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ