[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aRVZbMNgvDbZDnDK@yjaykim-PowerEdge-T330>
Date: Thu, 13 Nov 2025 13:07:08 +0900
From: YoungJun Park <youngjun.park@....com>
To: Chris Li <chrisl@...nel.org>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, kasong@...cent.com,
hannes@...xchg.org, mhocko@...nel.org, roman.gushchin@...ux.dev,
shakeel.butt@...ux.dev, muchun.song@...ux.dev,
shikemeng@...weicloud.com, nphamcs@...il.com, bhe@...hat.com,
baohua@...nel.org, gunho.lee@....com, taejoon.song@....com
Subject: Re: [PATCH 3/3] mm/swap: integrate swap tier infrastructure into
swap subsystem
On Wed, Nov 12, 2025 at 06:44:23AM -0800, Chris Li wrote:
> It seems you should introduce the tiers first. Allow users to define tiers.
> Then the follow up patches use tiers defined here.
>
> The patch order seems reversed to me.
>
> See some feedback below, to be continued.
>
> Chris
Ack. As I already mentioned, I will change it properly :D
> On Sun, Nov 9, 2025 at 4:50 AM Youngjun Park <youngjun.park@....com> wrote:
> >
> > Integrate the swap tier infrastructure into the existing swap subsystem
> > to enable selective swap device usage based on tier configuration.
> >
> > Signed-off-by: Youngjun Park <youngjun.park@....com>
> > ---
> > mm/memcontrol.c | 69 ++++++++++++++++++++++++++++++++++++
> > mm/page_io.c | 21 ++++++++++-
> > mm/swap_state.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++
> > mm/swapfile.c | 15 ++++++--
> > 4 files changed, 194 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index bfc986da3289..33c7cc069754 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -68,6 +68,7 @@
> > #include <net/ip.h>
> > #include "slab.h"
> > #include "memcontrol-v1.h"
> > +#include "swap_tier.h"
> >
> > #include <linux/uaccess.h>
> >
> > @@ -3730,6 +3731,7 @@ static void mem_cgroup_free(struct mem_cgroup *memcg)
> > {
> > lru_gen_exit_memcg(memcg);
> > memcg_wb_domain_exit(memcg);
> > + swap_tiers_put_mask(memcg);
> > __mem_cgroup_free(memcg);
> > }
> >
> > @@ -3842,6 +3844,11 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> > page_counter_init(&memcg->kmem, &parent->kmem, false);
> > page_counter_init(&memcg->tcpmem, &parent->tcpmem, false);
> > #endif
> > +#ifdef CONFIG_SWAP_TIER
> > + memcg->tiers_mask = 0;
> > + memcg->tiers_onoff = 0;
> > +#endif
> > +
> > } else {
> > init_memcg_stats();
> > init_memcg_events();
> > @@ -3850,6 +3857,10 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> > #ifdef CONFIG_MEMCG_V1
> > page_counter_init(&memcg->kmem, NULL, false);
> > page_counter_init(&memcg->tcpmem, NULL, false);
> > +#endif
> > +#ifdef CONFIG_SWAP_TIER
> Again, don't need this config.
Ack.
> > + memcg->tiers_mask = DEFAULT_FULL_MASK;
>
> Is this memcg->tiers_mask a cached value after evaluating the
> swap.tiers onoff list by looking up the parent?
It is updated when configured through the memcg interface.
The tiers_onoff field represents which tiers are turned on or off,
and tiers_mask is the mask that includes both on and off bits for
those tiers. This mask is used in swap_tier_collect_mask logic to
avoid recalculating it every time.
> I was thinking of starting with always evaluating the tiers_mask. Then
> we don't need to store it here. How do you indicate the tiers_mask is
> out of date?
swap_tier_collect_mask does not cache it. The effective mask is
calculated at swap I/O time. It only changes through the user
interface.
>From your mention of “evaluation,” I understand you are referring to
a dynamically computed mask that depends on the parent’s settings.
However, in my implementation, tiers_mask is simply the mask of the
selected tiers. tiers_onoff indicates on/off state, and tiers_mask
represents the full mask (both on and off bits) needed for
swap_tier_collect_mask calculation. Therefore, tiers_mask can be
derived from tiers_onoff and could potentially be removed.
> > + memcg->tiers_onoff = DEFAULT_ON_MASK;
> > #endif
> > root_mem_cgroup = memcg;
> > return &memcg->css;
> > @@ -5390,6 +5401,56 @@ static int swap_events_show(struct seq_file *m, void *v)
> > return 0;
> > }
> >
> > +#ifdef CONFIG_SWAP_TIER
> > +static int swap_tier_show(struct seq_file *m, void *v)
> > +{
> > + struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
> > +
> > + swap_tiers_show_memcg(m, memcg);
> > + return 0;
> > +}
> > +
> > +static ssize_t swap_tier_write(struct kernfs_open_file *of,
> > + char *buf, size_t nbytes, loff_t off)
> > +{
> > + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> > + struct tiers_desc desc[MAX_SWAPTIER] = {};
> > + char *pos = buf, *token;
> > + int nr = 0;
> > + int ret;
> > +
> > + while ((token = strsep(&pos, " \t\n")) != NULL) {
>
> Not allowing plain space " "? Compare pointer != NULL is redundant.
You mean just allow " "?
I will change it to:
while ((token = strsep(" "))) {
> > + if (!*token)
> > + continue;
> > +
> > + if (nr >= MAX_SWAPTIER)
> > + return -E2BIG;
> > +
> > + if (token[0] != '+' && token[0] != '-')
> > + return -EINVAL;
> > +
> > + desc[nr].ops = (token[0] == '+') ? TIER_ON_MASK : TIER_OFF_MASK;
> > +
> > + if (strlen(token) <= 1) {
> > + strscpy(desc[nr].name, DEFAULT_TIER_NAME);
> > + nr++;
> > + continue;
> > + }
> > +
> > + if (strscpy(desc[nr].name, token + 1, MAX_TIERNAME) < 0)
> > + return -EINVAL;
> > +
> > + nr++;
> I don't think you need this nr, you will reach to the end of the
> string any way. What if the user specifies the same tier more than
> once? It is not optimal but the kernel should take it.
>
> OK, I see what is going on now, this whole desc thing can be greatly
> simplified. You shouldn't need to maintain the desc[nr], that desc
> array is the onoff mask in my mind. You just need to keep the tier
> bits in order.
>
> Notice in the memory.swap.tiers. Except for the default tier pattern,
> which always the first one if exists. The rest of the tier + - order
> does not matter. You look up the tier name into the tier mask bit.
> Just set the onoff bits for that tier.
The desc array is currently used as an
intermediate structure before applying the bitmask in swap_tier.c.
but as you mentioned, it might be unnecessary. I will review this
part to see if it can be simplified.
Best Regards,
Youngjun Park
Powered by blists - more mailing lists