lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aRU7/ibyrPTn93qV@yjaykim-PowerEdge-T330>
Date: Thu, 13 Nov 2025 11:01:34 +0900
From: YoungJun Park <youngjun.park@....com>
To: Chris Li <chrisl@...nel.org>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org, cgroups@...r.kernel.org,
	linux-kernel@...r.kernel.org, kasong@...cent.com,
	hannes@...xchg.org, mhocko@...nel.org, roman.gushchin@...ux.dev,
	shakeel.butt@...ux.dev, muchun.song@...ux.dev,
	shikemeng@...weicloud.com, nphamcs@...il.com, bhe@...hat.com,
	baohua@...nel.org, gunho.lee@....com, taejoon.song@....com
Subject: Re: [PATCH 2/3] mm: swap: introduce swap tier infrastructure

On Wed, Nov 12, 2025 at 06:20:22AM -0800, Chris Li wrote:
> First of all, for RFC series, please include RFC in each patch subject as well.
> 
> For the real patch submission, please consider split it into smaller
> chunks and have incremental milestones.
> Only introduce the function needed for each milestone, not define them
> all together then use it in later patches.
> 
> See some feedback as follows.
> 
> This patch is too big, to be continued.
> 
> Chris

Sure, I will take care of it. will make better on real patch submissions.

> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 966f7c1a0128..1224029620ed 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -283,6 +283,10 @@ struct mem_cgroup {
> >         /* per-memcg mm_struct list */
> >         struct lru_gen_mm_list mm_list;
> >  #endif
> > +#ifdef CONFIG_SWAP_TIER
> 
> I think we don't need this CONFIG_SWAP_TIER. Making it part of the
> default swap is fine.
> By default the memory.swap.tiers is empty and matches the previous
> swap behavior. The user doesn't need to do anything if they are not
> using swap tiers. I see no reason to have a separate config option.

Okay I will change it to default kernel option.

> > +       int tiers_onoff;
> > +       int tiers_mask;
> > +#endif
> >
> >  #ifdef CONFIG_MEMCG_V1
> >         /* Legacy consumer-oriented counters */
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index 90fa27bb7796..8911eff9d37a 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -271,6 +271,9 @@ struct swap_info_struct {
> >         struct percpu_ref users;        /* indicate and keep swap device valid. */
> >         unsigned long   flags;          /* SWP_USED etc: see above */
> >         signed short    prio;           /* swap priority of this type */
> > +#ifdef CONFIG_SWAP_TIER
> > +       int tier_idx;
> > +#endif
> >         struct plist_node list;         /* entry in swap_active_head */
> >         signed char     type;           /* strange name for an index */
> >         unsigned int    max;            /* extent of the swap_map */
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index e1fb11f36ede..78173ffe65d6 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -163,6 +163,19 @@ endmenu
> >
> >  endif
> >
> > +config SWAP_TIER
> Same, I think we can remove the SWAP_TIER, just turn it on when swap is enabled.

Ack

> > diff --git a/mm/swap.h b/mm/swap.h
> > index d034c13d8dd2..b116282690c8 100644
> > --- a/mm/swap.h
> > +++ b/mm/swap.h
> > @@ -16,6 +16,10 @@ extern int page_cluster;
> >  #define swap_entry_order(order)        0
> >  #endif
> >
> > +#define DEF_SWAP_PRIO  -1
> > +
> > +extern spinlock_t swap_lock;
> > +extern struct plist_head swap_active_head;
> >  extern struct swap_info_struct *swap_info[];
> >
> >  /*
> > diff --git a/mm/swap_tier.c b/mm/swap_tier.c
> > new file mode 100644
> > index 000000000000..4301e3c766b9
> > --- /dev/null
> > +++ b/mm/swap_tier.c
> > @@ -0,0 +1,602 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include <linux/swap.h>
> > +#include <linux/rcupdate.h>
> > +#include <linux/memcontrol.h>
> > +#include <linux/plist.h>
> > +#include <linux/sysfs.h>
> > +#include <linux/sort.h>
> > +
> > +#include "swap_tier.h"
> > +
> > +/*
> > + * struct swap_tier - Structure representing a swap tier.
> > + *
> > + * @name:              Name of the swap_tier.
> > + * @prio_st:           Starting value of priority.
> > + * @prio_end:          Ending value of priority.
> > + * @list:              Linked list of active tiers.
> > + * @inactive_list:     Linked list of inactive tiers.
> > + * @kref:              Reference count (held by swap device and memory cgroup).
> > + */
> > +static struct swap_tier {
> > +       char name[MAX_TIERNAME];
> > +       short prio_st;
> > +       short prio_end;
> 
> I see a lot of complexity of the code come from this priority range.
> Having both start and end.
> We should be able to just keep just one of the start and end,  e.g.
> high end of the priority, then keep all tier in a sorted list or
> array. Then use the next tier's priority to indicate the other end of
> the current tier.

After checking my code, I decide to remove st or end.

> > +       union {
> > +               struct plist_node list;
> > +               struct list_head inactive_list;
> > +       };
> 
> Is this the list of swapfiles?
> Why union, how does it indicate which field of the union is valid?

It is swap_tier itself. The 'list' maintains active tiers, and
'inactive_list' maintains inactive tiers. One tier exists on either
'list' or 'inactive_list'. The code ensures that a tier must be on one
of them.

> > +
> > +/* swap_tiers initialization */
> > +void swap_tiers_init(void)
> > +{
> > +       struct swap_tier *tier;
> > +       int idx;
> > +
> > +       BUILD_BUG_ON(BITS_PER_TYPE(int) < MAX_SWAPTIER * 2);
> > +
> > +       for_each_tier(tier, idx) {
> > +               if (idx < SWAP_TIER_RESERVED_CNT) {
> > +                       /* for display fisrt */
> > +                       plist_node_init(&tier->list, -SHRT_MAX);
> > +                       plist_add(&tier->list, &swap_tier_active_list);
> > +                       kref_init(&tier->ref);
> > +               } else {
> > +                       INIT_LIST_HEAD(&tier->inactive_list);
> > +                       list_add_tail(&tier->inactive_list, &swap_tier_inactive_list);
> > +               }
> > +       }
> > +
> > +       strscpy(swap_tiers[SWAP_TIER_DEFAULT].name, DEFAULT_TIER_NAME);
> 
> The default tier is not a real tier. It shouldn't show up in the
> swap_tiers array.
> The default tier is only a wide cast for memory.swap.tiers to select
> tiers to turn on/off swap tiers. It is a wide cast pattern, not an
> actual tier.

Yeah, as I commented in the previous mail, I will change it to a
logical concept.

> > +void swap_tiers_show_memcg(struct seq_file *m, struct mem_cgroup *memcg)
> > +{
> > +       spin_lock(&swap_tier_lock);
> > +       if (memcg->tiers_onoff)
> > +               swap_tier_show_mask(m, memcg->tiers_onoff);
> > +       else
> > +               seq_puts(m, "\n");
> > +       swap_tier_show_mask(m, swap_tier_collect_mask(memcg));
> > +       spin_unlock(&swap_tier_lock);
> > +}
> > +
> > +void swap_tiers_assign(struct swap_info_struct *swp)
> > +{
> > +       struct swap_tier *tier;
> > +
> > +       spin_lock(&swap_tier_lock);
> > +       swp->tier_idx = NULL_TIER;
> > +
> > +       for_each_active_tier(tier) {
> > +               if (swap_tier_is_default(tier))
> > +                       continue;
> > +               if (swap_tier_prio_in_range(tier, swp->prio)) {
> > +                       swp->tier_idx = TIER_IDX(tier);
> > +                       swap_tier_get(tier);
> > +                       break;
> > +               }
> > +       }
> > +       spin_unlock(&swap_tier_lock);
> > +}
> > +
> > +void swap_tiers_release(struct swap_info_struct *swp)
> > +{
> > +       spin_lock(&swap_tier_lock);
> > +       if (swp->tier_idx != NULL_TIER)
> > +               swap_tier_put(&swap_tiers[swp->tier_idx]);
> > +       spin_unlock(&swap_tier_lock);
> > +}
> > +
> > +/* not incremental, but reset. */
> > +int swap_tiers_get_mask(struct tiers_desc *desc, int nr, struct mem_cgroup *memcg)
> 
> Who is using this function? I can't find the user of this function in
> this patch.
> Please introduce this function together with the patch that uses it.
> Don't introduce a function without a user.

swapoff calls swap_tiers_release. I must improve the readability of my
RFC patch series.

> 
> > +{
> > +       struct swap_tier *tier;
> > +       int ret = 0;
> > +       int tiers_mask = 0;
> > +       int tiers_onoff = 0;
> > +       int cnt = 0;
> > +
> > +       for (int i = 0; i < nr; i++) {
> > +               for (int j = i+1; j < nr; j++) {
> > +                       if (!strcmp(desc[i].name, desc[j].name))
> 
> These two nested for loops look suspicious. Again because I don't see

For assuring, unique tier name input.
I will think of making simple.

> the caller, I can't reason what it is doing here.

When a swap tier is designated by the memcg sys interface, it is
called. I will introduce this logic together with the caller
implementation.

Best Regards,
Youngjun Park

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ