[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aRU7/ibyrPTn93qV@yjaykim-PowerEdge-T330>
Date: Thu, 13 Nov 2025 11:01:34 +0900
From: YoungJun Park <youngjun.park@....com>
To: Chris Li <chrisl@...nel.org>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, kasong@...cent.com,
hannes@...xchg.org, mhocko@...nel.org, roman.gushchin@...ux.dev,
shakeel.butt@...ux.dev, muchun.song@...ux.dev,
shikemeng@...weicloud.com, nphamcs@...il.com, bhe@...hat.com,
baohua@...nel.org, gunho.lee@....com, taejoon.song@....com
Subject: Re: [PATCH 2/3] mm: swap: introduce swap tier infrastructure
On Wed, Nov 12, 2025 at 06:20:22AM -0800, Chris Li wrote:
> First of all, for RFC series, please include RFC in each patch subject as well.
>
> For the real patch submission, please consider split it into smaller
> chunks and have incremental milestones.
> Only introduce the function needed for each milestone, not define them
> all together then use it in later patches.
>
> See some feedback as follows.
>
> This patch is too big, to be continued.
>
> Chris
Sure, I will take care of it. will make better on real patch submissions.
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 966f7c1a0128..1224029620ed 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -283,6 +283,10 @@ struct mem_cgroup {
> > /* per-memcg mm_struct list */
> > struct lru_gen_mm_list mm_list;
> > #endif
> > +#ifdef CONFIG_SWAP_TIER
>
> I think we don't need this CONFIG_SWAP_TIER. Making it part of the
> default swap is fine.
> By default the memory.swap.tiers is empty and matches the previous
> swap behavior. The user doesn't need to do anything if they are not
> using swap tiers. I see no reason to have a separate config option.
Okay I will change it to default kernel option.
> > + int tiers_onoff;
> > + int tiers_mask;
> > +#endif
> >
> > #ifdef CONFIG_MEMCG_V1
> > /* Legacy consumer-oriented counters */
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index 90fa27bb7796..8911eff9d37a 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -271,6 +271,9 @@ struct swap_info_struct {
> > struct percpu_ref users; /* indicate and keep swap device valid. */
> > unsigned long flags; /* SWP_USED etc: see above */
> > signed short prio; /* swap priority of this type */
> > +#ifdef CONFIG_SWAP_TIER
> > + int tier_idx;
> > +#endif
> > struct plist_node list; /* entry in swap_active_head */
> > signed char type; /* strange name for an index */
> > unsigned int max; /* extent of the swap_map */
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index e1fb11f36ede..78173ffe65d6 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -163,6 +163,19 @@ endmenu
> >
> > endif
> >
> > +config SWAP_TIER
> Same, I think we can remove the SWAP_TIER, just turn it on when swap is enabled.
Ack
> > diff --git a/mm/swap.h b/mm/swap.h
> > index d034c13d8dd2..b116282690c8 100644
> > --- a/mm/swap.h
> > +++ b/mm/swap.h
> > @@ -16,6 +16,10 @@ extern int page_cluster;
> > #define swap_entry_order(order) 0
> > #endif
> >
> > +#define DEF_SWAP_PRIO -1
> > +
> > +extern spinlock_t swap_lock;
> > +extern struct plist_head swap_active_head;
> > extern struct swap_info_struct *swap_info[];
> >
> > /*
> > diff --git a/mm/swap_tier.c b/mm/swap_tier.c
> > new file mode 100644
> > index 000000000000..4301e3c766b9
> > --- /dev/null
> > +++ b/mm/swap_tier.c
> > @@ -0,0 +1,602 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include <linux/swap.h>
> > +#include <linux/rcupdate.h>
> > +#include <linux/memcontrol.h>
> > +#include <linux/plist.h>
> > +#include <linux/sysfs.h>
> > +#include <linux/sort.h>
> > +
> > +#include "swap_tier.h"
> > +
> > +/*
> > + * struct swap_tier - Structure representing a swap tier.
> > + *
> > + * @name: Name of the swap_tier.
> > + * @prio_st: Starting value of priority.
> > + * @prio_end: Ending value of priority.
> > + * @list: Linked list of active tiers.
> > + * @inactive_list: Linked list of inactive tiers.
> > + * @kref: Reference count (held by swap device and memory cgroup).
> > + */
> > +static struct swap_tier {
> > + char name[MAX_TIERNAME];
> > + short prio_st;
> > + short prio_end;
>
> I see a lot of complexity of the code come from this priority range.
> Having both start and end.
> We should be able to just keep just one of the start and end, e.g.
> high end of the priority, then keep all tier in a sorted list or
> array. Then use the next tier's priority to indicate the other end of
> the current tier.
After checking my code, I decide to remove st or end.
> > + union {
> > + struct plist_node list;
> > + struct list_head inactive_list;
> > + };
>
> Is this the list of swapfiles?
> Why union, how does it indicate which field of the union is valid?
It is swap_tier itself. The 'list' maintains active tiers, and
'inactive_list' maintains inactive tiers. One tier exists on either
'list' or 'inactive_list'. The code ensures that a tier must be on one
of them.
> > +
> > +/* swap_tiers initialization */
> > +void swap_tiers_init(void)
> > +{
> > + struct swap_tier *tier;
> > + int idx;
> > +
> > + BUILD_BUG_ON(BITS_PER_TYPE(int) < MAX_SWAPTIER * 2);
> > +
> > + for_each_tier(tier, idx) {
> > + if (idx < SWAP_TIER_RESERVED_CNT) {
> > + /* for display fisrt */
> > + plist_node_init(&tier->list, -SHRT_MAX);
> > + plist_add(&tier->list, &swap_tier_active_list);
> > + kref_init(&tier->ref);
> > + } else {
> > + INIT_LIST_HEAD(&tier->inactive_list);
> > + list_add_tail(&tier->inactive_list, &swap_tier_inactive_list);
> > + }
> > + }
> > +
> > + strscpy(swap_tiers[SWAP_TIER_DEFAULT].name, DEFAULT_TIER_NAME);
>
> The default tier is not a real tier. It shouldn't show up in the
> swap_tiers array.
> The default tier is only a wide cast for memory.swap.tiers to select
> tiers to turn on/off swap tiers. It is a wide cast pattern, not an
> actual tier.
Yeah, as I commented in the previous mail, I will change it to a
logical concept.
> > +void swap_tiers_show_memcg(struct seq_file *m, struct mem_cgroup *memcg)
> > +{
> > + spin_lock(&swap_tier_lock);
> > + if (memcg->tiers_onoff)
> > + swap_tier_show_mask(m, memcg->tiers_onoff);
> > + else
> > + seq_puts(m, "\n");
> > + swap_tier_show_mask(m, swap_tier_collect_mask(memcg));
> > + spin_unlock(&swap_tier_lock);
> > +}
> > +
> > +void swap_tiers_assign(struct swap_info_struct *swp)
> > +{
> > + struct swap_tier *tier;
> > +
> > + spin_lock(&swap_tier_lock);
> > + swp->tier_idx = NULL_TIER;
> > +
> > + for_each_active_tier(tier) {
> > + if (swap_tier_is_default(tier))
> > + continue;
> > + if (swap_tier_prio_in_range(tier, swp->prio)) {
> > + swp->tier_idx = TIER_IDX(tier);
> > + swap_tier_get(tier);
> > + break;
> > + }
> > + }
> > + spin_unlock(&swap_tier_lock);
> > +}
> > +
> > +void swap_tiers_release(struct swap_info_struct *swp)
> > +{
> > + spin_lock(&swap_tier_lock);
> > + if (swp->tier_idx != NULL_TIER)
> > + swap_tier_put(&swap_tiers[swp->tier_idx]);
> > + spin_unlock(&swap_tier_lock);
> > +}
> > +
> > +/* not incremental, but reset. */
> > +int swap_tiers_get_mask(struct tiers_desc *desc, int nr, struct mem_cgroup *memcg)
>
> Who is using this function? I can't find the user of this function in
> this patch.
> Please introduce this function together with the patch that uses it.
> Don't introduce a function without a user.
swapoff calls swap_tiers_release. I must improve the readability of my
RFC patch series.
>
> > +{
> > + struct swap_tier *tier;
> > + int ret = 0;
> > + int tiers_mask = 0;
> > + int tiers_onoff = 0;
> > + int cnt = 0;
> > +
> > + for (int i = 0; i < nr; i++) {
> > + for (int j = i+1; j < nr; j++) {
> > + if (!strcmp(desc[i].name, desc[j].name))
>
> These two nested for loops look suspicious. Again because I don't see
For assuring, unique tier name input.
I will think of making simple.
> the caller, I can't reason what it is doing here.
When a swap tier is designated by the memcg sys interface, it is
called. I will introduce this logic together with the caller
implementation.
Best Regards,
Youngjun Park
Powered by blists - more mailing lists