Message-ID: <CAMvvPS5JL20OJic0EFKbuY_VgEAjveJoquTReyLjknSZ-6BeVQ@mail.gmail.com>
Date: Mon, 23 Jun 2025 09:16:53 -0500
From: Bijan Tabatabai <bijan311@...il.com>
To: SeongJae Park <sj@...nel.org>
Cc: damon@...ts.linux.dev, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	akpm@...ux-foundation.org, david@...hat.com, ziy@...dia.com, 
	matthew.brost@...el.com, joshua.hahnjy@...il.com, rakie.kim@...com, 
	byungchul@...com, gourry@...rry.net, ying.huang@...ux.alibaba.com, 
	apopple@...dia.com, bijantabatab@...ron.com, venkataravis@...ron.com, 
	emirakhur@...ron.com, ajayjoshi@...ron.com, vtavarespetr@...ron.com
Subject: Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets

On Sat, Jun 21, 2025 at 1:02 PM SeongJae Park <sj@...nel.org> wrote:
>
> Hi Bijan,
>
> On Fri, 20 Jun 2025 13:04:58 -0500 Bijan Tabatabai <bijan311@...il.com> wrote:
>
> > From: Bijan Tabatabai <bijantabatab@...ron.com>
> >
> > The migrate_{hot,cold} DAMOS actions take a parameter, target_nid, to
> > indicate what node the actions should migrate pages to. In this patch,
> > we allow passing in a list of migration targets into target_nid. When
> > this is done, the migrate_{hot,cold} actions will migrate pages between
> > the specified nodes using the global interleave weights found at
> > /sys/kernel/mm/mempolicy/weighted_interleave/node<N>. This functionality
> > can be used to dynamically adjust how pages are interleaved in response
> > to changes in bandwidth utilization to improve performance, as discussed
> > in [1]. When only a single migration target is passed to target_nid, the
> > migrate_{hot,cold} actions will act the same as before.
> [...]
> >  include/linux/damon.h    |   8 +--
> >  mm/damon/core.c          |   9 ++--
> >  mm/damon/lru_sort.c      |   2 +-
> >  mm/damon/paddr.c         | 108 +++++++++++++++++++++++++++++++++++++--
> >  mm/damon/reclaim.c       |   2 +-
> >  mm/damon/sysfs-schemes.c |  14 +++--
> >  samples/damon/mtier.c    |   6 ++-
> >  samples/damon/prcl.c     |   2 +-
> >  8 files changed, 131 insertions(+), 20 deletions(-)
>
> If we keep pursuing making DAMON users able to specify multiple migration
> destination nodes and their weights[1], I think we may need only the paddr.c
> part of this patch in the final version of this great work.

Sounds good to me.

> [...]
> >  static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> >               unsigned long *sz_filter_passed)
> >  {
> >       unsigned long addr, applied;
> > -     LIST_HEAD(folio_list);
> > +     struct rmap_walk_control rwc;
> [...]
> >
> >       addr = r->ar.start;
> >       while (addr < r->ar.end) {
> > @@ -522,15 +599,38 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> >               else
> >                       *sz_filter_passed += folio_size(folio);
> >
> > +             /*
> > +              * If there is only one target node, migrate there. Otherwise,
> > +              * interleave across the nodes according to the global
> > +              * interleave weights
> > +              */
> > +             if (nr_nodes == 1) {
> > +                     target_nid = first_node(s->target_nids);
> > +             } else {
> > +                     target_nid = NUMA_NO_NODE;
> > +                     /* Updates target_nid */
> > +                     rmap_walk(folio, &rwc);
> > +             }
>
> So we are doing rmap_walk(), which is known to be not very fast, to get the
> target node id of this page, in a way very similar to that of weighted
> interleaving, right?  I don't think we really need to match weighted
> interleaving's behavior at that cost.
>
> I'd hence suggest implementing and using a simple weights handling mechanism
> here.  It could be done round-robin, like weighted interleaving, or
> probabilistically, using damon_rand().
>
> The round-robin way may be simpler in my opinion.  For example,
>
> unsigned int damos_pa_nid_to_migrate(struct damos_migrate_dest *dest)
> {
>         static unsigned int nr_migrated = 0;
>         unsigned int total_weight = 0;
>         unsigned int weights_to_ignore;
>         size_t i;
>
>         for (i = 0; i < dest->nr_dests; i++)
>                 total_weight += dest->weight_arr[i];
>         weights_to_ignore = nr_migrated++ % total_weight;
>         total_weight = 0;
>         for (i = 0; i < dest->nr_dests; i++) {
>                 total_weight += dest->weight_arr[i];
>                 if (total_weight > weights_to_ignore)
>                         return dest->node_id_arr[i];
>         }
>         WARN_ONCE(1, "I don't know what I did wrong");
>         return 0;
> }
>
> Then, we could replace the above rmap_walk() call with this one.  What do you
> think?

I do actually think doing the interleaving based on the VMA offset is
important for a couple of reasons.

1. If also using the weighted interleaving mempolicy, and the DAMON
weights are the same as the mempolicy weights, DAMON won't have to
migrate newly allocated pages. This is relatively minor, but helps
avoid unnecessary work.

2. More importantly, I believe this approach will cause a lot of
needless ping-ponging, where the same folios are being moved around
when they don't need to be. For example, say folios A-F are hot, and, to
simplify, assume that folios on the same node are always in the same DAMON
region, and that only those folios are in those regions. If all the folios
start in Node 0 and both nodes have a weight of 1, we have:

nr_migrated = 0
Node 0           Node 1
----------           ----------
A-F                  <empty>

After the scheme is first applied

nr_migrated = 6
Node 0           Node 1
----------           ----------
A,C,E              B,D,F

This is fine, but these folios are still hot, so the scheme will be
applied to them again

nr_migrated = 12
Node 0           Node 1
----------           ----------
A,E,D             B,C,F

If I am understanding your code sample correctly, this will continue
to happen each time the scheme is applied, causing folios to be
migrated for no reason. Using the VMA offset to determine where a page
should be placed avoids this problem because it gives a folio a single
node it can be in for a given set of interleave weights. This means
that in steady state, no folios will be migrated.

I see what you're saying about rmap_walks being expensive, but since
DAMON operates off the critical path for the workload, I don't think
the cost is that problematic.

[...]

Let me know what you think,
Bijan
