linux-kernel - Re: [RFC PATCH 0/5] mm/damon: Add node_sys_bp quota goal metric for PA-based migration control

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20260128012539.72119-1-sj@kernel.org>
Date: Tue, 27 Jan 2026 17:25:37 -0800
From: SeongJae Park <sj@...nel.org>
To: Ravi Jonnalagadda <ravis.opensrc@...il.com>
Cc: SeongJae Park <sj@...nel.org>,
	damon@...ts.linux.dev,
	linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	linux-doc@...r.kernel.org,
	akpm@...ux-foundation.org,
	corbet@....net,
	bijan311@...il.com,
	ajayjoshi@...ron.com,
	Honggyu Kim <honggyu.kim@...com>,
	Yunjeong Mun <yunjeong.mun@...com>
Subject: Re: [RFC PATCH 0/5] mm/damon: Add node_sys_bp quota goal metric for PA-based migration control

On Tue, 27 Jan 2026 10:52:53 -0800 Ravi Jonnalagadda <ravis.opensrc@...il.com> wrote:

> On Fri, Jan 23, 2026 at 5:50 PM SeongJae Park <sj@...nel.org> wrote:
> >
> > Cc-ing SK hynix folks (Honggyu and Yunjeong) for quota auto-tuning behavior
> > confusion (not stop immediately after satisfying the goal) I discuss below.
> >
> > Hello Ravi,
> >
> 
>  Hi SJ,
> 
>   Thank you for the detailed review and for Cc'ing SK hynix folks.
> 
> >
> > Thank you for sharing this great RFC!
> >
> > I had a chance to be involved in a high level design of this series, off the
> > mailing list.  I'm disclosing the fact for others' context, since a few of my
> > comments below are based on the previous off-list discussion.
> >
> > On Thu, 22 Jan 2026 20:57:23 -0800 Ravi Jonnalagadda <ravis.opensrc@...il.com> wrote:
> >
> > > This series introduces a new DAMON quota goal metric,
> >
> > As I discussed off the list, I believe the new goal can be useful.  Thank you
> > again for making this!
> >
> > > `node_sys_bp`, designed
> > > for controlling memory migration in heterogeneous memory systems (e.g.,
> > > DRAM↔CXL tiering). These patches are provided as an initial RFC and have
> > > not been tested on actual hardware.
> >
> > I feel like the name is not very self explaining what it is, though.  I'm also
> > bit confused about what it does is what we discussed off list.  I'll add more
> > details below, on more detailed explanation about the new goal.
> >
> 
> Will change the name to damos_target_mem_node_sys_bp in v2.

Sounds good name.  But, could it be more consistent with existing node-related
quota goal metrics such as 'node_mem_used_bp' and 'node_mem_free_bp'?  What
about... say... 'node_target_mem_bp'?  Please feel free to suggest a better
one.

[...]
> > > Solution: bp-Based Target State Goals
> > > =====================================
> > >
> > > Instead of specifying action rates, `node_sys_bp` specifies a TARGET STATE:
> > >
> > >     "Node N should contain X basis points (X/10000) of system memory"
> >
> > I'm not sure if this is exactly what we discussed off list.  What I expected
> > the goal would be based on our discussion is, what the first patch is saying.
> > To quote the part,
> >
> >     @@ -155,6 +155,7 @@ enum damos_action {
> >       * @DAMOS_QUOTA_NODE_MEM_FREE_BP:      MemFree ratio of a node.
> >       * @DAMOS_QUOTA_NODE_MEMCG_USED_BP:    MemUsed ratio of a node for a cgroup.
> >       * @DAMOS_QUOTA_NODE_MEMCG_FREE_BP:    MemFree ratio of a node for a cgroup.
> >     + * @DAMOS_QUOTA_NODE_SYS_BP:           Scheme-eligible bytes ratio of a node.
> >       * @NR_DAMOS_QUOTA_GOAL_METRICS:       Number of DAMOS quota goal metrics.
> >
> > That is, my understanding of what we want to achive with the new goal is,
> > letting users to ask DAMON "migrate hot pages of node A to node B, aiming
> > X % of node B becomes hot, as a result of the migrations".
> >
> 
> Yes, this is the intent. Looking back at my implementation, I see the
> mismatch:
> 
> 1. **Numerator**: Should count only scheme-eligible bytes.
> 
> 2. **Denominator**: Should use node capacity, but I used total system
>    memory.
> 
> > But your above description is not saying about the "scheme-eligibility".  Also,
> > the goal metric is the ratio of the memory to the entire system memory, not
> > just a given node.  My quick read of damon_pa_get_node_sys_bp() on the second
> > patch of this series confirms the implementation is following your description,
> > not what I imagined.
> >
> 
> You're right. The implementation diverged from what we discussed. I'll
> fix both the numerator and denominator in v2.

Thank you for clarifying it and sharing the next plan, all sounds good!

[...]
> > Please note that the goal-based quota auto-tuning works in proportional way,
> > preferring small steps and "eventual" goal convergence.  As a result, migration
> > will occur a few more times until it is completely stopped after the goal is
> > satisfied.  Unless there is another scheme that migrates pages into node 0, you
> > may end up having node 0 having a bit less than the 40% memory.
> >
> > >
> > >     No oscillation - migration stops when target state is reached.
> >
> > So, little bit of oscillation could still happen.  Hopefully that shouldn't be
> > significant, though.
> >
> 
> Yes, As we discussed offline, for a two-context approach:
> 
>       Context 0: monitors node 0, migrate_hot → node 1
>         goal: damos_target_mem_node_sys_bp, nid=1, target=4000
>         "Stop when node 1 is 40% hot"
> 
>       Context 1: monitors node 1, migrate_hot → node 0
>         goal: damos_target_mem_node_sys_bp, nid=0, target=6000
>         "Stop when node 0 is 60% hot"
> 
> Each context eventually stops when its target node reaches the desired
> threshold,
> and the reverse direction is handled by the other context. For my use
> case, eventual convergence with this setup could be acceptable.

Thank you for clarifying this!  I'm relieved we don't have a concern here.

> An immediate-stop feature could still be useful for the broader community.

Thank you for the feedback.  I will take more time on thinking about how to
implement this.

> Will test and post results after the next iteration.
> 
> > IIRC, SK hynix people also confused with the behavior when they experimented
> > migrate_{hot,cold} action with NODE_MEM_USED_BP goal based quota auto-tuning,
> > but using only a single scheme that does migration in a single direction.
> > Because this is at least second time it made confusion, if you need, maybe I
> > can try to add a feature for making DAMOS immediately stops after the goal is
> > satisfied.  Let me know if such new feature can be useful for you.  Cc-ing SK
> > hynix people (Honggyu and Yunjeong) so that they can correct me if my memory is
> > broken, or answer if the new feature I described here can be useful for them.
[...]
> > > Why get_goal_metric() Ops Callback
> > > ==================================
> > >
> > > The bp calculation requires iterating over monitored PA regions:
> > >
> > >     for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> > >         if (page_to_nid(pfn_to_page(pfn)) == nid)
> > >             node_bytes += PAGE_SIZE;
> > >     }
> > >     bp = node_bytes * 10000 / system_total;
> > >
> > > This requires address-space knowledge that only the ops provider has.
> > > Existing goal metrics (PSI, node_mem_*, node_memcg_*) are computed in
> > > core using system-wide data that doesn't require iterating monitored
> > > regions.
> > >
> > > The new `get_goal_metric()` callback in `damon_operations` allows:
> > >
> > >   1. Core to remain generic - handles all common metrics
> >
> > I agree this is indeed making the design clean.  But, we already having such
> > exception, like core.c code using 'damon_target_has_pid().  Having just one
> > more exception seems ok to me, unless it makes code too ugly.
> >
> > >   2. Ops providers to implement metrics requiring region iteration
> > >   3. Clean separation - PA implements node_sys_bp, VA could add
> > >      different metrics in future
> >
> > I agree it could be useful for clean support of virtual address mode in future.
> > But, I'd prefer making this as simple and small as possible for the support we
> > will use at the moment.
> >
> > >   4. Optional implementation - ops return 0 if metric unsupported
> >
> > Again, letting core logic having a bit of ops layer information is not a big
> > problem to my humble perspective.
> >
> > So, I'd more prefer not adding the Ops callback, unless you have some other
> > concerns.
> >
> 
> Agreed. I'll remove the get_goal_metric() ops callback in v2

Thank you for flexibly accepting my suggestion.

[...]
> > > Advantages of PA-Based Migration
> > > ================================
> > >
> > > PA-based migration with DAMON enables integration of multiple hotness
> > > sources for migration decisions:
> > >
> > >   1. DAMON's native access pattern monitoring
> > >   2. Fault-based information (similar to NUMA Balancing)
> > >   3. Future: Hardware monitoring units (e.g., CXL CHMU)
> > >   4. Future: Instruction-Based Sampling (AMD IBS, Intel PEBS)
> > >
> > > Unlike VA-based approaches tied to individual process address spaces, PA
> > > monitoring can aggregate hotness information from multiple sources to make
> > > system-wide migration decisions across the entire physical memory space.
> >
> > Maybe you are saying about the damon_report_access() based DAMON extension
> > project [1]?  Since it is not yet upstreamed, and the long term plan is to
> > support not only physical address but also virtual address space, I think this
> > section is better to be removed, unless the DAMON extension project is merged
> > before this patch series.  I expect this patch series will be merged much
> > earlier than the extension project.
> >
> 
> Agreed. I'll remove the references to future hotness sources.

Thank you!

[...]
> > > Feedback on the overall approach and design is appreciated.
> >
> > I hope my above comments helps the forward progress of this nice series.
> >
> 
> They are very helpful. Thanks a lot.

The pleasure is mine, looking forward to the next version of this patch series
:)


Thanks,
SJ

[...]