Message-ID: <20250623134550.2367733-1-joshua.hahnjy@gmail.com>
Date: Mon, 23 Jun 2025 06:45:49 -0700
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: Bijan Tabatabai <bijan311@...il.com>
Cc: damon@...ts.linux.dev,
linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
sj@...nel.org,
akpm@...ux-foundation.org,
david@...hat.com,
ziy@...dia.com,
matthew.brost@...el.com,
joshua.hahnjy@...il.com,
rakie.kim@...com,
byungchul@...com,
gourry@...rry.net,
ying.huang@...ux.alibaba.com,
apopple@...dia.com,
bijantabatab@...ron.com,
venkataravis@...ron.com,
emirakhur@...ron.com,
ajayjoshi@...ron.com,
vtavarespetr@...ron.com
Subject: Re: [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions
On Fri, 20 Jun 2025 13:04:56 -0500 Bijan Tabatabai <bijan311@...il.com> wrote:
Hi Bijan,
I hope you are doing well! Sorry for the late response. It seems like SJ
has already given some great feedback, so I will just chime in with my 2c.
[...snip...]
> However, the interleave weights are currently only applied when data is
> allocated. Migrating already-allocated pages according to the dynamically
> changing weights will better help balance the bandwidth utilization across
> nodes.
>
> As a toy example, imagine some application that uses 75% of the local
> bandwidth. Assuming sufficient capacity, when running alone, we want to
> keep that application's data in local memory. However, if a second
> instance of that application begins, using the same amount of bandwidth,
> it would be best to interleave the data of both processes to alleviate the
> bandwidth pressure from the local node. Likewise, when one of the processes
> ends, the data should be moved back to local memory.
I think the addition of this example helps illustrate the necessity of
interleaving, thank you for adding it in!
> We imagine there would be a userspace application that would monitor system
> performance characteristics, such as bandwidth utilization or memory access
> latency, and use that information to tune the interleave weights. Others
> seem to have come to a similar conclusion in previous discussions [3].
>
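This design sounds reasonable to me. For concreteness, such a tuner could
be as simple as a loop over the same sysfs knobs you use below. A minimal
sketch, where get_local_bw_util is a hypothetical helper (in practice it
might wrap perf or a vendor bandwidth counter):

  #!/bin/sh
  # Hypothetical userspace tuner: bias weights toward the local node
  # while its bandwidth is free, spread toward CXL once it saturates.
  WI=/sys/kernel/mm/mempolicy/weighted_interleave
  while sleep 5; do
      util=$(get_local_bw_util)      # hypothetical, returns 0-100
      if [ "$util" -gt 75 ]; then
          echo 3 > "$WI/node0"       # interleave under pressure
          echo 1 > "$WI/node1"
      else
          echo 255 > "$WI/node0"     # keep data local otherwise
          echo 1 > "$WI/node1"
      fi
  done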
> Functionality Test
> ==================
[...snip...]
> Performance Test
> ================
> Below is a simple example showing that interleaving application data using
> these patches can improve application performance.
> To do this, we run a bandwidth intensive embedding reduction application
> [5]. This workload is useful for this test because it reports the time it
> takes each iteration to run and reuses its buffers between iterations,
> allowing us to clearly see the benefits of the migration.
>
> We evaluate this on a 128-core/256-thread AMD CPU with 72 GB/s of local
> DDR bandwidth and 26 GB/s of CXL memory bandwidth.
>
> Before we start the workload, the system bandwidth utilization is low, so
> we start with interleave weights biased as much as possible to the local
> node. When the workload begins, it saturates the local bandwidth, making
> the page placement suboptimal. To alleviate this, we modify the interleave
> weights, triggering DAMON to migrate the workload's data.
>
> $ cd /sys/kernel/mm/damon/admin/kdamonds/0/
> $ sudo cat ./contexts/0/schemes/0/action
> migrate_hot
> $ sudo cat ./contexts/0/schemes/0/target_nid
> 0-1
> $ echo 255 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
> $ echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node1
> $ <path>/eval_baseline -d amazon_All -c 255 -r 100
> <clip startup output>
> Eval Phase 3: Running Baseline...
>
> REPEAT # 0 Baseline Total time : 9043.24 ms
> REPEAT # 1 Baseline Total time : 7307.71 ms
> REPEAT # 2 Baseline Total time : 7301.4 ms
> REPEAT # 3 Baseline Total time : 7312.44 ms
> REPEAT # 4 Baseline Total time : 7282.43 ms
> # Interleave weights changed to 3:1
> REPEAT # 5 Baseline Total time : 6754.78 ms
> REPEAT # 6 Baseline Total time : 5322.38 ms
> REPEAT # 7 Baseline Total time : 5359.89 ms
> REPEAT # 8 Baseline Total time : 5346.14 ms
> REPEAT # 9 Baseline Total time : 5321.98 ms
>
> Updating the interleave weights and having DAMON migrate the workload
> data according to the weights resulted in an approximately 25% speedup.
Thank you for sharing these very impressive results! So if I understand
correctly, this workload allocates once (mostly), and each iteration just
re-uses the same allocation, meaning the effects of the weighted interleave
change are isolated mostly to the migration portion. (Indeed, steady-state
iterations drop from ~7300 ms to ~5330 ms, which checks out against the
~25% figure above.)
Based on that understanding, I'm wondering if a longer benchmark would help
demonstrate the effects of this patch a bit better. That is, IIRC,
short-lived workloads should see most of their benefit come from correct
allocation, while longer-lived workloads should see most of their benefit
come from correct migration policies. I don't have a good sense of where
the threshold between short and long workloads lies, but I think this could
be another prospective test you could use to demonstrate the gains of your
patch.
One last thing that I wanted to note is that iteration 5, where I imagine
there is some additional work needed to rebalance the page placement from
255:1 to 3:1, *still* outperforms the normal case in the original
benchmark. Really awesome!!!
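(For readers following along: judging from the setup above, the mid-run
change at repeat #5 presumably amounts to something like

  $ echo 3 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
  $ echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node1

after which DAMON rebalances the already-allocated pages.)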
> Questions for Reviewers
> =======================
> 1. Are you happy with the changes to the DAMON sysfs interface?
> 2. Setting an interleave weight to 0 is currently not allowed. This makes
> sense when the weights are only used for allocation. Does it make sense
> to allow 0 weights now?
If the goal of 0 weights is to prevent migration to that node, I think that
we should try to re-use existing mechanisms. There was actually quite a bit
of discussion on whether 0 weights should be allowed (the conversation was
split across multiple versions, but I think this is the first instance [1]).
How about using nodemasks instead? I think they serve a more explicit
purpose of preventing certain nodes from being used. Please let me know if
I'm missing something as to why we cannot use nodemasks here :-)
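For example (a sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup and
the workload runs in an "app" cgroup with the cpuset controller enabled),
excluding a node outright already works today via cpusets:

  $ echo 0 | sudo tee /sys/fs/cgroup/app/cpuset.mems   # allow node 0 only

which seems more explicit than overloading a 0 weight for the same purpose.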
[...snip...]
One more thing that I wanted to note -- given that these weights now serve
a dual purpose of setting allocation & migration weights, does it make sense
to update the weighted interleave documentation with this information as
well? Or, since it really only affects DAMON users, should we be ok with
leaving it out?
My preference is that we include it in the weighted interleave documentation
(Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave)
so that anyone who edits weighted interleave code in the future will at least
be aware that their changes will have effects in other subsystems.
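Something along the lines of (just a sketch, wording up to you):

  Note: DAMON's migrate_hot/migrate_cold schemes also consult these
  weights when selecting migration targets, so changing a weight here
  affects ongoing DAMOS migration as well as new allocations.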
Thanks again for sharing these results! I hope you have a great day :):)
Joshua
[1] https://lore.kernel.org/all/87msfkh1ls.fsf@DESKTOP-5N7EMDA/
Sent using hkml (https://github.com/sjp38/hackermail)