Message-ID: <cb9945fc-6f8e-4054-8116-d3f78e10dcb6@amd.com>
Date: Mon, 28 Jul 2025 11:46:25 +0530
From: Raghavendra K T <raghavendra.kt@....com>
To: Bijan Tabatabai <bijan311@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, damon@...ts.linux.dev
Cc: sj@...nel.org, akpm@...ux-foundation.org,
Bijan Tabatabai <bijantabatab@...ron.com>
Subject: Re: [PATCH] mm/damon/vaddr: Skip isolating folios already in
destination nid

On 7/25/2025 10:03 PM, Bijan Tabatabai wrote:
> From: Bijan Tabatabai <bijantabatab@...ron.com>
>
> damos_va_migrate_dests_add() determines the node a folio should be in
> based on the struct damos_migrate_dests associated with the migration
> scheme and adds the folio to the linked list corresponding to that node
> so it can be migrated later. Currently, folios are isolated and added to
> the list even if they are already in the node they should be in.
>
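(For readers skimming the thread: a minimal sketch of the selection
logic described above. Apart from the weight-loop tail and the early
return visible in the diff hunk further down, the declarations and the
loop bound nr_dests here are assumptions, not the actual kernel source.)

	unsigned int i;

	/* Pick the destination whose weight bucket "target" falls into. */
	for (i = 0; i < dests->nr_dests; i++) {
		if (target < dests->weight_arr[i])
			break;
		target -= dests->weight_arr[i];
	}

	/* New in this patch: skip folios already in the chosen node. */
	if (folio_nid(folio) == dests->node_id_arr[i])
		return;

	/* Otherwise fall through to folio_isolate_lru() and list add. */
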
> While using DAMON weighted interleave more, I've found that the
> overhead of needlessly adding these folios to the migration lists can
> be quite high. The overhead comes from isolating folios and placing
> them in the migration lists inside of damos_va_migrate_dests_add(), as
> well as from handling those folios in damon_migrate_pages(). This patch
> eliminates that overhead by simply not adding folios that are already
> in their intended location to the migration list.
>
> To show the benefit of this patch, we start the test workload and start
> a DAMON instance attached to that workload with a migrate_hot scheme
> that has one dest field sending data to the local node. This way, we are
> only measuring the overheads of the scheme, and not the cost of migrating
> pages, since data will be allocated to the local node by default.
> I tested with two workloads: the embedding reduction workload used in
> [1] and a microbenchmark that allocates 20GB of data then sleeps, whose
> memory usage is similar to that of the embedding reduction workload.
>
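(A minimal sketch of such a microbenchmark, for concreteness; this is an
illustration of "allocate 20GB, then sleep", not necessarily the program
used to produce the numbers below.)

	/* alloc20g.c - hypothetical: allocate ~20 GiB, fault it in, sleep. */
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		size_t sz = 20UL << 30;	/* 20 GiB */
		char *buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (buf == MAP_FAILED)
			return 1;
		memset(buf, 1, sz);	/* fault every page in */
		pause();		/* sleep; DAMON can now observe */
		return 0;
	}
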
> The time taken in damos_va_migrate_dests_add() and damon_migrate_pages()
> each aggregation interval is shown below.
>
> Before this patch:
>                      damos_va_migrate_dests_add  damon_migrate_pages
> microbenchmark       ~2ms                        ~3ms
> embedding reduction  ~1s                         ~3s
>
> After this patch:
>                      damos_va_migrate_dests_add  damon_migrate_pages
> microbenchmark       0us                         ~40us
> embedding reduction  0us                         ~100us
>
> I did not do an in-depth analysis of why things are so much slower in
> the embedding reduction workload than in the microbenchmark. However, I
> assume it's because the embedding reduction workload oversaturates the
> bandwidth of the local memory node, increasing the memory access
> latency, and in turn making the pointer chasing involved in iterating
> through a linked list much slower.
> Regardless of that, this patch results in a significant speedup.
>
> [1] https://lore.kernel.org/damon/20250709005952.17776-1-bijan311@gmail.com/
>
> Signed-off-by: Bijan Tabatabai <bijantabatab@...ron.com>
> ---
> Sorry I missed this in the original patchset!
>
> mm/damon/vaddr.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
> index 7f5dc9c221a0..4404c2ab0583 100644
> --- a/mm/damon/vaddr.c
> +++ b/mm/damon/vaddr.c
> @@ -711,6 +711,10 @@ static void damos_va_migrate_dests_add(struct folio *folio,
> target -= dests->weight_arr[i];
> }
>
> + /* If the folio is already in the right node, don't do anything */
> + if (folio_nid(folio) == dests->node_id_arr[i])
> + return;
> +
I have seen good improvements with similar changes in the PTE A-bit
scan based patches.

Feel free to add:

Reviewed-by: Raghavendra K T <raghavendra.kt@....com>
> isolate:
> if (!folio_isolate_lru(folio))
> return;