Message-ID: <cb9945fc-6f8e-4054-8116-d3f78e10dcb6@amd.com>
Date: Mon, 28 Jul 2025 11:46:25 +0530
From: Raghavendra K T <raghavendra.kt@....com>
To: Bijan Tabatabai <bijan311@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, damon@...ts.linux.dev
Cc: sj@...nel.org, akpm@...ux-foundation.org,
Bijan Tabatabai <bijantabatab@...ron.com>
Subject: Re: [PATCH] mm/damon/vaddr: Skip isolating folios already in
destination nid

On 7/25/2025 10:03 PM, Bijan Tabatabai wrote:
> From: Bijan Tabatabai <bijantabatab@...ron.com>
>
> damos_va_migrate_dests_add() determines the node a folio should be in
> based on the struct damos_migrate_dests associated with the migration
> scheme and adds the folio to the linked list corresponding to that node
> so it can be migrated later. Currently, folios are isolated and added to
> the list even if they are already in the node they should be in.
>
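(For readers skimming the thread: a minimal sketch of the selection
logic described above. Apart from the weight-loop tail and the early
return visible in the diff hunk further down, the declarations and the
loop bound nr_dests here are assumptions, not the actual kernel source.)

	unsigned int i;

	/* Pick the destination whose weight bucket "target" falls into. */
	for (i = 0; i < dests->nr_dests; i++) {
		if (target < dests->weight_arr[i])
			break;
		target -= dests->weight_arr[i];
	}

	/* New in this patch: skip folios already in the chosen node. */
	if (folio_nid(folio) == dests->node_id_arr[i])
		return;

	/* Otherwise fall through to folio_isolate_lru() and list add. */
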
> While using DAMON weighted interleave more, I've found that the
> overhead of needlessly adding these folios to the migration lists can
> be quite high. The overhead comes from isolating folios and placing
> them in the migration lists inside of damos_va_migrate_dests_add(), as
> well as from handling those folios in damon_migrate_pages(). This patch
> eliminates that overhead by simply not adding folios that are already
> in their intended location to the migration list.
>
> To show the benefit of this patch, we start the test workload and start
> a DAMON instance attached to that workload with a migrate_hot scheme
> that has one dest field sending data to the local node. This way, we are
> only measuring the overheads of the scheme, and not the cost of migrating
> pages, since data will be allocated to the local node by default.
> I tested with two workloads: the embedding reduction workload used in
> [1] and a microbenchmark that allocates 20GB of data then sleeps, whose
> memory usage is similar to that of the embedding reduction workload.
>
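(A minimal sketch of such a microbenchmark, for concreteness; this is an
illustration of "allocate 20GB, then sleep", not necessarily the program
used to produce the numbers below.)

	/* alloc20g.c - hypothetical: allocate ~20 GiB, fault it in, sleep. */
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		size_t sz = 20UL << 30;	/* 20 GiB */
		char *buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (buf == MAP_FAILED)
			return 1;
		memset(buf, 1, sz);	/* fault every page in */
		pause();		/* sleep; DAMON can now observe */
		return 0;
	}
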
> The time taken in damos_va_migrate_dests_add() and damon_migrate_pages()
> each aggregation interval is shown below.
>
> Before this patch:
>                      damos_va_migrate_dests_add  damon_migrate_pages
> microbenchmark       ~2ms                        ~3ms
> embedding reduction  ~1s                         ~3s
>
> After this patch:
>                      damos_va_migrate_dests_add  damon_migrate_pages
> microbenchmark       0us                         ~40us
> embedding reduction  0us                         ~100us
>
> I did not do an in-depth analysis of why things are so much slower in
> the embedding reduction workload than in the microbenchmark. However, I
> assume it's because the embedding reduction workload oversaturates the
> bandwidth of the local memory node, increasing the memory access
> latency, and in turn making the pointer chasing involved in iterating
> through a linked list much slower.
> Regardless of that, this patch results in a significant speedup.
>
> [1] https://lore.kernel.org/damon/20250709005952.17776-1-bijan311@gmail.com/
>
> Signed-off-by: Bijan Tabatabai <bijantabatab@...ron.com>
> ---
> Sorry I missed this in the original patchset!
>
> mm/damon/vaddr.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
> index 7f5dc9c221a0..4404c2ab0583 100644
> --- a/mm/damon/vaddr.c
> +++ b/mm/damon/vaddr.c
> @@ -711,6 +711,10 @@ static void damos_va_migrate_dests_add(struct folio *folio,
> target -= dests->weight_arr[i];
> }
>
> + /* If the folio is already in the right node, don't do anything */
> + if (folio_nid(folio) == dests->node_id_arr[i])
> + return;
> +
I have seen good improvements with similar changes in the PTE A-bit
scan based patches.

Feel free to add:

Reviewed-by: Raghavendra K T <raghavendra.kt@....com>
> isolate:
> if (!folio_isolate_lru(folio))
> return;