Message-ID: <584d3ace-ca64-424d-b8ce-c2cd54cec8a6@amd.com>
Date: Sun, 23 Mar 2025 23:44:02 +0530
From: Raghavendra K T <raghavendra.kt@....com>
To: Hillf Danton <hdanton@...a.com>
Cc: dave.hansen@...el.com, david@...hat.com, hannes@...xchg.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, ziy@...dia.com
Subject: Re: [RFC PATCH V1 09/13] mm: Add heuristic to calculate target node
On 3/21/2025 4:23 PM, Hillf Danton wrote:
> On Wed, 19 Mar 2025 19:30:24 +0000 Raghavendra K T wrote
>> One of the key challenges in PTE A-bit based scanning is to find the right
>> target node to promote to.
>>
>> Here is a simple heuristic based approach:
>> While scanning the pages of any mm, we also scan the toptier pages that
>> belong to that mm. This gives insight into the distribution of pages that
>> potentially belong to a particular toptier node, and into their recent access.
>>
>> The current logic walks all the toptier nodes and picks the one with the
>> highest accesses.
>>
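For reference, the selection step in the quoted heuristic can be sketched roughly as below. This is a minimal sketch, not the patch's code: struct toptier_stat, its fields, pick_target_by_access() and the tie-break on resident pages are hypothetical names and assumptions for illustration only.

```c
#include <stddef.h>

/*
 * Hypothetical per-mm summary collected while scanning: for each toptier
 * node, how many of this mm's pages live there and how many of those
 * were found with the PTE Accessed bit set.
 */
struct toptier_stat {
	int nid;                   /* toptier NUMA node id */
	unsigned long nr_pages;    /* pages of this mm resident on nid */
	unsigned long nr_accessed; /* of those, pages seen recently accessed */
};

/*
 * Walk all toptier nodes and return the one with the most recent
 * accesses. Ties go to the node holding more of the mm's pages
 * (an assumption, not part of the quoted heuristic). Returns -1 when
 * no accesses were observed.
 */
int pick_target_by_access(const struct toptier_stat *stats, size_t n)
{
	unsigned long best_accessed = 0, best_pages = 0;
	int target = -1;

	for (size_t i = 0; i < n; i++) {
		const struct toptier_stat *s = &stats[i];

		if (s->nr_accessed > best_accessed ||
		    (target >= 0 && s->nr_accessed == best_accessed &&
		     s->nr_pages > best_pages)) {
			best_accessed = s->nr_accessed;
			best_pages = s->nr_pages;
			target = s->nid;
		}
	}
	return target;
}
```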
> My $.02 on selecting the promotion target node, given a simple multi-tier system.
>
> Tk /* top Tierk (k > 0) has K (K > 0) nodes */
> ...
> Tj /* Tierj (j > 0) has J (J > 0) nodes */
> ...
> T0 /* bottom Tier0 has O (O > 0) nodes */
>
> Unless the config comes from user space (a sysfs window, for example, should be opened),
>
> 1, adopt the data flow pattern of L3 cache <--> DRAM <--> SSD, to only
> select Tj+1 when promoting pages in Tj.
>
Hello Hillf,
Thanks for giving this some thought. This looks like a good idea in
general. We should mostly be able to implement it as the reverse of the
preferred demotion target?
Thinking aloud: can there be exception cases, similar to non-temporal copy
operations, where we don't want to pollute the cache?
I mean cases where we don't want to hop via a middle-tier node?
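A minimal sketch of "promotion as the reverse of the preferred demotion target", assuming each node's preferred demotion targets lie in the tier directly below it, so that promotion only moves pages up one tier (Tj to Tj+1). The bitmask array demote_mask[], the per-node nr_accessed[] counters and both function names are hypothetical; the kernel's real demotion state lives in mm/memory-tiers.c and uses nodemask_t.

```c
/*
 * Hypothetical model: bit d of demote_mask[n] is set when node d is one
 * of node n's preferred demotion targets, assumed to lie in the tier
 * directly below n. Requires fewer nodes than bits in unsigned long.
 */
unsigned long promotion_candidates(const unsigned long *demote_mask,
				   int nr_nodes, int src)
{
	unsigned long cands = 0;

	/* Reverse the demotion map: collect every node that demotes to src. */
	for (int n = 0; n < nr_nodes; n++)
		if (demote_mask[n] & (1UL << src))
			cands |= 1UL << n;
	return cands;
}

/*
 * Promote src's pages only one tier up (Tj -> Tj+1): among the
 * reverse-demotion candidates, choose the node with the most recent
 * accesses (nr_accessed[n], as in the quoted heuristic). Returns -1 if
 * no node demotes to src, e.g. src is already in the top tier.
 */
int pick_promotion_target(const unsigned long *demote_mask,
			  const unsigned long *nr_accessed,
			  int nr_nodes, int src)
{
	unsigned long cands = promotion_candidates(demote_mask, nr_nodes, src);
	int target = -1;

	for (int n = 0; n < nr_nodes; n++) {
		if (!(cands & (1UL << n)))
			continue;
		if (target < 0 || nr_accessed[n] > nr_accessed[target])
			target = n;
	}
	return target;
}
```

When several upper-tier nodes demote to the same lower node, the reverse mapping yields more than one candidate, so a second rule (here, most accesses) is still needed to choose among them. Skipping the middle tier, as in the non-temporal case, would need an explicit exception path that this sketch does not model.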
> 2, select the node in Tj+1 that has the most free pages for promotion
> by default.
Not sure this is always productive.
For example:
node 0-1: toptier (100GB)
node 2: slowtier
Suppose a workload occupying 80GB in total runs on the CPUs of node1,
with 40GB already on node1 and the remaining 40GB on node2.
When the slowtier data becomes hot, isn't it preferable to consolidate
the workload on node1, rather than promote to whichever toptier node has
the most free pages?
(This assumes that node1's memory channels have enough bandwidth to
serve the workload's requirements.)
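A minimal sketch contrasting the two choices in this example. It assumes each toptier node has 100GB, so node0 has 100GB free and node1 has 60GB free (100GB minus the workload's 40GB). Those numbers, struct tier_node and both function names are hypothetical, and memory bandwidth is not modeled.

```c
#include <stddef.h>

/* Hypothetical per-node view; sizes in GB to keep the example readable. */
struct tier_node {
	int nid;
	unsigned long free_gb;
};

/* Default from point 2: the candidate node with the most free memory. */
int pick_most_free(const struct tier_node *nodes, size_t n)
{
	unsigned long best = 0;
	int target = -1;

	for (size_t i = 0; i < n; i++) {
		if (target < 0 || nodes[i].free_gb > best) {
			best = nodes[i].free_gb;
			target = nodes[i].nid;
		}
	}
	return target;
}

/*
 * Alternative: consolidate on the node whose CPUs run the workload
 * (cpu_nid) when it can absorb the data being promoted; otherwise
 * fall back to the most-free node.
 */
int pick_consolidate(const struct tier_node *nodes, size_t n,
		     int cpu_nid, unsigned long promote_gb)
{
	for (size_t i = 0; i < n; i++)
		if (nodes[i].nid == cpu_nid && nodes[i].free_gb >= promote_gb)
			return cpu_nid;
	return pick_most_free(nodes, n);
}
```

With those numbers, pick_most_free() returns node 0, while pick_consolidate(nodes, 2, 1, 40) returns node 1, keeping the hot data local to the CPUs that use it.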
> 3, nothing more.