Message-ID: <84c8de4c-cf2b-4b5d-b1e2-952d52f42fd4@amd.com>
Date: Fri, 20 Dec 2024 12:00:09 +0530
From: Raghavendra K T <raghavendra.kt@....com>
To: SeongJae Park <sj@...nel.org>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, gourry@...rry.net,
nehagholkar@...a.com, abhishekd@...a.com, david@...hat.com,
ying.huang@...el.com, nphamcs@...il.com, akpm@...ux-foundation.org,
hannes@...xchg.org, feng.tang@...el.com, kbusch@...a.com, bharata@....com,
Hasan.Maruf@....com, willy@...radead.org, kirill.shutemov@...ux.intel.com,
mgorman@...hsingularity.net, vbabka@...e.cz, hughd@...gle.com,
rientjes@...gle.com, shy828301@...il.com, Liam.Howlett@...cle.com,
peterz@...radead.org, mingo@...hat.com
Subject: Re: [RFC PATCH V0 0/10] mm: slowtier page promotion based on PTE A bit

On 12/11/2024 12:23 AM, SeongJae Park wrote:
> Hello Raghavendra,
>
> Thank you for posting this nice patch series. I gave you some feedback
> offline, and I am repeating it here for transparency in this much
> appreciated public discussion.
>
> On Sun, 1 Dec 2024 15:38:08 +0000 Raghavendra K T <raghavendra.kt@....com> wrote:
>
>> Introduction:
>> =============
>> This patchset is an outcome of an ongoing collaboration between AMD and Meta.
>> Meta wanted to explore an alternative page promotion technique as they
>> observe high latency spikes in their workloads that access CXL memory.
>>
>> In the current hot page promotion, all the activities, including
>> process address space scanning, NUMA hint fault handling and page
>> migration, are performed in process context, i.e., the scanning overhead
>> is borne by applications.
>
> Yet another approach is using DAMON. DAMON does access monitoring, and further
> allows users to request access pattern-driven system operations under the
> name DAMOS (Data Access Monitoring-based Operation Schemes). Using it, users
> can ask DAMON to find hot pages and promote them, while finding cold pages
> and demoting them. SK hynix has built their CXL-based memory capacity
> expansion solution this way
> (https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion). We
> collaboratively developed new DAMON features for that, and they have all
> been in mainline since Linux v6.11.
>
> I also proposed an idea for advancing it using DAMOS auto-tuning on a more
> general (>2 tiers) setup
> (https://lore.kernel.org/20231112195602.61525-1-sj@...nel.org). I haven't
> had time to further implement and test the idea so far, though.
>
>>
>> This is an early RFC patch series for (slow-tier) CXL page promotion.
>> The approach in this patchset addresses the issue by adding PTE
>> Accessed bit scanning.
>>
>> Scanning is done by a global kernel thread which routinely scans all
>> the processes' address spaces and checks for accesses by reading the
>> PTE A bit. It then migrates/promotes the pages to the toptier node
>> (node 0 in the current approach).
>>
>> Thus, the approach moves the overhead of scanning, NUMA hint faults and
>> migrations out of process context.
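
The scan-and-promote loop described above can be modelled in a few lines. This is a user-space toy, not the patchset's code: page tables are plain Python objects, `Pte`, `Process`, `scan_and_collect()` and `migrate()` are hypothetical names, and "migration" only rewrites a node number. It shows the two properties the cover letter relies on: the A bit is tested and cleared by one global scanner rather than by the running task, and only accessed pages that are not already on the top-tier node (assumed to be node 0) become promotion candidates.

```python
# Toy user-space model of a PTE A-bit promotion scanner. NOT kernel code:
# Pte, Process, scan_and_collect() and migrate() are hypothetical names.
from dataclasses import dataclass, field

TOPTIER_NODE = 0  # assumption: node 0 is the top tier, as in the cover letter


@dataclass
class Pte:
    pfn: int
    node: int               # NUMA node currently backing the page
    accessed: bool = False  # models the hardware-set PTE Accessed (A) bit


@dataclass
class Process:
    pid: int
    ptes: list = field(default_factory=list)


def scan_and_collect(processes, toptier_node=TOPTIER_NODE):
    """One pass by a single global scanner over every process's PTEs.
    Returns the PFNs of accessed slow-tier pages, in scan order."""
    to_promote = []
    for proc in processes:
        for pte in proc.ptes:
            was_accessed = pte.accessed
            pte.accessed = False  # test-and-clear: next pass sees only new accesses
            if was_accessed and pte.node != toptier_node:
                to_promote.append(pte.pfn)
    return to_promote


def migrate(processes, pfns, toptier_node=TOPTIER_NODE):
    """Pretend migration: move every listed page to the top-tier node."""
    wanted = set(pfns)
    for proc in processes:
        for pte in proc.ptes:
            if pte.pfn in wanted:
                pte.node = toptier_node


if __name__ == "__main__":
    procs = [Process(1, [Pte(10, 1, True), Pte(11, 1), Pte(12, 0, True)]),
             Process(2, [Pte(20, 1, True)])]
    candidates = scan_and_collect(procs)
    migrate(procs, candidates)
    print(candidates)               # [10, 20]
    print(scan_and_collect(procs))  # [] because the first pass cleared the A bits
```

Page 12 is accessed but already on node 0, so it is skipped, and a second pass finds nothing until the workload touches pages again.
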
>
> DAMON also uses the PTE A bit as the major source of access information. And
> DAMON does both access monitoring and promotion/demotion in a global kernel
> thread, namely kdamond. Hence the DAMON-based approach would also offload
> the overheads from process context. So I feel your approach is similar to
> the DAMON-based one in some ways, and we might have a chance to avoid
> unnecessary duplication.
>
> [...]
>>
>> Limitations:
>> ============
>> The PTE A bit scanning approach lacks information about the exact
>> destination node to migrate to.
>
> This is the same for the DAMON-based approach, since DAMON also uses the PTE
> A bit as the major source of the information. We aim to extend DAMON to be
> aware of the access source CPU and use that to solve this problem, though.
> Utilizing page faults or AMD IBS-like h/w features is among the ideas on
> the table.
>
>>
>> Notes/Observations on design/Implementations/Alternatives/TODOs...
>> ==================================================================
>> 1. Fine-tuning scan throttling
>
> DAMON allows users to set an upper limit on the monitoring overhead using the
> max_nr_regions parameter, and then provides best-effort accuracy within that
> limit. We also have ongoing projects to make it more accurate and easier to
> tune.
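
DAMON's overhead bound comes from region-based sampling: memory is divided into regions and, in each sampling interval, only one randomly chosen page per region is checked, so the cost per interval scales with the number of regions rather than with memory size. The toy below illustrates only that cost argument. `Region`, `make_regions()` and `sample_once()` are hypothetical names, the regions are fixed (real DAMON also merges and splits regions adaptively between min_nr_regions and max_nr_regions, which is omitted here), and the access check is a plain function standing in for reading and clearing an A bit.

```python
# Toy model of region-based access sampling. Hypothetical names; real DAMON
# also adjusts region boundaries adaptively, which this sketch leaves out.
import random
from dataclasses import dataclass


@dataclass
class Region:
    start: int            # first page index (inclusive)
    end: int              # last page index (exclusive)
    nr_accesses: int = 0  # sampling intervals in which the sampled page was accessed


def make_regions(nr_pages, max_nr_regions):
    """Split [0, nr_pages) into at most max_nr_regions contiguous regions."""
    if nr_pages < 1:
        raise ValueError("nr_pages must be positive")
    n = max(1, min(max_nr_regions, nr_pages))
    bounds = [nr_pages * i // n for i in range(n + 1)]
    return [Region(bounds[i], bounds[i + 1]) for i in range(n)]


def sample_once(regions, is_accessed, rng):
    """One sampling interval: check ONE randomly chosen page per region.
    Returns the number of page checks done, i.e. the monitoring cost."""
    for region in regions:
        page = rng.randrange(region.start, region.end)
        if is_accessed(page):
            region.nr_accesses += 1
    return len(regions)


if __name__ == "__main__":
    rng = random.Random(0)
    regions = make_regions(nr_pages=1_000_000, max_nr_regions=100)

    def is_hot(page):
        return page < 10_000  # only the first 10k pages are ever accessed

    cost = sum(sample_once(regions, is_hot, rng) for _ in range(20))
    print(cost)                                            # 2000
    print(regions[0].nr_accesses, regions[1].nr_accesses)  # 20 0
```

The same 2000 checks would be spent on a much larger address space; the price is accuracy, since each region is treated as uniformly hot or cold.
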
>
>>
>> 2. Use migrate_balanced_pgdat() to check the toptier node's balance before
>> migration, OR use migrate_misplaced_folio_prepare() directly.
>> But it may need some optimizations (e.g., invoke it occasionally so
>> that the overhead is not incurred on every migration).
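
The "invoke occasionally" idea in item 2 amounts to amortizing an expensive check over many migrations. A minimal sketch, assuming the check (for example, a watermark-style free-space test of the kind migrate_balanced_pgdat() performs on the target node) can be wrapped in a callable; `AmortizedCheck` and `toptier_has_room()` are hypothetical names, not kernel APIs.

```python
# Hypothetical helper: amortize an expensive "does the top tier have room?"
# check across many migrations instead of running it for every page.
class AmortizedCheck:
    """Evaluate `predicate` at most once every `every` calls and reuse the
    cached answer in between."""

    def __init__(self, predicate, every):
        if every < 1:
            raise ValueError("every must be >= 1")
        self.predicate = predicate
        self.every = every
        self._calls = 0
        self._cached = False

    def __call__(self):
        if self._calls % self.every == 0:
            self._cached = self.predicate()  # refresh the cached answer
        self._calls += 1
        return self._cached


if __name__ == "__main__":
    expensive_calls = 0

    def toptier_has_room():  # stand-in for a watermark/free-space check
        global expensive_calls
        expensive_calls += 1
        return True

    check = AmortizedCheck(toptier_has_room, every=4)
    migrated = sum(1 for _ in range(10) if check())
    print(migrated, expensive_calls)  # 10 3
```

The trade-off is staleness: up to `every - 1` migrations may act on an outdated answer, so the interval would need to be short relative to how quickly the top tier fills up.
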
>>
>> 3. Explore if a separate PAGE_EXT flag is needed instead of reusing the
>> PAGE_IDLE flag (cons: complicates PTE A bit handling in the system).
>> But in practice it does not look like a good idea.
>>
>> 4. Use timestamp information-based migration (similar to numab mode=2)
>> instead of migrating immediately when the PTE A bit is set.
>> (cons:
>> - It will not be accurate since it is done outside of process
>> context.
>> - Performance benefit may be lost.)
>
> DAMON provides a sort of time-based aggregated monitoring result, and DAMOS
> provides prioritization of pages based on their access temperature. Hence,
> a DAMON-based approach can also be used for a similar purpose (promoting not
> every accessed page, but pages that are more frequently used for a longer
> time).
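
To illustrate "promoting not every accessed page": if each region carries an access frequency and an age (how long that frequency has persisted), the hottest, longest-lived regions can be promoted first under a size quota. DAMOS quotas do let users weight access frequency and age when prioritizing, but the scoring function, the default weights, and the names below (`RegionStat`, `hotness()`, `pick_for_promotion()`) are hypothetical simplifications, not DAMON's own formula.

```python
# Hypothetical prioritization of regions by "temperature" under a size quota.
# The scoring formula is an illustrative assumption, not DAMON's own.
from dataclasses import dataclass


@dataclass
class RegionStat:
    name: str
    nr_pages: int
    nr_accesses: int  # access frequency over the last aggregation interval
    age: int          # aggregation intervals this access pattern has persisted


def hotness(r, max_nr_accesses, max_age, w_freq=0.7, w_age=0.3):
    """Score in [0, 1]: higher for regions that are hot AND have stayed hot.
    max_nr_accesses and max_age must be positive."""
    freq = min(r.nr_accesses, max_nr_accesses) / max_nr_accesses
    age = min(r.age, max_age) / max_age
    return w_freq * freq + w_age * age


def pick_for_promotion(regions, quota_pages, max_nr_accesses, max_age,
                       min_accesses=1):
    """Hottest first; skip regions that would overflow the page quota."""
    hot = [r for r in regions if r.nr_accesses >= min_accesses]
    hot.sort(key=lambda r: hotness(r, max_nr_accesses, max_age), reverse=True)
    chosen, used = [], 0
    for r in hot:
        if used + r.nr_pages <= quota_pages:
            chosen.append(r.name)
            used += r.nr_pages
    return chosen


if __name__ == "__main__":
    regions = [
        RegionStat("A", 4, nr_accesses=20, age=10),  # hot for a long time
        RegionStat("B", 4, nr_accesses=20, age=1),   # hot, but only recently
        RegionStat("C", 4, nr_accesses=5, age=10),   # steadily warm
        RegionStat("D", 4, nr_accesses=1, age=0),    # touched once
        RegionStat("E", 4, nr_accesses=0, age=50),   # cold
    ]
    print(pick_for_promotion(regions, quota_pages=8,
                             max_nr_accesses=20, max_age=10))  # ['A', 'B']
```

Region B is as hot as A but only recently, so it ranks lower; D was touched once and would only be promoted if quota remained after the warmer regions.
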
>
>>
>> 5. Explore if we need to use PFN information + a hash list instead of a
>> simple migration list. Here, scanning is done directly on PFNs belonging
>> to the CXL node.
>
> DAMON supports physical address space monitoring, and maintains the access
> monitoring results in its own data structure called damon_region. So I think
> a similar benefit can be achieved using DAMON?
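
Item 5 and the reply above both describe scanning by physical address rather than by process. A toy sketch of that shape: walk the PFN span of the CXL node (assumed here to be PFNs 1000 to 1009) and keep candidates in a hash table keyed by PFN, so a page seen hot in several passes occupies one entry whose count grows instead of being appended to a list repeatedly. `scan_pfn_range()` and the PFN span are hypothetical. In the kernel, testing a physical page's A bit requires an rmap walk to the PTEs that map it; the `accessed` set stands in for that result. Note also that damon_region tracks address ranges, not individual PFNs.

```python
# Toy model of scanning a slow-tier node by PFN and keeping candidates in a
# hash table keyed by PFN. Names are hypothetical. The `accessed` set stands
# in for the result of an rmap walk that tests (and clears) A bits.


def scan_pfn_range(start_pfn, end_pfn, accessed, candidates):
    """Scan [start_pfn, end_pfn) once. For each PFN found in `accessed`,
    clear it (test-and-clear) and bump its hit count in `candidates`."""
    for pfn in range(start_pfn, end_pfn):
        if pfn in accessed:
            accessed.discard(pfn)
            candidates[pfn] = candidates.get(pfn, 0) + 1
    return candidates


if __name__ == "__main__":
    CXL_START, CXL_END = 1000, 1010  # assumed PFN span of the CXL node
    cands = {}
    scan_pfn_range(CXL_START, CXL_END, {1001, 1005, 42}, cands)  # 42 is not CXL
    scan_pfn_range(CXL_START, CXL_END, {1005}, cands)
    print(cands)  # {1001: 1, 1005: 2}
```

A page shared by several processes is also visited once per pass here, whereas a per-process scan that appends to a migration list could record it once per mapping.
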
>
> [...]
>> 8. Using DAMON APIs OR Reusing part of DAMON which already tracks range of
>> physical addresses accessed.
>
> My biased humble opinion is that it would be very nice to explore this
> opportunity, since I see some similarities, and opportunities to solve some
> of the challenges of your approach in an easier way. Even if it turns out
> that DAMON cannot be used for your use case, failing earlier is a good
> thing, I'd say :)
>
>>
>> 9. Gregory has nicely mentioned some details/ideas on different approaches in
>> [1]: development notes, in the context of promoting unmapped page cache folios.
>
> DAMON supports monitoring accesses to unmapped page cache folios, so hopefully
> DAMON-based approaches can also solve this issue.
>

Hello SJ,
Thank you again for the detailed explanation. (Sorry for the late
acknowledgement; I was looking forward to the MM alignment discussion when
this message came.)
I think once the direction is fixed, we could surely use/reuse a lot of
source code from DAMON and MGLRU. The amazing design of DAMON should surely
help. I will keep in mind all the points raised here.
Thanks and Regards
- Raghu