Message-ID: <f9f9a61f-6798-42f4-a09e-dcdf54e0a649@amd.com>
Date: Tue, 18 Mar 2025 16:15:31 +0530
From: Bharata B Rao <bharata@....com>
To: SeongJae Park <sj@...nel.org>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
AneeshKumar.KizhakeVeetil@....com, Hasan.Maruf@....com,
Jonathan.Cameron@...wei.com, Michael.Day@....com, akpm@...ux-foundation.org,
dave.hansen@...el.com, david@...hat.com, feng.tang@...el.com,
gourry@...rry.net, hannes@...xchg.org, honggyu.kim@...com, hughd@...gle.com,
jhubbard@...dia.com, k.shutemov@...il.com, kbusch@...a.com,
kmanaouil.dev@...il.com, leesuyeon0506@...il.com, leillc@...gle.com,
liam.howlett@...cle.com, mgorman@...hsingularity.net, mingo@...hat.com,
nadav.amit@...il.com, nphamcs@...il.com, peterz@...radead.org,
raghavendra.kt@....com, riel@...riel.com, rientjes@...gle.com,
rppt@...nel.org, shivankg@....com, shy828301@...il.com, vbabka@...e.cz,
weixugc@...gle.com, willy@...radead.org, ying.huang@...ux.alibaba.com,
ziy@...dia.com, dave@...olabs.net, yuanchu@...gle.com, hyeonggon.yoo@...com,
Harry Yoo <harry.yoo@...cle.com>
Subject: Re: [RFC PATCH 0/4] Kernel daemon for detecting and promoting hot
pages
Hi SJ,
Thanks for your detailed points; they set up good context for the
discussion at LSFMM.
Please see my replies to a few of your questions below:
On 17-Mar-25 3:30 AM, SeongJae Park wrote:
>>
>> Currently I have added AMD IBS driver as one source that provides
>> page access information as an example. This driver feeds info to
>> kpromoted in this RFC patchset. More sources were discussed in a
>> similar context here at [1].
>
> I was imagining how I would be able to do this with DAMON via operations set
> layer interface. And I find the current interface is not very optimized for
> AMD IBS like sources that catches the access on the line. That is, in a way,
> we could say AMD IBS like primitives as push-oriented, while page tables'
> accessed bits information are pull-oriented. DAMON operations set layer
> interface is easier to be used in pull-oriented case. I don't think it cannot
> be used for push-oriented case, but definitely the interface would better to be
> more optimized for the use case.
>
> I'm curious if you also tried doing this by extending DAMON, and if you found
> some hidden problems.
I remember discussing this with you during DAMON BoF in one of the
earlier LPC events, but I didn't get to try it. Guess now is the time :-)
I see the challenge with the current DAMON interfaces to integrate
IBS-provided access info. If you check my IBS driver, I store the incoming
access info from IBS into per-cpu buffers before pushing them on to the
subsystem that acts on them. I would think pull-based DAMON interfaces
can consume those buffered samples rather than IBS pushing samples into
DAMON. But I am yet to get clarity on how to honor the region-based
sampling that is inherent to DAMON's functioning. Maybe using only the
samples that fall within the region being tracked could be one way.
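To make the pull model concrete, here is a minimal user-space sketch. It
is not code from the IBS driver or from DAMON: struct access_sample,
struct sample_buf, sample_buf_push() and sample_buf_pull_region() are
hypothetical names, and the buffer is a plain single-producer,
single-consumer ring standing in for one per-cpu buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sample record; the real driver's layout may differ. */
struct access_sample {
	uint64_t pfn;	/* page frame number that was accessed */
	int nid;	/* node that issued the access */
};

#define SAMPLE_BUF_SIZE 8	/* assumed size, must be a power of two */

/* Stand-in for one per-cpu buffer. */
struct sample_buf {
	struct access_sample slots[SAMPLE_BUF_SIZE];
	unsigned int head;	/* next slot to write (producer) */
	unsigned int tail;	/* next slot to read (consumer) */
};

/*
 * Producer side (the sampling source). Returns 0 on success or -1 if
 * the buffer is full, in which case the sample is dropped.
 */
static int sample_buf_push(struct sample_buf *b, uint64_t pfn, int nid)
{
	if (b->head - b->tail == SAMPLE_BUF_SIZE)
		return -1;
	b->slots[b->head % SAMPLE_BUF_SIZE] =
		(struct access_sample){ .pfn = pfn, .nid = nid };
	b->head++;
	return 0;
}

/*
 * Consumer side (the pull-based monitor). Drains the buffer and returns
 * how many samples fell inside [start_pfn, end_pfn). Samples outside
 * that range are consumed and ignored.
 */
static unsigned int sample_buf_pull_region(struct sample_buf *b,
					   uint64_t start_pfn,
					   uint64_t end_pfn)
{
	unsigned int hits = 0;

	assert(start_pfn <= end_pfn);
	while (b->tail != b->head) {
		const struct access_sample *s =
			&b->slots[b->tail % SAMPLE_BUF_SIZE];

		if (s->pfn >= start_pfn && s->pfn < end_pfn)
			hits++;
		b->tail++;
	}
	return hits;
}
```

Note that this drain throws away samples outside the one region it is
asked about. With many regions, each drained sample would instead have
to be routed to the region that contains it, and that routing is the part
I do not have clarity on yet.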
>
>>
>> This is just an early attempt to check what it takes to maintain
>> a single source of page hotness info and also separate hot page
>> detection mechanisms from the promotion mechanism. There are too
>> many open ends right now and I have listed a few of them below.
>>
>> - The API that is provided to register memory access expects
>> the PFN, NID and time of access at the minimum. This is
>> described more in patch 2/4. This API currently can be called
>> only from contexts that allow sleeping and hence this rules
>> out using it from PTE scanning paths. The API needs to be
>> more flexible with respect to this.
>> - Some sources like PTE A bit scanning can't provide the precise
>> time of access or the NID that is accessing the page. The latter
>> has been an open problem to which I haven't come across a good
>> and acceptable solution.
>
> Agree. PTE A bit scanning could be useful in many cases, but not every case.
> There was an RFC patchset[7] that extends DAMON for NID. I'm planning to do
> that again using DAMON operations layer interface. My current plan is to
> implement the prototype using prot_none page faults, and later extend for AMD
> IBS like h/w features. Hopefully I will share a prototype or at least more
> detailed idea on LSFMMBPF 2025.
>
>> - The way the hot page information is maintained is pretty
>> primitive right now. Ideally we would like to store hotness info
>> in such a way that it should be easily possible to lookup say N
>> most hot pages.
>
> DAMON provides a feature for lookup of the N hottest pages, namely DAMOS
> quotas' access pattern based regions prioritization[5].
>
>> - If PTE A bit scanners are considered as hotness sources, we will
>> be bombarded with accesses. Do we want to accommodate all those
>> accesses or just go with hotness info for fixed number of pages
>> (possibly as a ratio of lower tier memory capacity)?
>
> I understand you're saying about memory space overhead. Correct me if I'm
> wrong, please.
Correct, and also the overhead of managing so much data. What I see is
that if I start pushing all the access info obtained from LRU pgtable
scanning, kpromoted would end up spending a lot of time in operations
like lookup, walking the list of hot pages, etc.
So maybe it would be better to do some sort of early processing and/or
filtering at the hotness source level itself before letting
kpromoted-like subsystems do further tracking and action.
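One possible form of such source-level filtering, as a rough user-space
sketch only: keep a small fixed-size table of access counts at the
source and forward a PFN only when its count reaches a threshold. None
of this exists in the RFC; struct source_filter, source_filter_init(),
source_filter_access() and FILTER_SLOTS are hypothetical, and the
direct-mapped table with eviction on collision is just one assumed
policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FILTER_SLOTS 64	/* assumed table size, must be a power of two */

struct filter_slot {
	uint64_t pfn;		/* PFN currently tracked by this slot */
	unsigned int count;	/* accesses seen, capped at the threshold */
	bool used;
};

/* Fixed-size, direct-mapped filter kept at the hotness source. */
struct source_filter {
	struct filter_slot slots[FILTER_SLOTS];
	unsigned int threshold;	/* accesses needed before forwarding */
};

static void source_filter_init(struct source_filter *f,
			       unsigned int threshold)
{
	assert(threshold > 0);
	memset(f, 0, sizeof(*f));
	f->threshold = threshold;
}

/*
 * Record one access. Returns true only for the access that brings the
 * PFN's count up to the threshold, so a page is forwarded at most once
 * until its slot is evicted by a colliding PFN.
 */
static bool source_filter_access(struct source_filter *f, uint64_t pfn)
{
	struct filter_slot *s = &f->slots[pfn & (FILTER_SLOTS - 1)];

	if (!s->used || s->pfn != pfn) {
		/* Empty slot or collision: start counting this PFN afresh. */
		s->pfn = pfn;
		s->count = 0;
		s->used = true;
	}
	if (s->count < f->threshold) {
		s->count++;
		return s->count == f->threshold;
	}
	return false;
}
```

The memory used at the source is bounded by FILTER_SLOTS regardless of
how many accesses the scanner reports, at the cost of losing counts when
two PFNs collide on the same slot.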
>
> Doesn't the same issue exist for the current implementation if the sampling
> frequency is high, and/or the aggregation window is long?
>
> To me, hence, this looks like not a problem of the information source, but how
> to maintain the information. Current implementation maintains it per page, so
> I think the problem is inherent.
Well yes, but the goal could be to do better than NUMAB=2, which does
per-page level tracking.
>
> DAMON maintains the information in region abstraction that can save multiple
> pages with one data structure. The maximum number of regions can be set by
> users, so the space overhead can be controlled.
The granularity of tracking (per-page vs. range/region) is a topic for
discussion, I suppose.
Regards,
Bharata.