Message-ID: <9f9ce455-262a-4d55-829f-ff485f67dc7a@amd.com>
Date: Tue, 17 Jun 2025 13:58:13 +0530
From: Bharata B Rao <bharata@....com>
To: Matthew Wilcox <willy@...radead.org>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Jonathan.Cameron@...wei.com, dave.hansen@...el.com, gourry@...rry.net,
hannes@...xchg.org, mgorman@...hsingularity.net, mingo@...hat.com,
peterz@...radead.org, raghavendra.kt@....com, riel@...riel.com,
rientjes@...gle.com, sj@...nel.org, weixugc@...gle.com,
ying.huang@...ux.alibaba.com, ziy@...dia.com, dave@...olabs.net,
nifan.cxl@...il.com, xuezhengchu@...wei.com, yiannis@...corp.com,
akpm@...ux-foundation.org, david@...hat.com
Subject: Re: page_ext and memdescs
On 16-Jun-25 7:35 PM, Matthew Wilcox wrote:
> On Mon, Jun 16, 2025 at 07:09:30PM +0530, Bharata B Rao wrote:
<snip>
>> +#define PAGE_EXT_MIG_NID_MASK ((1UL << PAGE_EXT_MIG_NID_SHIFT) - 1)
>> +#define PAGE_EXT_MIG_FREQ_MASK ((1UL << PAGE_EXT_MIG_FREQ_SHIFT) - 1)
>> +#define PAGE_EXT_MIG_TIME_MASK ((1UL << PAGE_EXT_MIG_TIME_SHIFT) - 1)
>
> OK, so we need to have a conversation about page_ext. Sorry this is
> happening to you. I've kind of skipped over page_ext when talking
> about folios and memdescs up to now, so it's not that you've missed
> anything.
>
> As the comment says,
>
> * Page Extension can be considered as an extended mem_map.
>
> and we need to do this because we don't want to grow struct page beyond
> 64 bytes. But memdescs are dynamically allocated, so we don't need
> page_ext any more, and all that code can go away.
>
> lib/alloc_tag.c:struct page_ext_operations page_alloc_tagging_ops = {
> mm/page_ext.c:static struct page_ext_operations page_idle_ops __initdata = {
> mm/page_ext.c:static struct page_ext_operations *page_ext_ops[] __initdata = {
> mm/page_owner.c:struct page_ext_operations page_owner_ops = {
> mm/page_table_check.c:struct page_ext_operations page_table_check_ops = {
>
> I think all of these are actually per-memdesc thangs and not per-page
> things, so we can get rid of them all. That means I don't want to see
> new per-page data being added to page_ext.
Fair point.
>
> So, what's this really used for? It seems like it's really
> per-allocation, not per-page. Does it need to be preserved across
> alloc/free or can it be reset at free time?
The context here is to track the pages that need to be migrated. Whether
it is for NUMA Balancing or for any other sub-system that would need to
migrate (or promote) pages across nodes, I am trying to come up with a
kernel thread based migrator that would migrate the identified pages in
an async and batched manner. For this, the basic information that is
required for each such ready-to-be-migrated page is the target NID.
Since I have chosen to walk the zones and the PFNs of each zone to
iterate over each page, one more piece of information that I want per
ready-to-be-migrated page is an indication that the page is indeed ready
now to be migrated by the migrator thread.
In addition to these two things, if we want to carve out a single system
(like the kpromoted approach) that handles inputs from multiple page
hotness sources and maintains heuristics to decide when exactly to
migrate/promote a page, then it would be good to store a few other pieces
of information for such pages (like access frequency, access timestamp
etc).
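
To make the kind of per-page record described above concrete, here is a
minimal userspace sketch that packs a ready flag, a target NID, an access
frequency and an access timestamp into one 32-bit word using shifts and
masks. All names (mig_pack(), MIG_*_BITS etc) and the field widths are
assumptions chosen for illustration, not what the patch defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field widths (1 + 10 + 3 + 18 = 32 bits). */
#define MIG_READY_BITS 1
#define MIG_NID_BITS   10
#define MIG_FREQ_BITS  3
#define MIG_TIME_BITS  18

/* Each field starts where the previous one ends. */
#define MIG_READY_SHIFT 0
#define MIG_NID_SHIFT   (MIG_READY_SHIFT + MIG_READY_BITS)
#define MIG_FREQ_SHIFT  (MIG_NID_SHIFT + MIG_NID_BITS)
#define MIG_TIME_SHIFT  (MIG_FREQ_SHIFT + MIG_FREQ_BITS)

/* Mask covering the low 'bits' bits of a value. */
#define MIG_MASK(bits) ((UINT32_C(1) << (bits)) - 1)

/* Pack all four fields into one word. */
static inline uint32_t mig_pack(bool ready, uint32_t nid,
				uint32_t freq, uint32_t time)
{
	/* A NID that does not fit is a caller bug in this sketch. */
	assert(nid <= MIG_MASK(MIG_NID_BITS));

	/* Saturate the frequency instead of letting it wrap. */
	if (freq > MIG_MASK(MIG_FREQ_BITS))
		freq = MIG_MASK(MIG_FREQ_BITS);

	/* The timestamp is truncated to its low MIG_TIME_BITS bits. */
	return ((uint32_t)ready << MIG_READY_SHIFT) |
	       (nid << MIG_NID_SHIFT) |
	       (freq << MIG_FREQ_SHIFT) |
	       ((time & MIG_MASK(MIG_TIME_BITS)) << MIG_TIME_SHIFT);
}

static inline bool mig_ready(uint32_t v)
{
	return (v >> MIG_READY_SHIFT) & MIG_MASK(MIG_READY_BITS);
}

static inline uint32_t mig_nid(uint32_t v)
{
	return (v >> MIG_NID_SHIFT) & MIG_MASK(MIG_NID_BITS);
}

static inline uint32_t mig_freq(uint32_t v)
{
	return (v >> MIG_FREQ_SHIFT) & MIG_MASK(MIG_FREQ_BITS);
}

static inline uint32_t mig_time(uint32_t v)
{
	return (v >> MIG_TIME_SHIFT) & MIG_MASK(MIG_TIME_BITS);
}
```

Saturating the frequency and truncating the timestamp are design choices
of this sketch only; whether they suit the real heuristics is an open
question.
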
With that background, I am looking for an optimal place to store this
information. In my earlier approaches, I was maintaining a global list
of such hot pages and realized that such an approach will not scale, and
hence in the current approach I am tying that information to the page
itself. With that, there is no overhead of maintaining such a list or
synchronizing between producers and the migrator thread, and no
allocation for each maintained page. Hence it appeared to me that a
pre-allocated per-page info would be preferable. At this point, page
extension appeared to be a good place to have this information.
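
The list-free scheme described above can be sketched in userspace roughly
as follows: producers only fill in a pre-allocated slot indexed by PFN,
and the migrator walks a PFN range collecting the ready entries for one
target node into a batch. The names (struct mig_info, mig_mark_ready(),
mig_collect_batch()) are hypothetical, and the sketch ignores
concurrency; a kernel implementation would presumably need atomic
updates of each slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-allocated per-page record, one slot per PFN. */
struct mig_info {
	bool ready;      /* page is waiting for the migrator */
	int target_nid;  /* node the page should move to */
};

/* Producer side: no list insertion and no allocation, just a store. */
static void mig_mark_ready(struct mig_info *tbl, size_t pfn, int nid)
{
	tbl[pfn].target_nid = nid;
	tbl[pfn].ready = true;
}

/*
 * Migrator side: scan [start_pfn, end_pfn) and collect up to 'max' PFNs
 * that are ready and destined for 'nid'. Collected entries are cleared
 * so that the next scan does not pick them up again.
 * Returns the number of PFNs stored in 'batch'.
 */
static size_t mig_collect_batch(struct mig_info *tbl, size_t start_pfn,
				size_t end_pfn, int nid,
				size_t *batch, size_t max)
{
	size_t n = 0;

	assert(batch != NULL || max == 0);

	for (size_t pfn = start_pfn; pfn < end_pfn && n < max; pfn++) {
		if (!tbl[pfn].ready || tbl[pfn].target_nid != nid)
			continue;
		tbl[pfn].ready = false;
		batch[n++] = pfn;
	}
	return n;
}
```

The trade-off this illustrates is that the producer path becomes trivial,
while the migrator pays for a linear scan over PFNs that are mostly not
ready.
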
Sorry for the long reply, but coming to your specific question now.
So I really need to maintain such data only for pages that can be
migrated. Most anonymous pages, file-backed pages, pages that are mapped
into user page tables, THPs etc are candidates. I wonder which memdesc
type/types would cover all such pages. Would "folio" as a memdesc
(https://kernelnewbies.org/MatthewWilcox/FolioAlloc) be a broad enough
type for this?
As you note, it appears to me that it could be per-allocation rather
than per-page and the information needn't be preserved across alloc/free.
Regards,
Bharata.