Message-ID: <9f9ce455-262a-4d55-829f-ff485f67dc7a@amd.com>
Date: Tue, 17 Jun 2025 13:58:13 +0530
From: Bharata B Rao <bharata@....com>
To: Matthew Wilcox <willy@...radead.org>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 Jonathan.Cameron@...wei.com, dave.hansen@...el.com, gourry@...rry.net,
 hannes@...xchg.org, mgorman@...hsingularity.net, mingo@...hat.com,
 peterz@...radead.org, raghavendra.kt@....com, riel@...riel.com,
 rientjes@...gle.com, sj@...nel.org, weixugc@...gle.com,
 ying.huang@...ux.alibaba.com, ziy@...dia.com, dave@...olabs.net,
 nifan.cxl@...il.com, xuezhengchu@...wei.com, yiannis@...corp.com,
 akpm@...ux-foundation.org, david@...hat.com
Subject: Re: page_ext and memdescs

On 16-Jun-25 7:35 PM, Matthew Wilcox wrote:
> On Mon, Jun 16, 2025 at 07:09:30PM +0530, Bharata B Rao wrote:
<snip>
>> +#define PAGE_EXT_MIG_NID_MASK	((1UL << PAGE_EXT_MIG_NID_SHIFT) - 1)
>> +#define PAGE_EXT_MIG_FREQ_MASK	((1UL << PAGE_EXT_MIG_FREQ_SHIFT) - 1)
>> +#define PAGE_EXT_MIG_TIME_MASK	((1UL << PAGE_EXT_MIG_TIME_SHIFT) - 1)
> 
> OK, so we need to have a conversation about page_ext.  Sorry this is
> happening to you.  I've kind of skipped over page_ext when talking
> about folios and memdescs up to now, so it's not that you've missed
> anything.
> 
> As the comment says,
> 
>   * Page Extension can be considered as an extended mem_map.
> 
> and we need to do this because we don't want to grow struct page beyond
> 64 bytes.  But memdescs are dynamically allocated, so we don't need
> page_ext any more, and all that code can go away.
> 
> lib/alloc_tag.c:struct page_ext_operations page_alloc_tagging_ops = {
> mm/page_ext.c:static struct page_ext_operations page_idle_ops __initdata = {
> mm/page_ext.c:static struct page_ext_operations *page_ext_ops[] __initdata = {
> mm/page_owner.c:struct page_ext_operations page_owner_ops = {
> mm/page_table_check.c:struct page_ext_operations page_table_check_ops = {
> 
> I think all of these are actually per-memdesc thangs and not per-page
> things, so we can get rid of them all.  That means I don't want to see
> new per-page data being added to page_ext.

Fair point.

> 
> So, what's this really used for?  It seems like it's really
> per-allocation, not per-page.  Does it need to be preserved across
> alloc/free or can it be reset at free time?

The context here is tracking pages that need to be migrated. Whether
it is for NUMA Balancing or for any other subsystem that needs to
migrate (or promote) pages across nodes, I am trying to come up with a
kernel-thread-based migrator that migrates the identified pages
asynchronously and in batches. For this, the basic information
required for each such ready-to-be-migrated page is the target NID.
Since I have chosen to walk the zones, and the PFNs of each zone, to
iterate over the pages, one additional piece of information I want per
page is an indication that the page is indeed ready to be migrated by
the migrator thread.

In addition to these two things, if we want to carve out a single system
(like the kpromoted approach) that handles inputs from multiple page
hotness sources and maintains heuristics to decide when exactly to
migrate/promote a page, then it would be good to store a few other
pieces of information for such pages (like access frequency, access
timestamp, etc.).
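
To make the fields above concrete, here is a minimal userspace sketch of
packing the target NID, access frequency, access timestamp and a ready
flag into one 64-bit word. The mig_* names, the field widths and the
field order are all assumptions made for illustration; they are not the
values used in the patch, which are not shown in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field widths, chosen only for this sketch (46 bits total). */
#define MIG_NID_BITS	10
#define MIG_FREQ_BITS	3
#define MIG_TIME_BITS	32

/* Each field starts where the previous one ends. */
#define MIG_NID_SHIFT	0
#define MIG_FREQ_SHIFT	(MIG_NID_SHIFT + MIG_NID_BITS)
#define MIG_TIME_SHIFT	(MIG_FREQ_SHIFT + MIG_FREQ_BITS)
#define MIG_READY_SHIFT	(MIG_TIME_SHIFT + MIG_TIME_BITS)

#define MIG_NID_MASK	((UINT64_C(1) << MIG_NID_BITS) - 1)
#define MIG_FREQ_MASK	((UINT64_C(1) << MIG_FREQ_BITS) - 1)
#define MIG_TIME_MASK	((UINT64_C(1) << MIG_TIME_BITS) - 1)

/* Pack all four fields into a single word. */
static inline uint64_t mig_pack(unsigned int nid, unsigned int freq,
				uint32_t time, bool ready)
{
	/* Reject values that do not fit in their (assumed) field widths. */
	assert(nid <= MIG_NID_MASK);
	assert(freq <= MIG_FREQ_MASK);

	return ((uint64_t)nid << MIG_NID_SHIFT) |
	       ((uint64_t)freq << MIG_FREQ_SHIFT) |
	       ((uint64_t)time << MIG_TIME_SHIFT) |
	       ((uint64_t)ready << MIG_READY_SHIFT);
}

static inline unsigned int mig_nid(uint64_t v)
{
	return (unsigned int)((v >> MIG_NID_SHIFT) & MIG_NID_MASK);
}

static inline unsigned int mig_freq(uint64_t v)
{
	return (unsigned int)((v >> MIG_FREQ_SHIFT) & MIG_FREQ_MASK);
}

static inline uint32_t mig_time(uint64_t v)
{
	return (uint32_t)((v >> MIG_TIME_SHIFT) & MIG_TIME_MASK);
}

static inline bool mig_ready(uint64_t v)
{
	return (v >> MIG_READY_SHIFT) & 1;
}
```

Keeping everything in one word means a producer could publish the whole
record with a single store, but that is a property of this sketch, not a
claim about how the patch does it.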

With that background, I am looking for an optimal place to store this
information. In my earlier approaches, I maintained a global list of
such hot pages and realized that such an approach would not scale;
hence, in the current approach, I am tying that information to the page
itself. With that, there is no overhead of maintaining such a list, no
synchronization between producers and the migrator thread, and no
allocation for each tracked page. Hence it appeared to me that
pre-allocated per-page info would be preferable. At this point, page
extension appeared to be a good place to keep this information.
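
As a rough illustration of the scan-based design, the sketch below
simulates in userspace how a migrator could walk a PFN range over
pre-allocated per-page info and pull out one batch of ready pages for a
given target node, with no list and no per-page allocation. struct
page_mig_info and collect_ready_pfns() are hypothetical names, the info
array is simply indexed by PFN, and concurrency with producers is
ignored entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page record; stands in for the pre-allocated info. */
struct page_mig_info {
	int target_nid;		/* node the page should be migrated to */
	bool ready;		/* set by a producer when the page may be migrated */
};

/*
 * Walk PFNs in [start_pfn, end_pfn) and collect up to max_batch pages that
 * are ready and targeted at @nid. The ready flag is cleared on each page
 * that is picked up, so a later scan does not return it again.
 * Returns the number of PFNs stored in @batch.
 */
static size_t collect_ready_pfns(struct page_mig_info *info,
				 size_t start_pfn, size_t end_pfn, int nid,
				 size_t *batch, size_t max_batch)
{
	size_t n = 0;

	for (size_t pfn = start_pfn; pfn < end_pfn && n < max_batch; pfn++) {
		struct page_mig_info *pi = &info[pfn];

		if (!pi->ready || pi->target_nid != nid)
			continue;

		pi->ready = false;
		batch[n++] = pfn;
	}

	return n;
}
```

The caller would then hand each batch to whatever performs the actual
migration and call the function again to pick up the remaining pages.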

Sorry for the long reply, but coming to your specific question now:
I really need to maintain such data only for pages that can be
migrated. Most anonymous pages, file-backed pages, pages that are
mapped into user page tables, THP pages, etc. are candidates. I wonder
which memdesc type(s) would cover all such pages. Would "folio" as a
memdesc (https://kernelnewbies.org/MatthewWilcox/FolioAlloc) be a broad
enough type for this?

As you note, it appears to me that it could be per-allocation rather
than per-page, and the information needn't be preserved across
alloc/free.

Regards,
Bharata.
