Message-ID: <0a9914f8-bec7-4e58-ab12-a87fe3876187@sk.com>
Date: Fri, 24 Jan 2025 14:53:08 +0900
From: Hyeonggon Yoo <hyeonggon.yoo@...com>
To: Raghavendra K T <raghavendra.kt@....com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"lsf-pc@...ts.linux-foundation.org" <lsf-pc@...ts.linux-foundation.org>,
"bharata@....com" <bharata@....com>
Cc: kernel_team@...ynix.com, 42.hyeyoo@...il.com,
"gourry@...rry.net" <gourry@...rry.net>,
"nehagholkar@...a.com" <nehagholkar@...a.com>,
"abhishekd@...a.com" <abhishekd@...a.com>,
"ying.huang@...ux.alibaba.com" <ying.huang@...ux.alibaba.com>,
"nphamcs@...il.com" <nphamcs@...il.com>,
"hannes@...xchg.org" <hannes@...xchg.org>,
"feng.tang@...el.com" <feng.tang@...el.com>,
"kbusch@...a.com" <kbusch@...a.com>,
"Hasan.Maruf@....com" <Hasan.Maruf@....com>, "sj@...nel.org"
<sj@...nel.org>, "david@...hat.com" <david@...hat.com>,
"willy@...radead.org" <willy@...radead.org>,
"k.shutemov@...il.com" <k.shutemov@...il.com>,
"mgorman@...hsingularity.net" <mgorman@...hsingularity.net>,
"vbabka@...e.cz" <vbabka@...e.cz>, "hughd@...gle.com" <hughd@...gle.com>,
"rientjes@...gle.com" <rientjes@...gle.com>,
"shy828301@...il.com" <shy828301@...il.com>,
"liam.howlett@...cle.com" <liam.howlett@...cle.com>,
"peterz@...radead.org" <peterz@...radead.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"nadav.amit@...il.com" <nadav.amit@...il.com>,
"shivankg@....com" <shivankg@....com>, "ziy@...dia.com" <ziy@...dia.com>,
"jhubbard@...dia.com" <jhubbard@...dia.com>,
"AneeshKumar.KizhakeVeetil@....com" <AneeshKumar.KizhakeVeetil@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"jon.grimm@....com" <jon.grimm@....com>,
"santosh.shukla@....com" <santosh.shukla@....com>,
"Michael.Day@....com" <Michael.Day@....com>,
"riel@...riel.com" <riel@...riel.com>,
"weixugc@...gle.com" <weixugc@...gle.com>,
"leesuyeon0506@...il.com" <leesuyeon0506@...il.com>, honggyu.kim@...com,
"leillc@...gle.com" <leillc@...gle.com>,
"kmanaouil.dev@...il.com" <kmanaouil.dev@...il.com>,
"rppt@...nel.org" <rppt@...nel.org>,
"dave.hansen@...el.com" <dave.hansen@...el.com>, yuanchu@...gle.com
Subject: Re: [LSF/MM/BPF TOPIC] Overhauling hot page detection and promotion
based on PTE A bit scanning
On 1/23/2025 7:57 PM, Raghavendra K T wrote:
> Bharata and I would like to propose the following topic for LSFMM.
>
> Topic: Overhauling hot page detection and promotion based on PTE A bit scanning.
>
> In the Linux kernel, hot page information can potentially be obtained from
> multiple sources:
>
> a. PROT_NONE faults (NUMA balancing)
> b. PTE Access bit (LRU scanning)
> c. Hardware provided page hotness info (like AMD IBS)
>
> This information is further used to migrate (or promote) pages from slow memory
> tier to top tier to increase performance.
>
> In the current hot page promotion mechanism, all the activities including the
> process address space scanning, NUMA hint fault handling and page migration are
> performed in the process context. i.e., scanning overhead is borne by the
> applications.
>
> I had recently posted a patch [1] to improve this in the context of slow-tier
> page promotion. Here, Scanning is done by a global kernel thread which routinely
> scans all the processes' address spaces and checks for accesses by reading the
> PTE A bit. The hot pages thus identified are maintained in list and subsequently
> are promoted to a default top-tier node. Thus, the approach pushes overhead of
> scanning, NUMA hint faults and migrations off from process context.
>
> The topic was presented in the MM alignment session hosted by David Rientjes [2].
> The topic also finds a mention in S J Park's LSFMM proposal [3].
>
> Here is the list of potential discussion points:
> 1. Other improvements and enhancements to PTE A bit scanning approach. Use of
> multiple kernel threads, throttling improvements, promotion policies, per-process
> opt-in via prctl, virtual vs physical address based scanning, tuning hot page
> detection algorithm etc.
Yuanchu's MGLRU periodic aging series [1] seems quite relevant here;
you might want to take a look at it. Adding Yuanchu to Cc.
By the way, is there a reason you'd prefer a per-process opt-in via prctl
over per-memcg control?
[1] https://lore.kernel.org/all/20221214225123.2770216-1-yuanchu@google.com/
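For readers new to the thread, here is a toy userspace sketch of the
test-and-clear idea behind A bit scanning. It is not code from either
series. Pte, touch(), scan_once() and HOT_THRESHOLD are hypothetical names,
and both the "hot after N consecutive scans" rule and the reset on an idle
scan are assumptions made only for illustration.

```python
from dataclasses import dataclass

# Assumed policy: a page is reported hot once its A bit was found set
# in this many consecutive scan passes.
HOT_THRESHOLD = 2


@dataclass
class Pte:
    accessed: bool = False  # stands in for the hardware-set PTE A bit
    hits: int = 0           # consecutive scans that found the A bit set


def touch(page_table, pfn):
    """Model a memory access: the hardware sets the A bit."""
    page_table[pfn].accessed = True


def scan_once(page_table, threshold=HOT_THRESHOLD):
    """One scan pass: test and clear every A bit, return newly hot pages."""
    hot = []
    for pfn, pte in page_table.items():
        if pte.accessed:
            pte.accessed = False  # clear so the next pass sees fresh accesses
            pte.hits += 1
            if pte.hits == threshold:
                hot.append(pfn)   # promotion candidate
        else:
            pte.hits = 0          # assumed: an idle pass resets the streak
    return hot
```

In this model a page touched before two successive passes shows up in the
second pass's result, while a page touched only once drops back to zero.
Whether the scan loop runs per process, per memcg, or globally is exactly
the policy question above; the sketch does not take a position on it.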
> 2. Possibility of maintaining single source of truth for page hotness that would
> maintain hot page information from multiple sources and let other sub-systems
> use that info.
>
> 3. Discuss how hardware provided hotness info (like AMD IBS) can further aid
> promotion. Bharata had posted an RFC [4] on this a while back.
>
> 4. Overlap with DAMON and potential reuse.
>
> Links:
>
> [1] https://lore.kernel.org/all/20241201153818.2633616-1-raghavendra.kt@amd.com/
> [2] https://lore.kernel.org/linux-mm/20241226012833.rmmbkws4wdhzdht6@ed.ac.uk/T/
> [3] https://lore.kernel.org/lkml/Z4XUoWlU-UgRik18@gourry-fedora-PF4VCD3F/T/
> [4] https://lore.kernel.org/lkml/20230208073533.715-2-bharata@amd.com/
>
>