Message-ID: <0a9914f8-bec7-4e58-ab12-a87fe3876187@sk.com>
Date: Fri, 24 Jan 2025 14:53:08 +0900
From: Hyeonggon Yoo <hyeonggon.yoo@...com>
To: Raghavendra K T <raghavendra.kt@....com>,
 "linux-mm@...ck.org" <linux-mm@...ck.org>,
 "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
 "lsf-pc@...ts.linux-foundation.org" <lsf-pc@...ts.linux-foundation.org>,
 "bharata@....com" <bharata@....com>
Cc: kernel_team@...ynix.com, 42.hyeyoo@...il.com,
 "gourry@...rry.net" <gourry@...rry.net>,
 "nehagholkar@...a.com" <nehagholkar@...a.com>,
 "abhishekd@...a.com" <abhishekd@...a.com>,
 "ying.huang@...ux.alibaba.com" <ying.huang@...ux.alibaba.com>,
 "nphamcs@...il.com" <nphamcs@...il.com>,
 "hannes@...xchg.org" <hannes@...xchg.org>,
 "feng.tang@...el.com" <feng.tang@...el.com>,
 "kbusch@...a.com" <kbusch@...a.com>,
 "Hasan.Maruf@....com" <Hasan.Maruf@....com>, "sj@...nel.org"
 <sj@...nel.org>, "david@...hat.com" <david@...hat.com>,
 "willy@...radead.org" <willy@...radead.org>,
 "k.shutemov@...il.com" <k.shutemov@...il.com>,
 "mgorman@...hsingularity.net" <mgorman@...hsingularity.net>,
 "vbabka@...e.cz" <vbabka@...e.cz>, "hughd@...gle.com" <hughd@...gle.com>,
 "rientjes@...gle.com" <rientjes@...gle.com>,
 "shy828301@...il.com" <shy828301@...il.com>,
 "liam.howlett@...cle.com" <liam.howlett@...cle.com>,
 "peterz@...radead.org" <peterz@...radead.org>,
 "mingo@...hat.com" <mingo@...hat.com>,
 "nadav.amit@...il.com" <nadav.amit@...il.com>,
 "shivankg@....com" <shivankg@....com>, "ziy@...dia.com" <ziy@...dia.com>,
 "jhubbard@...dia.com" <jhubbard@...dia.com>,
 "AneeshKumar.KizhakeVeetil@....com" <AneeshKumar.KizhakeVeetil@....com>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "jon.grimm@....com" <jon.grimm@....com>,
 "santosh.shukla@....com" <santosh.shukla@....com>,
 "Michael.Day@....com" <Michael.Day@....com>,
 "riel@...riel.com" <riel@...riel.com>,
 "weixugc@...gle.com" <weixugc@...gle.com>,
 "leesuyeon0506@...il.com" <leesuyeon0506@...il.com>, honggyu.kim@...com,
 "leillc@...gle.com" <leillc@...gle.com>,
 "kmanaouil.dev@...il.com" <kmanaouil.dev@...il.com>,
 "rppt@...nel.org" <rppt@...nel.org>,
 "dave.hansen@...el.com" <dave.hansen@...el.com>, yuanchu@...gle.com
Subject: Re: [LSF/MM/BPF TOPIC] Overhauling hot page detection and promotion
 based on PTE A bit scanning



On 1/23/2025 7:57 PM, Raghavendra K T wrote:
> Bharata and I would like to propose the following topic for LSFMM.
> 
> Topic: Overhauling hot page detection and promotion based on PTE A bit scanning.
>   
> In the Linux kernel, hot page information can potentially be obtained from
> multiple sources:
>   
> a. PROT_NONE faults (NUMA balancing)
> b. PTE Access bit (LRU scanning)
> c. Hardware provided page hotness info (like AMD IBS)
>   
> This information is further used to migrate (or promote) pages from slow memory
> tier to top tier to increase performance.
> 
> In the current hot page promotion mechanism, all the activities including the
> process address space scanning, NUMA hint fault handling and page migration are
> performed in the process context, i.e., the scanning overhead is borne by the
> applications.
>  
> I had recently posted a patch [1] to improve this in the context of slow-tier
> page promotion. Here, scanning is done by a global kernel thread which routinely
> scans all the processes' address spaces and checks for accesses by reading the
> PTE A bit. The hot pages thus identified are maintained in a list and subsequently
> are promoted to a default top-tier node. Thus, the approach pushes overhead of
> scanning, NUMA hint faults and migrations off from process context.
> 
> The topic was presented in the MM alignment session hosted by David Rientjes [2].
> The topic also finds a mention in S J Park's LSFMM proposal [3].
>   
> Here is the list of potential discussion points:
> 1. Other improvements and enhancements to PTE A bit scanning approach. Use of
> multiple kernel threads, throttling improvements, promotion policies, per-process
> opt-in via prctl, virtual vs physical address based scanning, tuning hot page
> detection algorithm etc.

Yuanchu's MGLRU periodic aging series [1] seems quite relevant here;
you might want to look at it. Adding Yuanchu to Cc.

By the way, is there a reason you'd prefer per-process opt-in via prctl
over per-memcg control?

[1] https://lore.kernel.org/all/20221214225123.2770216-1-yuanchu@google.com/
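
For readers following the thread, the scan-and-promote idea in the quoted
proposal (a scanner periodically checks and clears the PTE A bit, tracks
which pages keep getting accessed, and promotes those out of the slow tier)
can be modeled roughly as below. This is only a userspace sketch, not code
from the posted patch: every name (sim_page, sim_touch, sim_scan_pass,
SIM_HOT_THRESHOLD) is hypothetical, the "A bit" is a plain bool, promotion
is just a field update rather than a migration, and the "hot after two
consecutive scans" policy is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum sim_tier { SIM_TIER_SLOW, SIM_TIER_TOP };

struct sim_page {
    bool accessed;       /* stands in for the PTE A bit */
    unsigned int hits;   /* consecutive scans that found the bit set */
    enum sim_tier tier;  /* memory tier the page currently lives in */
};

/* Assumed policy: a page is "hot" after this many consecutive hits. */
#define SIM_HOT_THRESHOLD 2u

/* Model of a memory access: real hardware would set the A bit. */
static void sim_touch(struct sim_page *p)
{
    p->accessed = true;
}

/*
 * One scanner pass over all pages: test and clear each accessed flag,
 * count consecutive hits, and "promote" slow-tier pages that reach the
 * threshold. Returns the number of pages promoted in this pass.
 */
static size_t sim_scan_pass(struct sim_page *pages, size_t n)
{
    size_t promoted = 0;

    assert(pages != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct sim_page *p = &pages[i];

        if (p->accessed) {
            p->accessed = false;  /* clear so the next pass sees new accesses */
            p->hits++;
        } else {
            p->hits = 0;          /* not touched since the last pass */
        }

        if (p->tier == SIM_TIER_SLOW && p->hits >= SIM_HOT_THRESHOLD) {
            p->tier = SIM_TIER_TOP;  /* stand-in for page migration */
            promoted++;
        }
    }
    return promoted;
}
```

In this model the application only ever calls sim_touch(); all bookkeeping
and promotion cost sits in sim_scan_pass(), which is the part the proposal
moves out of process context and into a kernel thread.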
  
> 2. Possibility of maintaining single source of truth for page hotness that would
> maintain hot page information from multiple sources and let other sub-systems
> use that info.
> 
> 3. Discuss how hardware provided hotness info (like AMD IBS) can further aid
> promotion. Bharata had posted an RFC [4] on this a while back.
> 
> 4. Overlap with DAMON and potential reuse.
>   
> Links:
> 
> [1] https://lore.kernel.org/all/20241201153818.2633616-1-raghavendra.kt@amd.com/
> [2] https://lore.kernel.org/linux-mm/20241226012833.rmmbkws4wdhzdht6@ed.ac.uk/T/
> [3] https://lore.kernel.org/lkml/Z4XUoWlU-UgRik18@gourry-fedora-PF4VCD3F/T/
> [4] https://lore.kernel.org/lkml/20230208073533.715-2-bharata@amd.com/
>   
> 

