Message-ID: <20250131122803.000031aa@huawei.com>
Date: Fri, 31 Jan 2025 12:28:03 +0000
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: Raghavendra K T <raghavendra.kt@....com>
CC: <linux-mm@...ck.org>, <akpm@...ux-foundation.org>,
<lsf-pc@...ts.linux-foundation.org>, <bharata@....com>, <gourry@...rry.net>,
<nehagholkar@...a.com>, <abhishekd@...a.com>, <ying.huang@...ux.alibaba.com>,
<nphamcs@...il.com>, <hannes@...xchg.org>, <feng.tang@...el.com>,
<kbusch@...a.com>, <Hasan.Maruf@....com>, <sj@...nel.org>,
<david@...hat.com>, <willy@...radead.org>, <k.shutemov@...il.com>,
<mgorman@...hsingularity.net>, <vbabka@...e.cz>, <hughd@...gle.com>,
<rientjes@...gle.com>, <shy828301@...il.com>, <liam.howlett@...cle.com>,
<peterz@...radead.org>, <mingo@...hat.com>, <nadav.amit@...il.com>,
<shivankg@....com>, <ziy@...dia.com>, <jhubbard@...dia.com>,
<AneeshKumar.KizhakeVeetil@....com>, <linux-kernel@...r.kernel.org>,
<jon.grimm@....com>, <santosh.shukla@....com>, <Michael.Day@....com>,
<riel@...riel.com>, <weixugc@...gle.com>, <leesuyeon0506@...il.com>,
<honggyu.kim@...com>, <leillc@...gle.com>, <kmanaouil.dev@...il.com>,
<rppt@...nel.org>, <dave.hansen@...el.com>
Subject: Re: [LSF/MM/BPF TOPIC] Overhauling hot page detection and promotion
based on PTE A bit scanning
> Here is the list of potential discussion points:
...
> 2. Possibility of maintaining single source of truth for page hotness that would
> maintain hot page information from multiple sources and let other sub-systems
> use that info.
Hi,
I was thinking of proposing a separate topic on a single source of hotness,
but this question covers it so I'll add some thoughts here instead.
I think we are very early, but sharing some experience and thoughts in a
session may be useful.
What do the other subsystems that want to use a single source of page hotness
want to be able to find out? (subject to filters like memory range, process etc)
A) How hot is page X?
- Is this useful, or too much data? What would use it?
* Application optimization maybe. Very handy for developing algorithms
to do the rest of the options here as an Oracle!
- Provides both the cold and hot end of the scale, but maybe measurement
techniques vary and can not be easily combined. Hard in general to combine
multiple sources of truth if aiming for an absolute number.
B) Which pages are super hot?
- Probably those that make the most difference if they are in a slower memory tier.
C) Which pages are hot enough to consider moving?
- This may be good enough to get the key data into the fast memory over time.
- Sources of info can be combined here because comparing precise numbers doesn't matter.
D) Which pages are fairly cold?
- Likewise maybe good enough over time.
E) Which pages are very cold?
- Ideal case for tiering. Swap these with the super hot ones.
- Maybe extra signal for swap / zswap etc
F) Did these hot pages remain hot (and the same for cold)?
- This is needed to know when to back off doing things because hotness is
unstable (two phase applications are a pain for this). Sampling a few
pages may be fine.
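To make option C concrete, here is a minimal sketch of combining sources
without ever comparing their raw numbers. Everything in it is an assumption
for illustration: the function name, the source names, and the idea that each
source simply nominates its own top fraction of pages. Scores are only ranked
within one source, so a PTE A bit scan count and a hardware tracker count never
need a common unit.

```python
from collections import Counter


def hot_candidates(sources, top_fraction=0.1, min_votes=1):
    """Return pages that at least `min_votes` sources rank in their own top fraction.

    `sources` maps a (hypothetical) source name to {page: score}. Scores are
    only compared within a single source, never across sources.
    """
    votes = Counter()
    for scores in sources.values():
        if not scores:
            continue
        # Number of pages this source may nominate (at least one).
        keep = max(1, int(len(scores) * top_fraction))
        ranked = sorted(scores, key=scores.get, reverse=True)
        votes.update(ranked[:keep])
    return {page for page, n in votes.items() if n >= min_votes}
```

With min_votes=1 this is a plain union of nominations; raising it asks for
agreement between sources before a page is considered worth moving.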
Messy corners:
Temporal aspects.
- If only providing lists of hottest / coldest in last second, very hard
to find those that are of a stable temperature. We end up moving
very hot data (which is disruptive) and it doesn't stay hot.
- Can reduce that effect by long sampling windows on some measurement approaches
(on hardware trackers that can trash accuracy due to resource exhaustion
and other subtle effects).
- bistable / phase based applications are a pain but perhaps up to higher
levels to back off.
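One way to picture the stable temperature problem is a filter that only
reports pages seen as hot in several consecutive windows. This is a rough
sketch under assumptions of my own: the class name, the fixed window count,
and the rule that a page must appear in every recent window are all
hypothetical, and a real tracker would likely want something softer than a
strict intersection.

```python
from collections import deque


class StableHotFilter:
    """Report pages that appeared hot in every one of the last `windows` samples."""

    def __init__(self, windows=3):
        if windows < 1:
            raise ValueError("windows must be at least 1")
        # Only the most recent `windows` hot lists are kept.
        self.history = deque(maxlen=windows)

    def add_window(self, hot_pages):
        """Record the hot list produced by one sampling window."""
        self.history.append(frozenset(hot_pages))

    def stable(self):
        """Pages present in all retained windows, or nothing until history is full."""
        if len(self.history) < self.history.maxlen:
            return frozenset()
        return frozenset.intersection(*self.history)
```

A page that is very hot for a single window never shows up in stable(), which
is the disruptive move being avoided. A bistable application would still
defeat this if its phases are longer than the retained history.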
My main interest is migrating in tiered systems but good to look at what
else would use a common layer.
Mostly I want to know something that is useful to move, and assume convergence
over the long term with the best things to move, so to me the ideal layer has
the following interface (strawman so shoot holes in it!):
1) Give me up to X hotish pages from a slow tier (greater than a specific measure
of temperature)
2) Give me X coldish pages from a faster tier.
3) I expect to ask again in X seconds so please have some info ready for me!
4) (a path to get an idea of 'unhelpful moves' from earlier iterations - this
is bleeding the tiering application into a shared interface though).
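The four points above can be written down as a toy user-space model, purely
to give people something to shoot at. None of these names exist anywhere: the
class, its methods, the "fast"/"slow" tier labels and the plain numeric
temperature are all assumptions, and skipping pages reported as unhelpful is
just one possible reading of point 4.

```python
class HotnessLayer:
    """Toy model of the strawman interface; every name here is hypothetical."""

    def __init__(self):
        self.temperature = {}     # page -> temperature (unit left unspecified)
        self.tier = {}            # page -> "fast" or "slow"
        self.next_query_s = None  # hint from point 3
        self.unhelpful = set()    # feedback from point 4

    def record(self, page, tier, temperature):
        """Feed in a measurement; stands in for whatever the trackers provide."""
        self.tier[page] = tier
        self.temperature[page] = temperature

    def _pages_in(self, tier):
        return [p for p, t in self.tier.items()
                if t == tier and p not in self.unhelpful]

    def hot_pages(self, max_pages, min_temp):
        """1) Up to max_pages slow-tier pages hotter than min_temp, hottest first."""
        pages = [p for p in self._pages_in("slow") if self.temperature[p] > min_temp]
        pages.sort(key=self.temperature.get, reverse=True)
        return pages[:max_pages]

    def cold_pages(self, max_pages):
        """2) Up to max_pages fast-tier pages, coldest first."""
        return sorted(self._pages_in("fast"), key=self.temperature.get)[:max_pages]

    def expect_query_in(self, seconds):
        """3) Hint for when the caller will ask again."""
        self.next_query_s = seconds

    def report_unhelpful(self, pages):
        """4) Feedback on earlier moves; such pages are skipped in later answers."""
        self.unhelpful.update(pages)
```

Note that both query methods return "up to" the requested count, so a caller
has to cope with short or empty answers when nothing qualifies.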
If we have multiple subsystems using the data we will need to resolve their
conflicting demands to generate good enough data with appropriate overhead.
I'd also like a virtualized solution for the case of hardware PA trackers (what
I have with CXL Hotness Monitoring Units) and the classic memory pool / stranding
avoidance case where the VM is the right entity to make migration decisions.
Making that interface convey what the kernel is going to use would be an
efficient option. I'd like to hide how the sausage was made from the VM.
Jonathan