Message-ID: <760FBDE3-2724-44A6-A874-BD87F0191C57@nvidia.com>
Date: Mon, 01 Dec 2025 12:01:54 -0500
From: Zi Yan <ziy@...dia.com>
To: Hongru Zhang <zhanghongru06@...il.com>
Cc: vbabka@...e.cz, Liam.Howlett@...cle.com, akpm@...ux-foundation.org,
axelrasmussen@...gle.com, david@...nel.org, hannes@...xchg.org,
jackmanb@...gle.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
lorenzo.stoakes@...cle.com, mhocko@...e.com, rppt@...nel.org,
surenb@...gle.com, weixugc@...gle.com, yuanchu@...gle.com,
zhanghongru@...omi.com
Subject: Re: [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and
optimize pagetypeinfo access
On 30 Nov 2025, at 21:36, Hongru Zhang wrote:
>>> On mobile devices, some user-space memory management components check
>>> memory pressure and fragmentation status periodically or via PSI, and
>>> take actions such as killing processes or performing memory compaction
>>> based on this information.
>>
>> Hm /proc/buddyinfo could be enough to determine fragmentation? Also we have
>> in-kernel proactive compaction these days.
>
> In fact, besides /proc/pagetypeinfo, other system resource information is
> also collected at appropriate times, and resource usage throughout the
> process lifecycle is appropriately tracked as well. User-space management
> components integrate this information together to make decisions and
> perform proper actions.
>
>>> Under high load scenarios, reading /proc/pagetypeinfo causes memory
>>> management components or memory allocation/free paths to be blocked
>>> for extended periods waiting for the zone lock, leading to the following
>>> issues:
>>> 1. Long interrupt-disabled spinlocks - occasionally exceeding 10ms on Qcom
>>> 8750 platforms, reducing system real-time performance
>>> 2. Memory management components being blocked for extended periods,
>>> preventing rapid acquisition of memory fragmentation information for
>>> critical memory management decisions and actions
>>> 3. Increased latency in memory allocation and free paths due to prolonged
>>> zone lock contention
>>
>> It could be argued that not capturing /proc/pagetypeinfo (often) would help.
>> I wonder if we can find also other benefits from the counters in the kernel
>> itself.
>
> Collecting system and app resource statistics and making decisions based
> on this information is a common practice among Android device manufacturers.
>
> Currently, there should be over a billion Android phones being used daily
> worldwide. The diversity of hardware configurations across Android devices
> makes it difficult for kernel mechanisms alone to maintain good
> performance across all usage scenarios.
>
> First, hardware capabilities vary greatly - flagship phones may have up to
> 24GB of memory, while low-end devices may have as little as 4GB. CPU,
> storage, battery, and passive cooling capabilities vary significantly due
> to market positioning and cost factors. Hardware resources always seem
> inadequate.
>
> Second, usage scenarios also differ - some people use devices in hot
> environments while others in cold environments; some enjoy high-definition
> gaming while others simply browse the web.
>
> Third, user habits vary as well. Some people rarely restart their phones
> except when the battery dies or the system crashes; others restart daily,
> like me. Some users never actively close apps, only switching them to
> the background, resulting in dozens of apps running in the background and
> keeping system resources consumed (especially memory). Yet others just use
> a few apps, closing unused apps rather than leaving them in the
> background.
>
> Despite the above challenges, Android device manufacturers hope to ensure
> a good user experience (no UI jank) across all situations.
>
> Even at 60 Hz frame refresh rate (90 Hz, 120 Hz also supported now), all
> work from user input to render and display should be done within 16.7 ms.
> To achieve this goal, the management components perform tasks such as:
> - Track system resource status: what system has
> (system resource awareness)
> - Learn and predict app resource demands: what app needs
> (resource demand awareness)
> - Monitor app launch, exit, and foreground-background switches: least
> important app gives back resource to system to serve most important
> one, usually the foreground app
> (user intent awareness)
>
> Tracking system resources seems necessary for Android devices, not
> optional. So the related paths are not that cold on Android devices.
This is all good background information. But how does the userspace monitor
utilize pageblock migratetype information? Can you give a concrete example?
Something like: when free_movable is low, background apps are killed to
provide more free pages? Or is the userspace monitor even trying to attribute
different pageblock usage to each app by monitoring /proc/pagetypeinfo
before and after an app launch?

Thanks.
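
To make the question concrete, here is a minimal sketch of the kind of check
I have in mind. It is an illustration only, not what any vendor's monitor
actually does: the function names and the threshold are hypothetical, and it
assumes the usual "Node N, zone Z, type T <count per order>" rows of
/proc/pagetypeinfo.

```python
import re

# Matches the per-order free-count rows of /proc/pagetypeinfo, e.g.
# "Node    0, zone   Normal, type      Movable      5      3 ..."
ROW = re.compile(r"^Node\s+(\d+), zone\s+(\S+), type\s+(\S+)\s+(.*)$")


def free_pages_by_type(text):
    """Sum free pages per migratetype over all nodes and zones."""
    totals = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # headers, blank lines, "Number of blocks" rows
        mtype = m.group(3)
        # Large counts may be printed capped, as in ">100000"; treat the
        # cap as a lower bound by dropping the leading ">".
        counts = [int(tok.lstrip(">")) for tok in m.group(4).split()]
        # A free block at order n holds 2**n pages.
        pages = sum(c << order for order, c in enumerate(counts))
        totals[mtype] = totals.get(mtype, 0) + pages
    return totals


def movable_is_low(text, threshold_pages):
    """Hypothetical policy hook: True if free Movable pages are below
    the (assumed, tunable) threshold."""
    return free_pages_by_type(text).get("Movable", 0) < threshold_pages
```

A monitor would feed this the contents of /proc/pagetypeinfo (reading the
file typically requires root) and, per the problem described in this thread,
each read currently takes the zone lock, so polling it often is exactly the
costly case.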
>
> All the above are from workload perspective. From the kernel perspective,
> regardless of when or how frequently user-space tools read statistical
> information, they should not affect the kernel's own efficiency
> significantly. That's why I submit this patch series to make the read side
> of /proc/pagetypeinfo lock-free. But this does introduce overhead in the hot
> path; I would greatly appreciate it if we could discuss how to improve it here.
>
>> Adding these migratetype counters is something that wouldn't be even
>> possible in the past, until the freelist migratetype hygiene was merged.
>> So now it should be AFAIK possible, but it's still some overhead in
>> relatively hot paths. I wonder if we even considered this before in the
>> context of migratetype hygiene? Couldn't find anything quickly.
>
> Yes, I initially wrote the code on an old kernel. At that time, I reused
> set_pcppage_migratetype (also renamed) to cache the exact migratetype
> list that the page block is on. After the freelist migratetype hygiene
> patches were merged, I removed that logic.
Best Regards,
Yan, Zi