linux-kernel - Re: [PATCH] mm/page_alloc: optimize lowmem

Message-ID: <tencent_DFDA2D92B8F69FEF149FE7593DBC47B46E09@qq.com>
Date: Sat, 15 Nov 2025 00:34:58 +0800
From: Fujunjie <fujunjie1@...com>
To: Zi Yan <ziy@...dia.com>, Brendan Jackman <jackmanb@...gle.com>
Cc: akpm@...ux-foundation.org, vbabka@...e.cz, surenb@...gle.com,
 mhocko@...e.com, hannes@...xchg.org, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/page_alloc: optimize lowmem_reserve max lookup using
 monotonicity

On Sat Nov 15, 2025 at 00:12 AM UTC, Zi Yan wrote:

> My concern on this change is that the correctness of
> calculate_totalreserve_pages() now relies on the implementation of
> setup_per_zone_lowmem_reserve(). How can we make sure in the future
> this will not break when setup_per_zone_lowmem_reserve() is changed?
> Hoping people read the comment and do the right thing?

Thanks for raising this, Zi.

I agree it would be a real problem if calculate_totalreserve_pages()
were relying on a fragile detail of how setup_per_zone_lowmem_reserve()
happens to be written today.

What I intended to rely on is not an implementation detail, but the
semantics of zone->lowmem_reserve[j] for a given zone (with
zone_idx(zone) == i).

For such a zone "i", zone->lowmem_reserve[j] (j > i) represents how many
pages in zone "i" must effectively be kept in reserve when deciding
whether an allocation class that is allowed to allocate from zones up to
"j" may fall back into zone "i". The purpose of these reserves is to
protect allocation classes that cannot use higher zones and therefore
depend more heavily on this lower zone.

When viewed this way, the monotonic ordering in j comes from the meaning
of the field: as j increases, we are considering allocation classes that
can use a strictly larger set of fallback zones. Those more flexible
allocations should not be allowed to consume more low memory than the
less flexible ones, so the reserve they must respect in zone "i" cannot
shrink as j grows. In terms of the reserve semantics, it would be quite
unexpected if a higher-j allocation class were permitted to deplete zone
"i" more aggressively than a lower-j one.

So the “non-decreasing in j” property is really a data invariant implied
by the reserve semantics, rather than an assumption about how
setup_per_zone_lowmem_reserve() happens to be implemented today.
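
To make the invariant concrete, here is a minimal user-space sketch of a
check for it. This is not kernel code: NR_ZONES_SKETCH and
reserve_is_non_decreasing() are hypothetical names used only for
illustration, and the array stands in for one zone's lowmem_reserve[].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MAX_NR_ZONES; the value is illustrative only. */
#define NR_ZONES_SKETCH 4

/*
 * Return true if reserve[j] is non-decreasing over j in (zone_idx, nr_zones).
 * Entries at or below zone_idx are ignored, since the invariant discussed
 * here only concerns j > i for a zone with index i.
 */
static bool reserve_is_non_decreasing(const long *reserve, size_t zone_idx,
                                      size_t nr_zones)
{
    assert(reserve != NULL);

    /* Compare each entry above zone_idx + 1 with its predecessor. */
    for (size_t j = zone_idx + 2; j < nr_zones; j++) {
        if (reserve[j] < reserve[j - 1])
            return false;
    }
    return true;
}
```

A check of this shape could back a debug-time assertion in a consumer that
depends on the ordering, so that a violation is caught rather than silently
producing a wrong maximum.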

setup_per_zone_lowmem_reserve() currently encodes this meaning by
accumulating managed pages from higher zones and applying the configured
ratio. If some future change were to alter that implementation in a way
that breaks monotonicity, that would likely reflect a change in the
intended semantics of lowmem_reserve itself, at which point consumers
like calculate_totalreserve_pages() would naturally need to be updated
as well.
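
As a rough illustration of why "accumulate, then apply the ratio" yields a
non-decreasing array, here is a simplified user-space model. It is a sketch
under stated assumptions, not the kernel implementation: the function names
and NR_ZONES_SKETCH are hypothetical, managed[] stands in for the per-zone
managed page counts, and the real function handles further cases that this
model omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MAX_NR_ZONES; the value is illustrative only. */
#define NR_ZONES_SKETCH 4

/*
 * Simplified model: for the zone at zone_idx, set reserve[j] (j > zone_idx)
 * to the running sum of managed pages in zones zone_idx+1..j, divided by
 * ratio. The running sum never decreases, so neither does reserve[j].
 */
static void model_lowmem_reserve(const unsigned long *managed, size_t zone_idx,
                                 unsigned long ratio, unsigned long *reserve)
{
    unsigned long sum = 0;

    assert(ratio > 0);

    /* Entries at or below the zone's own index carry no reserve here. */
    for (size_t j = 0; j <= zone_idx && j < NR_ZONES_SKETCH; j++)
        reserve[j] = 0;

    for (size_t j = zone_idx + 1; j < NR_ZONES_SKETCH; j++) {
        sum += managed[j];
        reserve[j] = sum / ratio;
    }
}

/* Maximum found by scanning every entry from zone_idx upward. */
static unsigned long max_by_scan(const unsigned long *reserve, size_t zone_idx)
{
    unsigned long max = 0;

    for (size_t j = zone_idx; j < NR_ZONES_SKETCH; j++) {
        if (reserve[j] > max)
            max = reserve[j];
    }
    return max;
}

/* Under the non-decreasing invariant, the last entry is the maximum. */
static unsigned long max_by_monotonicity(const unsigned long *reserve)
{
    return reserve[NR_ZONES_SKETCH - 1];
}
```

In this model both lookups agree for any input, because the array is
non-decreasing by construction. If an implementation ever stopped
guaranteeing that, only the full scan would remain correct.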

Best Regards,
Junjie,Fu