Message-ID: <20250927004617.7667-1-hdanton@sina.com>
Date: Sat, 27 Sep 2025 08:46:15 +0800
From: Hillf Danton <hdanton@...a.com>
To: Joshua Hahn <joshua.hahnjy@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	kernel-team@...a.com
Subject: Re: [PATCH v2 2/4] mm/page_alloc: Perform appropriate batching in drain_pages_zone

On Wed, 24 Sep 2025 13:44:06 -0700 Joshua Hahn wrote:
> drain_pages_zone completely drains a zone of its pcp free pages by
> repeatedly calling free_pcppages_bulk until pcp->count reaches 0.
> In this loop, it already performs batched calls to ensure that
> free_pcppages_bulk isn't called to free too many pages at once, and
> relinquishes & reacquires the lock between each call to prevent
> lock starvation from other processes.
> 
> However, the current batching does not prevent lock starvation. The
> current implementation creates batches of
> pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX, which has been seen in
> Meta workloads to be up to 64 << 5 == 2048 pages.
> 
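(For scale, assuming 4 KiB base pages on the systems in question, 2048
pages per free_pcppages_bulk call is 2048 * 4 KiB = 8 MiB freed while the
lock is held; a cap of pcp->batch == 64 would bring that down to 256 KiB
per call.)
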
> While it is true that CONFIG_PCP_BATCH_SCALE_MAX is a config and
> indeed can be adjusted by the system admin to be any number from
> 0 to 6, its default value of 5 is still too high to be reasonable for
> any system.
> 
> Instead, let's create batches of pcp->batch pages, which gives a more
> reasonable 64 pages per call to free_pcppages_bulk. This gives other
> processes a chance to grab the lock and prevents starvation. Each

Feel free to make it clear which lock is contended, pcp->lock,
zone->lock, or both, to help understand the starvation.
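
(One way to answer that, assuming the test kernels can be built with
CONFIG_LOCK_STAT=y: /proc/lock_stat reports per-lock-class contention
counts and wait times, so the pcp->lock and zone->lock entries can be
compared directly while the workload runs.)

	echo 1 > /proc/sys/kernel/lock_stat   # enable lock statistics
	# ... run the memory-pressure workload ...
	cat /proc/lock_stat                   # per-lock contentions / wait times
	echo 0 > /proc/sys/kernel/lock_stat   # disable collection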

If the zone lock is hot, why did splitting the zones across NUMA nodes
fail to mitigate the contention, given that the workloads were tested
under high sustained memory pressure on large machines in the Meta fleet
(1 TB of memory, 316 CPUs)?

Can the contention be observed under memory pressure that is tight but
not extremely tight? If not, is it due to a misconfiguration in user
space?

> individual call to drain_pages_zone may take longer, but we avoid the
> worst case scenario of completely starving out other system-critical
> threads from acquiring the pcp lock while 2048 pages are freed
> one-by-one.
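
For readers skimming the thread, here is a simplified sketch of the loop
shape being discussed, not the exact mm/page_alloc.c code; the field and
function names are taken from the description above, and free_pcppages_bulk()
also touches zone state internally, which is why the question above asks
whether pcp->lock, zone->lock, or both are contended:

	static void drain_pages_zone(unsigned int cpu, struct zone *zone)
	{
		struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
		int count;

		do {
			spin_lock(&pcp->lock);
			count = pcp->count;
			if (count) {
				/*
				 * Before the patch the cap is
				 * pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX,
				 * i.e. up to 64 << 5 == 2048 pages per bulk
				 * free; the patch caps it at pcp->batch.
				 */
				int to_drain = min(count, pcp->batch);

				free_pcppages_bulk(zone, to_drain, pcp, 0);
				count -= to_drain;
			}
			/* drop the lock between batches so other waiters can take it */
			spin_unlock(&pcp->lock);
		} while (count);
	}

With the proposed cap, other waiters get a chance at the lock after every
64 pages freed instead of after every 2048.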
