Message-ID: <20251006145432.4132418-1-joshua.hahnjy@gmail.com>
Date: Mon,  6 Oct 2025 07:54:31 -0700
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Brendan Jackman <jackmanb@...gle.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Michal Hocko <mhocko@...e.com>,
	Suren Baghdasaryan <surenb@...gle.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	Zi Yan <ziy@...dia.com>,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: [RFC] [PATCH] mm/page_alloc: pcp->batch tuning

Recently, while working on another patch about batching
free_pcppages_bulk [1], I was curious why pcp->batch was always 63 on my
machine. This led me to zone_batchsize(), where I found the following
lines, which determine the batch size for the host:

	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
	batch /= 4;		/* We effectively *= 4 below */
	if (batch < 1)
		batch = 1;
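
For concreteness: on a host with a 4KB PAGE_SIZE and a zone managing at
least 1GB of memory, the arithmetic works out as follows (worked out by
hand under those assumptions, and including the 2^n - 1 clamp that
zone_batchsize() applies a few lines further down):

	min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE) = 256
	batch = 256 / 4 = 64
	rounddown_pow_of_two(64 + 64/2) - 1 = 64 - 1 = 63

which matches the 63 I was seeing.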

All of this is good, except for the comment above, which says "We
effectively *= 4 below". Nowhere else in zone_batchsize() is there a
corresponding multiplication by 4. Looking into the history of this, it
seems Dave Hansen had also noticed it back in 2013 [2]. It turns out
there *used* to be a corresponding *= 4, which was later turned into a
*= 6 for use in pageset_setup_from_batch_size(), a function that no
longer exists.

This leaves us with a /= 4 that has no corresponding *= 4 anywhere, so
pcp->batch is mistuned relative to the intent it had when it was
introduced. This is made worse by the fact that pcp lists are generally
larger today than they were in 2013, meaning batch sizes should have
increased, not decreased.

While the obvious solution is to remove this /= 4 to restore the
original tuning heuristics, I think this discovery opens up a discussion
on what pcp->batch should be, and whether this is something that should
be dynamically tuned based on the system's usage, like pcp->high.

Naively removing the /= 4 also changes the tuning for the entire system,
so I am hesitant to simply delete it, even though I believe a larger
batch size (the new default becomes roughly the number of pages it takes
to make up 1MB, i.e. SZ_1M / PAGE_SIZE) can be helpful for the general
scale of machines running today, as opposed to 12 years ago.
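
To make the magnitude of the change concrete, here is the same 4KB
PAGE_SIZE arithmetic with and without the /= 4 (again assuming a large
zone and the 2^n - 1 clamp at the end of zone_batchsize()):

	with /= 4:     rounddown_pow_of_two(64 + 32) - 1   = 63
	without /= 4:  rounddown_pow_of_two(256 + 128) - 1 = 255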

I've left this patch as an RFC to see what folks have to say about this
decision.

[1] https://lore.kernel.org/all/20251002204636.4016712-1-joshua.hahnjy@gmail.com/
[2] https://lore.kernel.org/linux-mm/20131015203547.8724C69C@viggo.jf.intel.com/

Signed-off-by: Joshua Hahn <joshua.hahnjy@...il.com>
---
 mm/page_alloc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d1d037f97c5f..b4db0d09d145 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5815,7 +5815,6 @@ static int zone_batchsize(struct zone *zone)
 	 * and zone lock contention.
 	 */
 	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
-	batch /= 4;		/* We effectively *= 4 below */
 	if (batch < 1)
 		batch = 1;
 

base-commit: 097a6c336d0080725c626fda118ecfec448acd0f
-- 
2.47.3
