Message-ID: <20251008193642.953032-1-joshua.hahnjy@gmail.com>
Date: Wed,  8 Oct 2025 12:36:41 -0700
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Brendan Jackman <jackmanb@...gle.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Michal Hocko <mhocko@...e.com>,
	Suren Baghdasaryan <surenb@...gle.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	Zi Yan <ziy@...dia.com>,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	ying.huang@...ux.alibaba.com
Subject: Re: [RFC] [PATCH] mm/page_alloc: pcp->batch tuning

On Wed, 8 Oct 2025 08:34:21 -0700 Dave Hansen <dave.hansen@...el.com> wrote:

Hello Dave, thank you for your feedback!

> First of all, I do agree that the comment should go away or get fixed up.
> 
> But...
> 
> On 10/6/25 07:54, Joshua Hahn wrote:
> > This leaves us with a /= 4 with no corresponding *= 4 anywhere, which
> > leaves pcp->batch mistuned from the original intent when it was
> > introduced. This is made worse by the fact that pcp lists are generally
> > larger today than they were in 2013, meaning batch sizes should have
> > increased, not decreased.
> 
> pcp->batch and pcp->high do very different things. pcp->high is a limit
> on the amount of memory that can be tied up. pcp->batch balances
> throughput with latency. I'm not sure I buy the idea that a higher
> pcp->high means we should necessarily do larger batches.

I agree that a higher pcp->high doesn't necessarily mean we should do larger
batches. What I was trying to get at is that if pcp lists are bigger, some of
the values derived from them might want to scale as well.

For instance, in nr_pcp_free, pcp->batch is used to determine how many
pages should be left in the pcplist (and the rest be freed). Should this
value scale with a bigger pcp? (This is not a rhetorical question, I really
do want to understand what the implications are here).

Another thing that I would like to note is that pcp->high is actually at
least in part a function of pcp->batch. In decay_pcp_high, we set

pcp->high = max3(pcp->count - (batch << CONFIG_PCP_BATCH_SCALE_MAX), ...)

So here, a higher batch value would actually lead to a much lower pcp->high
instead, which seems actively harmful to the system. So I'll do a take two of
this patch and take your advice below: instead of getting rid of the /= 4,
I'll just fold it in (or add a better explanation of why we do this). Another
candidate place for the fold seems to be where we do the rounddown_pow_of_two.

> So I dunno... If someone wanted to alter the initial batch size, they'd
> ideally repeat some of Ying's experiments from: 52166607ecc9 ("mm:
> restrict the pcp batch scale factor to avoid too long latency").

I ran a few very naive and quick tests on kernel builds, and it seems like
for larger machines (1TB memory, 316 processors), this leads to a very
significant speedup in system time during a kernel compilation (~10%).

But for smaller machines (250G memory, 176 processors) and (62G memory and 36
processors), this leads to quite a regression (~5%).

So maybe the answer is that this should actually be defined by the machine's
size. In zone_batchsize, we set the value of the batch to: 

min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE)

But maybe it makes sense to let this value grow bigger for larger machines? If
anything, I think that the experiment results above do show that batch size does
have an impact on the performance, and the effect can either be positive or
negative based on the machine's size. I can run some more experiments to 
see if there's an opportunity to better tune pcp->batch.

> Better yet, just absorb the /=4 into the two existing batch assignments.
> It will probably compile to exactly the same code and have no functional
> changes and get rid of the comment.
> 
> Wouldn't this compile to the same thing?
> 
>         batch = zone->managed_pages / 4096;
>         if (batch * PAGE_SIZE > 128 * 1024)
>                 batch = (128 * 1024) / PAGE_SIZE;

But for now, this seems good to me. I'll get rid of the confusing comment,
fold the /= 4 into the batch computation, and leave a new comment explaining
why we do this.

Thank you for your thoughtful review, Dave. I hope you have a great day!
Joshua
