Message-ID: <20251009144152.909709-1-joshua.hahnjy@gmail.com>
Date: Thu,  9 Oct 2025 07:41:51 -0700
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: "Huang, Ying" <ying.huang@...ux.alibaba.com>
Cc: Dave Hansen <dave.hansen@...el.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Brendan Jackman <jackmanb@...gle.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Michal Hocko <mhocko@...e.com>,
	Suren Baghdasaryan <surenb@...gle.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	Zi Yan <ziy@...dia.com>,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [RFC] [PATCH] mm/page_alloc: pcp->batch tuning

On Thu, 09 Oct 2025 10:57:05 +0800 "Huang, Ying" <ying.huang@...ux.alibaba.com> wrote:

> Hi, Joshua,
> 
> Joshua Hahn <joshua.hahnjy@...il.com> writes:
> 
> > On Wed, 8 Oct 2025 08:34:21 -0700 Dave Hansen <dave.hansen@...el.com> wrote:
> >
> > Hello Dave, thank you for your feedback!
> >
> >> First of all, I do agree that the comment should go away or get fixed up.
> >> 
> >> But...
> >> 
> >> On 10/6/25 07:54, Joshua Hahn wrote:
> >> > This leaves us with a /= 4 with no corresponding *= 4 anywhere, which
> >> > leaves pcp->batch mistuned from the original intent when it was
> >> > introduced. This is made worse by the fact that pcp lists are generally
> >> > larger today than they were in 2013, meaning batch sizes should have
> >> > increased, not decreased.
> >> 
> >> pcp->batch and pcp->high do very different things. pcp->high is a limit
> >> on the amount of memory that can be tied up. pcp->batch balances
> >> throughput with latency. I'm not sure I buy the idea that a higher
> >> pcp->high means we should necessarily do larger batches.
> >
> > I agree with your observation that a higher pcp->high doesn't mean we should
> > do larger batches. I think what I was trying to get at here was that if
> > pcp lists are bigger, some other values might want to scale.
> >
> > For instance, in nr_pcp_free, pcp->batch is used to determine how many
> > pages should be left in the pcplist (and the rest be freed). Should this
> > value scale with a bigger pcp? (This is not a rhetorical question, I really
> > do want to understand what the implications are here).
> >
> > Another thing that I would like to note is that pcp->high is actually at
> > least in part a function of pcp->batch. In decay_pcp_high, we set
> >
> > pcp->high = max3(pcp->count - (batch << CONFIG_PCP_BATCH_SCALE_MAX), ...)
> >
> > So here, it seems like a higher batch value would actually lead to a much
> > lower pcp->high instead. This actually seems actively harmful to the system.

Hi Ying, thank you for your feedback, as always!

> Batch here is used to control the latency to free the pages from PCP to
> buddy.  Larger batch will lead to larger latency, however it helps to
> reduce the size of PCP more quickly when it becomes idle.  So, we need
> to balance here.

Yes, this makes sense to me. I think one thing that I overlooked when I
initially submitted this patch was that even though pcp lists may have grown
in recent times, the tolerance for the latency associated with freeing them
may not have.
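
To make the decay_pcp_high() point quoted above concrete, here is a tiny
userspace model of that max3() term. Only the first argument is the one
quoted in the thread; the (high - high/8) decay and the high_min floor are
illustrative stand-ins for the elided arguments, and PCP_BATCH_SCALE_MAX is
an assumed constant, so this sketches the shape of the math rather than the
kernel's code:

```c
#include <assert.h>

#define PCP_BATCH_SCALE_MAX 5	/* assumed stand-in for the Kconfig value */

static int max3i(int a, int b, int c)
{
	int m = a > b ? a : b;
	return m > c ? m : c;
}

/*
 * Model of the pcp->high decay step: the first term is the one quoted
 * above; the gradual one-eighth decay and the high_min floor are
 * hypothetical stand-ins for the other max3() arguments.
 */
static int model_decay_high(int count, int batch, int high, int high_min)
{
	return max3i(count - (batch << PCP_BATCH_SCALE_MAX),
		     high - (high >> 3),
		     high_min);
}
```

With count == high == 4000, a batch of 8 leaves high at 3744, while a batch
of 63 lets it fall to 3500 in one step, illustrating how a larger batch pulls
pcp->high down faster.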

> > So I'll do a take two of this patch and take your advice below and instead
> > of getting rid of the /= 4, just fold it in (or add a better explanation)
> > as to why we do this. Another candidate place to do this seems to be
> > where we do the rounddown_pow_of_two.
> >
> >> So I dunno... If someone wanted to alter the initial batch size, they'd
> >> ideally repeat some of Ying's experiments from: 52166607ecc9 ("mm:
> >> restrict the pcp batch scale factor to avoid too long latency").
> >
> > I ran a few very naive and quick tests on kernel builds, and it seems like
> > for larger machines (1TB memory, 316 processors), this leads to a very
> > significant speedup in system time during a kernel compilation (~10%).
> >
> > But for smaller machines (250G memory, 176 processors) and (62G memory and 36
> > processors), this leads to quite a regression (~5%).
> >
> > So maybe the answer is that this should actually be defined by the machine's
> > size. In zone_batchsize, we set the value of the batch to: 
> >
> > min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE)
> >
> > But maybe it makes sense to let this value grow bigger for larger machines? If
> > anything, I think that the experiment results above do show that batch size does
> > have an impact on the performance, and the effect can either be positive or
> > negative based on the machine's size. I can run some more experiments to 
> > see if there's an opportunity to better tune pcp->batch.
> 
> In fact, we do have some mechanism to scale batch size dynamically
> already, via pcp->alloc_factor and pcp->free_count.
> 
> You could further tune them.  Per my understanding, it should be a
> balance between throughput and latency.

Sounds good to me! I can try to do some tuning of alloc_factor and
free_count, or observe how they currently behave in the system to see whether
they already provide a good balance of throughput and latency.
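
As a rough illustration of that dynamic scaling, here is a toy model in
which the effective batch doubles on consecutive free events until a
scale-factor cap is hit. The function name and the cap value are assumptions
for the sketch, not the kernel's exact alloc_factor/free_count logic:

```c
#include <assert.h>

#define PCP_BATCH_SCALE_MAX 5	/* assumed cap on the scale factor */

/*
 * Toy model: each consecutive free event doubles the effective batch
 * (by bumping a per-pcp scale factor) until the cap is reached,
 * trading longer per-drain latency for faster shrinking of an idle
 * list.
 */
static int model_scaled_batch(int batch, int *scale_factor)
{
	int effective = batch << *scale_factor;

	if (*scale_factor < PCP_BATCH_SCALE_MAX)
		(*scale_factor)++;
	return effective;
}
```

Starting from a base batch of 63, successive calls return 63, 126, 252, and
so on, up to 63 << 5.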

> >> Better yet, just absorb the /=4 into the two existing batch assignments.
> >> It will probably compile to exactly the same code and have no functional
> >> changes and get rid of the comment.
> >> 
> >> Wouldn't this compile to the same thing?
> >> 
> >>         batch = zone->managed_pages / 4096;
> >>         if (batch * PAGE_SIZE > 128 * 1024)
> >>                 batch = (128 * 1024) / PAGE_SIZE;
> >
> > But for now, this seems good to me. I'll get rid of the confusing comment,
> > and try to fold in the batch value, and leave a new comment explaining
> > the reasoning.
> >
> > Thank you for your thoughtful review, Dave. I hope you have a great day!
> > Joshua
> 
> ---
> Best Regards,
> Huang, Ying

Thank you, Ying. For now, I'll just submit a new version of this patch that
doesn't drop the /= 4, but instead folds it into the lines below so that the
comment no longer causes confusion.
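
For what it's worth, the fold is pure integer arithmetic: starting from the
min() form quoted earlier in the thread, dividing managed pages by 4096 and
quartering the cap gives the same result as computing the clamp first and
then applying /= 4. A toy check, assuming a 4 KiB page size and omitting the
kernel's rounddown_pow_of_two() step:

```c
#include <assert.h>

#define PAGE_SIZE 4096UL
#define SZ_1M    (1024UL * 1024UL)

/* Current shape: ~0.1% of the zone, capped at 1 MiB worth of pages,
 * then the stray /= 4. */
static unsigned long batch_two_step(unsigned long managed_pages)
{
	unsigned long batch = managed_pages >> 10;

	if (batch > SZ_1M / PAGE_SIZE)
		batch = SZ_1M / PAGE_SIZE;
	return batch / 4;		/* the /= 4 in question */
}

/* Folded shape: the /= 4 is absorbed into the divisor and the cap. */
static unsigned long batch_folded(unsigned long managed_pages)
{
	unsigned long batch = managed_pages / 4096;

	if (batch > (SZ_1M / 4) / PAGE_SIZE)
		batch = (SZ_1M / 4) / PAGE_SIZE;
	return batch;
}
```

For unsigned x, (x >> 10) / 4 == x / 4096, so the two forms agree for every
input, below and above the cap.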

I hope you have a great day!
Joshua
