[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250926173353.2850266-1-joshua.hahnjy@gmail.com>
Date: Fri, 26 Sep 2025 10:33:52 -0700
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: Brendan Jackman <jackmanb@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
Chris Mason <clm@...com>,
Kiryl Shutsemau <kirill@...temov.name>,
Michal Hocko <mhocko@...e.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Zi Yan <ziy@...dia.com>,
linux-kernel@...r.kernel.org,
linux-mm@...ck.org,
kernel-team@...a.com
Subject: Re: [PATCH v2 2/4] mm/page_alloc: Perform appropriate batching in drain_pages_zone
On Fri, 26 Sep 2025 16:57:16 +0000 Brendan Jackman <jackmanb@...gle.com> wrote:
> On Fri Sep 26, 2025 at 3:48 PM UTC, Joshua Hahn wrote:
> > On Fri, 26 Sep 2025 14:01:43 +0000 Brendan Jackman <jackmanb@...gle.com> wrote:
> >> Hey Joshua, do you know why pcp->batch is a factor here at all? Until
> >> now I never really noticed it. I thought that this field was a kinda
> >> dynamic auto-tuning where we try to make the pcplists a more aggressive
> >> cache when they're being used a lot and then shrink them down when the
> >> allocator is under less load. But I don't have a good intuition for why
> >> that's relevant to drain_pages_zone(). Something to do with the amount
> >> of lock contention we expect?
> >
> > From my understanding, pcp->batch is a value that can be used to batch
> > both allocation and freeing operations. For instance, drain_zone_pages
> > uses pcp->batch to ensure that we don't free too many pages at once,
> > which would lead to things like lock contention (I will address the
> > similarity between drain_zone_pages and drain_pages_zone at the end).
> >
> > As for the purpose of batch and how its value is determined, I got my
> > understanding from this comment in zone_batchsize:
> >
> > * ... The batch
> > * size is striking a balance between allocation latency
> > * and zone lock contention.
Hi Brendan, thanks for your feedback!
> > And based on this comment, I think a symmetric argument can be made for
> > freeing by just s/allocation latency/freeing latency above. My understanding
> > was that if we are allocating at a higher factor, we should also be freeing
> > at a higher factor to clean up those allocations faster as well, and it seems
> > like this is reflected in decay_pcp_high, where a higher batch means we
> > lower pcp->high to try and free up more pages.
>
> Hmm thanks, now I'm reading it again I think I was not clear in my head
> on how ->batch is used. It's more like a kinda static "batchiness"
> parameter that informs the dynamic scaling stuff rather than being an
> output of it, in that context it's less surprising that the drain code
> cares about it.
It also took me a while to understand how all of the pcp high, count,
batch, etc. tuning worked. But yes, from my understanding, batch is used
as the parameter that helps inform things like what value to decay
pcp->high to, as well as how many pages to free, how many pages to leave
remaining in the pcp, among other operations.
> > Please let me know if my understanding of this area is incorrect here!
> >
> >> Unless I'm just being stupid here, maybe a chance to add commentary.
> >
> > I can definitely add some more context in the next version for this patch.
> > Actually, you are right -- reading back in my patch description, I've
> > motivated why we want batching, but not why pcp->batch is a good candidate
> > for this value. I'll definitely go back and clean it up!
> >
> >> >
> >> > Signed-off-by: Joshua Hahn <joshua.hahnjy@...il.com>
> >> > ---
> >> > mm/page_alloc.c | 3 +--
> >> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >> >
> >> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> > index 77e7d9a5f149..b861b647f184 100644
> >> > --- a/mm/page_alloc.c
> >> > +++ b/mm/page_alloc.c
> >> > @@ -2623,8 +2623,7 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
> >> > spin_lock(&pcp->lock);
> >> > count = pcp->count;
> >> > if (count) {
> >> > - int to_drain = min(count,
> >> > - pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
> >> > + int to_drain = min(count, pcp->batch);
> >>
> >> We actually don't need the min() here as free_pcppages_bulk() does that
> >> anyway. Not really related to the commit but maybe worth tidying that
> >> up.
> >
> > Please correct me if I am missing something, but I think we still need the
> > min() here, since it takes the min of count and pcp->batch, while the
> > min in free_pcppages_bulk takes the min of the above result and pcp->count.
>
> Hold on, what's the difference between count and pcp->count here?
Hm, sorry. I think I also confused myself here. count is indeed pcp->count.
So what is basically happening here is
min3(pcp->count, pcp->batch, pcp->count)
... and you are absolutely right that one instance of pcp->count is
unnecessary here. So it seems like the solution is just to pass
pcp->batch to free_pcppages_bulk, and that way we can have only one
instance of pcp->count!
I think I confused myself because I thought you meant we don't need the
pcp->batch portion of this min(), not the pcp->count!
With all of this said, as noted in my conversation with Christoph I think
this patch may be unnecessary since this is not a source of contention
or lock monopoly anyways (in fact, it may even be harmful to batch it
to a lower value here).
> > From what I can understand, the goal of the min() in free_pcppages_bulk
> > is to ensure that we don't try to free more pages than exist in the pcp
> > (hence the min with count), while the goal of my min() is to not free
> > too many pages at once.
>
> Yeah, I think we're in agreement about the intent, it's just that one of
> us is misreading the code (and I think it might be me, I will probably
> be more certain on Monday!).
It was me : -)
Thank you for taking the time to review this. I hope you have a great day!
Joshua
Powered by blists - more mailing lists