Message-ID: <87frckgy3b.fsf@oracle.com>
Date: Wed, 17 Sep 2025 21:54:00 -0700
From: Ankur Arora <ankur.a.arora@...cle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Ankur Arora <ankur.a.arora@...cle.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, x86@...nel.org, david@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
mjguzik@...il.com, luto@...nel.org, peterz@...radead.org,
acme@...nel.org, namhyung@...nel.org, tglx@...utronix.de,
willy@...radead.org, raghavendra.kt@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com, paulmck@...nel.org
Subject: Re: [PATCH v7 13/16] mm: memory: support clearing page ranges
[ Added Paul McKenney. ]
Andrew Morton <akpm@...ux-foundation.org> writes:
> On Wed, 17 Sep 2025 08:24:15 -0700 Ankur Arora <ankur.a.arora@...cle.com> wrote:
>
>> Change folio_zero_user() to clear contiguous page ranges instead of
>> clearing using the current page-at-a-time approach. Exposing the largest
>> feasible length can be useful in enabling processors to optimize based
>> on extent.
>
> This patch is something which MM developers might care to take a closer
> look at.
>
>> However, clearing in large chunks can have two problems:
>>
>> - cache locality when clearing small folios (< MAX_ORDER_NR_PAGES)
>> (larger folios don't have any expectation of cache locality).
>>
>> - preemption latency when clearing large folios.
>>
>> Handle the first by splitting the clearing in three parts: the
>> faulting page and its immediate locality, its left and right
>> regions; with the local neighbourhood cleared last.
>
> Has this optimization been shown to be beneficial?
So, this was mostly meant to be defensive. The current code does a
rather extensive left-right dance around the faulting page via
c6ddfb6c58 ("mm, clear_huge_page: move order algorithm into a separate
function") and I wanted to keep the cache hot property for the region
closest to the address touched by the user.
But, no I haven't run any tests showing that it helps.
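To make the ordering concrete, the split ends up looking roughly like the
sketch below (illustrative only, not the patch code; the sketch's function
name and the 2-pages-either-side neighbourhood width are just placeholders):

	/*
	 * Illustrative sketch only, not the patch code: clear the regions to
	 * the left and right of the fault first, and the faulting page plus a
	 * small neighbourhood (arbitrarily 2 pages on either side here) last,
	 * so the region closest to the faulting address stays cache-hot.
	 */
	static void zero_user_split_sketch(struct folio *folio, unsigned long addr_hint)
	{
		unsigned long base = ALIGN_DOWN(addr_hint, folio_size(folio));
		unsigned int npages = folio_nr_pages(folio);
		unsigned int fault_idx = (addr_hint - base) / PAGE_SIZE;
		unsigned int lo = fault_idx >= 2 ? fault_idx - 2 : 0;
		unsigned int hi = min(fault_idx + 3, npages);

		if (lo)			/* region to the left of the neighbourhood */
			clear_contig_highpages(folio_page(folio, 0), base, lo);
		if (hi < npages)	/* region to the right of the neighbourhood */
			clear_contig_highpages(folio_page(folio, hi),
					       base + hi * PAGE_SIZE, npages - hi);
		/* the faulting page and its neighbourhood, cleared last */
		clear_contig_highpages(folio_page(folio, lo),
				       base + lo * PAGE_SIZE, hi - lo);
	}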
> If so, are you able to share some measurements?
From some quick kernel builds (with THP) I do see a consistent
difference of a few seconds (1% worse) if I remove this optimization.
(I'm not sure right now why it is worse -- my expectation was that we
would have higher cache misses, but I see pretty similar cache numbers.)
But let me do a more careful test and report back.
> If not, maybe it should be removed?
>
>> ...
>>
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -7021,40 +7021,80 @@ static inline int process_huge_page(
>> return 0;
>> }
>>
>> -static void clear_gigantic_page(struct folio *folio, unsigned long addr_hint,
>> - unsigned int nr_pages)
>> +/*
>> + * Clear contiguous pages chunking them up when running under
>> + * non-preemptible models.
>> + */
>> +static void clear_contig_highpages(struct page *page, unsigned long addr,
>> + unsigned int npages)
>
> Called "_highpages" because it wraps clear_user_highpages(). It really
> should be called clear_contig_user_highpages() ;) (Not serious)
Or maybe clear_user_contig_highpages(), so when we get rid of HIGHMEM,
the _highpages could just be chopped off :D.
>> {
>> - unsigned long addr = ALIGN_DOWN(addr_hint, folio_size(folio));
>> - int i;
>> + unsigned int i, count, unit;
>>
>> - might_sleep();
>> - for (i = 0; i < nr_pages; i++) {
>> + unit = preempt_model_preemptible() ? npages : PAGE_CONTIG_NR;
>
> Almost nothing uses preempt_model_preemptible() and I'm not usefully
> familiar with it. Will this check avoid all softlockup/rcu/etc
> detections in all situations (ie, configs)?
IMO, yes. The code invoked under preempt_model_preemptible() will boil
down to a single interruptible REP STOSB which might execute over
an extent of 1GB (with the last patch). From prior experiments, I know
that irqs are able to interrupt this. And, I /think/ that is a sufficient
condition for avoiding RCU stalls/softlockups etc.
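For contrast, under the non-preemptible models the rest of the function
(elided in the hunk above) chunks the clear into PAGE_CONTIG_NR-sized units
with a reschedule point in between, roughly along these lines (illustrative
sketch, not the exact patch code):

	for (i = 0; i < npages; i += count) {
		count = min(unit, npages - i);
		/* one chunk; the whole range when fully preemptible */
		clear_user_highpages(nth_page(page, i),
				     addr + i * PAGE_SIZE, count);
		/* reschedule point between chunks */
		cond_resched();
	}

So with CONFIG_PREEMPT the whole extent goes out as a single call, while the
voluntary/none models never clear more than PAGE_CONTIG_NR pages between
cond_resched() calls.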
Also, when we were discussing lazy preemption (which Thomas had
suggested as a way to handle scenarios like this or long-running Xen
hypercalls, etc.), this seemed like a scenario that didn't need any extra
handling for CONFIG_PREEMPT.
We did need 83b28cfe79 ("rcu: handle quiescent states for PREEMPT_RCU=n,
PREEMPT_COUNT=y") for CONFIG_PREEMPT_LAZY but AFAICS this should be safe.
Anyway, let me think about your all-configs point (though only the configs
that can have some flavour of hugetlb matter here.)
Also, I would like the x86 folks' opinion on this. And maybe Paul McKenney's,
just to make sure I'm not missing something on the RCU side.
Thanks for the comments.
--
ankur