Message-ID: <875xa41uj9.fsf@oracle.com>
Date: Wed, 17 Dec 2025 16:51:54 -0800
From: Ankur Arora <ankur.a.arora@...cle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Ankur Arora <ankur.a.arora@...cle.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, x86@...nel.org, david@...nel.org, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
mjguzik@...il.com, luto@...nel.org, peterz@...radead.org,
tglx@...utronix.de, willy@...radead.org, raghavendra.kt@....com,
chleroy@...nel.org, ioworker0@...il.com, boris.ostrovsky@...cle.com,
konrad.wilk@...cle.com, kristina.martsenko@....com,
catalin.marinas@....com
Subject: Re: [PATCH v10 7/8] mm, folio_zero_user: support clearing page ranges
[ Added Kristina and Catalin for the FEAT_MOPS question. ]
Andrew Morton <akpm@...ux-foundation.org> writes:
> On Wed, 17 Dec 2025 11:51:43 -0800 Ankur Arora <ankur.a.arora@...cle.com> wrote:
>
>> > If so, what's the timing on that? It would be nice to do it in the
>> > current -rc cycle for testing reasons and so the changelogs can be
>> > updated to reflect the altered performance numbers.
>>
>> I can send out an updated version of this patch later today. I think the
>> only real change is updating the constant and perf stats motivating
>> the chunk size value of 32MB.
>
> Yep. A tiny change wouldn't normally require a full resend, but fairly
> widespread changelog updates would best be handled with a v11, please.
True, it will need updates to patches 7 and 8.
Will send out a v11 after rerunning tests for both of those. Might take
a day or two but should be able to send it out this week.
>> Anything else you also think needs doing for this?
>
> Nope. Just lots of review, as always ;)
>
> What's the story with architectures other than x86, btw?
The only other architecture I know of which has a similar range
primitive is arm64 (with FEAT_MOPS). That should be extendable to larger
page sizes.
Don't have any numbers on it though. FEAT_MOPS is only available from
Armv8.7 onwards, and I should have access to such hardware next year.
(But maybe Kristina or Catalin have tried out clearing large ranges with
MOPS?)
Other than that, the only one I know of is powerpc, which already uses a
primitive to zero a cacheline (DCBZ). That seems quite similar to
CLZERO on AMD Zen systems (though CLZERO does uncached, weakly ordered
writes, so it needs a store barrier at the end).
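
To make the CLZERO ordering point concrete, here is a rough userspace
sketch (not code from the series). zero_lines() and CACHELINE are names
made up for illustration, and the 64-byte line size is an assumption.
The CLZERO path is only compiled when the compiler is told the
instruction is available (e.g. -mclzero); otherwise it falls back to
memset().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__CLZERO__)
#include <x86intrin.h>	/* _mm_clzero(), _mm_sfence() */
#endif

#define CACHELINE 64	/* assumed cacheline size */

/*
 * Hypothetical helper: zero [p, p + len) one cacheline at a time.
 * Both p and len must be cacheline aligned.
 */
static void zero_lines(void *p, size_t len)
{
	assert((uintptr_t)p % CACHELINE == 0);
	assert(len % CACHELINE == 0);

#if defined(__CLZERO__)
	for (size_t off = 0; off < len; off += CACHELINE)
		_mm_clzero((char *)p + off);	/* zero one full line */
	/* CLZERO stores are weakly ordered: fence once, after the loop. */
	_mm_sfence();
#else
	/* No CLZERO at build time: plain cached stores. */
	memset(p, 0, len);
#endif
}
```

The point is just that the barrier is paid once per range rather than
once per line.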
Just from googling, powerpc's implementation seems to be pretty optimal
already, so it probably wouldn't gain much from larger chunk sizes and
removal of the cond_resched().
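
For anyone skimming the thread, the chunking idea can be sketched in
userspace roughly as follows. This is an illustration only, not the
kernel code: clear_chunked(), yield_cpu() and CLEAR_CHUNK are made-up
names, memset() stands in for the architecture's clearing primitive,
and sched_yield() stands in for a reschedule point.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>

#define CLEAR_CHUNK (32UL << 20)	/* 32MB, the chunk size discussed above */

/* Hypothetical stand-in for a kernel reschedule point. */
static void yield_cpu(void)
{
	sched_yield();
}

/*
 * Clear [addr, addr + len) in pieces of at most @chunk bytes, calling
 * @resched (if non-NULL) between pieces. Returns the number of pieces.
 */
static size_t clear_chunked(void *addr, size_t len, size_t chunk,
			    void (*resched)(void))
{
	char *p = addr;
	size_t pieces = 0;

	assert(chunk > 0);	/* a zero chunk would never make progress */

	while (len) {
		size_t n = len < chunk ? len : chunk;

		memset(p, 0, n);
		p += n;
		len -= n;
		pieces++;

		/* Only yield if there is more work left to do. */
		if (len && resched)
			resched();
	}
	return pieces;
}
```

A caller would use something like clear_chunked(buf, size, CLEAR_CHUNK,
yield_cpu). A larger chunk means fewer interruptions of the clearing
primitive, at the cost of longer stretches between reschedule points.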
But CLZERO performs on par with (or better than) this "REP; STOS"
implementation, especially for smaller extents. So maybe in the future
we could use it to improve the 2MB performance for AMD Zen.
IMO the fiddly part might be in deciding when the cost of not-caching
is higher than the speedup from not-caching.
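
For reference, a "REP; STOS" style range clear looks roughly like the
sketch below in userspace. This is a minimal illustration, not the
kernel's implementation: zero_range() is a made-up name, the byte-sized
STOSB variant is picked for simplicity, and non-x86-64 builds just fall
back to memset().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero [dst, dst + len) with a single string op. */
static void zero_range(void *dst, size_t len)
{
	assert(dst != NULL || len == 0);

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	/*
	 * RDI = destination, RCX = byte count, AL = value to store.
	 * The whole extent is handed to the CPU in one instruction,
	 * which is what makes this a "range" primitive.
	 */
	__asm__ volatile("rep stosb"
			 : "+D" (dst), "+c" (len)
			 : "a" (0)
			 : "memory");
#else
	memset(dst, 0, len);
#endif
}
```

Unlike CLZERO, these are ordinary cached stores, so no trailing store
barrier is needed here.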
--
ankur