Message-ID: <875xa41uj9.fsf@oracle.com>
Date: Wed, 17 Dec 2025 16:51:54 -0800
From: Ankur Arora <ankur.a.arora@...cle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Ankur Arora <ankur.a.arora@...cle.com>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, x86@...nel.org, david@...nel.org, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
        mjguzik@...il.com, luto@...nel.org, peterz@...radead.org,
        tglx@...utronix.de, willy@...radead.org, raghavendra.kt@....com,
        chleroy@...nel.org, ioworker0@...il.com, boris.ostrovsky@...cle.com,
        konrad.wilk@...cle.com, kristina.martsenko@....com,
        catalin.marinas@....com
Subject: Re: [PATCH v10 7/8] mm, folio_zero_user: support clearing page ranges


[ Added Kristina and Catalin for the FEAT_MOPS question. ]

Andrew Morton <akpm@...ux-foundation.org> writes:

> On Wed, 17 Dec 2025 11:51:43 -0800 Ankur Arora <ankur.a.arora@...cle.com> wrote:
>
>> > If so, what's the timing on that?  It would be nice to do it in the
>> > current -rc cycle for testing reasons and so the changelogs can be
>> > updated to reflect the altered performance numbers.
>>
>> I can send out an updated version of this patch later today. I think the
>> only real change is updating the constant and perf stats motivating
>> the chunk size value of 32MB.
>
> Yep.  A tiny change wouldn't normally require a full resend, but fairly
> widespread changelog updates would best be handled with a v11, please.

True, it will need updates to patches 7 and 8.

Will send out a v11 after rerunning tests for both of those. Might take
a day or two but should be able to send it out this week.

>> Anything else you also think needs doing for this?
>
> Nope.  Just lots of review, as always ;)
>
> What's the story with architectures other than x86, btw?

The only other architecture I know of which has a similar range
primitive is arm64 (with FEAT_MOPS). That should be extendable to larger
page sizes.
I don't have any numbers on it though. It's only available after arm64
v8.7, hardware that I should have access to next year.
(But maybe Kristina or Catalin have tried out clearing large ranges with
MOPS?)

Other than that, the only one I know of is powerpc, which already uses a
primitive to zero a cacheline (DCBZ). That seems quite similar to
CLZERO on AMD Zen systems (though CLZERO does uncached, weakly ordered
writes, so it needs a store barrier at the end).

Just from googling, powerpc's implementation seems to be pretty optimal
already, so it probably wouldn't gain much from larger chunk sizes and
removal of the cond_resched().
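
For readers following along, here is a rough userspace sketch of what
"chunk size" and the rescheduling point refer to. It is not the kernel
code from this series: clear_range_chunked(), resched_fn,
count_resched() and CLEAR_CHUNK_BYTES are hypothetical names, memset()
stands in for a range primitive such as "REP; STOS", and the callback
stands in for something like cond_resched().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical default; the thread discusses a 32MB chunk size. */
#define CLEAR_CHUNK_BYTES ((size_t)32 << 20)

/* Hypothetical stand-in for a rescheduling point such as cond_resched(). */
typedef void (*resched_fn)(void *ctx);

/* Example callback: counts how often the rescheduling point was reached. */
static void count_resched(void *ctx)
{
	++*(size_t *)ctx;
}

/*
 * Zero len bytes at dst in pieces of at most chunk bytes, calling
 * resched() between pieces (not after the last one).
 * Returns the number of pieces that were cleared.
 */
static size_t clear_range_chunked(void *dst, size_t len, size_t chunk,
				  resched_fn resched, void *ctx)
{
	unsigned char *p = dst;
	size_t chunks = 0;

	assert(chunk > 0);

	while (len > 0) {
		size_t n = len < chunk ? len : chunk;

		memset(p, 0, n);	/* stand-in for the range primitive */
		p += n;
		len -= n;
		chunks++;

		if (len > 0 && resched)
			resched(ctx);
	}

	return chunks;
}
```

With a larger chunk, the primitive sees a longer extent per call and the
rescheduling point is reached less often. Clearing 10 bytes with a chunk
of 4, for instance, takes three memset() calls and two callbacks.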

But CLZERO performs on par with (or better than) this "REP; STOS"
implementation, especially for smaller extents. So maybe in the future
we could use it to improve the 2MB performance for AMD Zen.

IMO the fiddly part might be in deciding when the cost of not-caching
is higher than the speedup from not-caching.

--
ankur
