Message-ID: <87h60bthmk.fsf@oracle.com>
Date: Thu, 19 Jun 2025 16:51:31 -0700
From: Ankur Arora <ankur.a.arora@...cle.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
        akpm@...ux-foundation.org, bp@...en8.de, dave.hansen@...ux.intel.com,
        hpa@...or.com, mingo@...hat.com, mjguzik@...il.com, luto@...nel.org,
        peterz@...radead.org, acme@...nel.org, namhyung@...nel.org,
        tglx@...utronix.de, willy@...radead.org, jon.grimm@....com,
        bharata@....com, raghavendra.kt@....com, boris.ostrovsky@...cle.com,
        konrad.wilk@...cle.com
Subject: Re: [PATCH v4 13/13] x86/folio_zero_user: Add multi-page clearing


Ankur Arora <ankur.a.arora@...cle.com> writes:

> Dave Hansen <dave.hansen@...el.com> writes:
>
>> On 6/15/25 22:22, Ankur Arora wrote:

[ ... ]

>> The second problem with where this ends up is that none of the code is
>> *actually* x86-specific. The only thing that x86 provides that's
>> interesting is a clear_pages() implementation that hands >PAGE_SIZE
>> units down to the CPUs.
>>
>> The result is ~100 lines of code that will compile and run functionally
>> on any architecture.
>
> True. The underlying assumption is that you can provide extent level
> information to string instructions which AFAIK only exists on x86.
>
>> To me, that's deserving of an ARCH_HAS_FOO bit that we can set on the
>> x86 side that then cajoles the core mm/ code to use the fancy new
>> clear_pages_resched() implementation.
>
> This seems straightforward enough.
>
>> Because what are the arm64 guys going to do when their CPUs start doing
>> this? They're either going to copy-and-paste the x86 implementation or
>> they're going to go refactor the x86 implementation into common
>> code.
>
> These instructions have been around for an awfully long time. Are other
> architectures looking at adding similar instructions?

Just to answer my own question: arm64 with FEAT_MOPS (post v8.8) does
support operating on memory extents. (Both clearing and copying.)

> I think this is definitely worth doing if there are performance advantages on
> arm64 -- maybe just because of the reduced per-page overhead.
>
> Let me try this out on arm64.
>
>> My money is on the refactoring, because those arm64 guys do good work.
>> Could we save them the trouble, please?

I thought about this, and it definitely makes sense to do. But it
really suggests a larger set of refactors:

1. hugepage clearing via clear_pages() (this series)
2. hugepage copying via copy_pages()

Both of these are faster than the current per-page approach on x86. And,
from some preliminary tests, at least no slower on arm64.
(My arm64 test machine does not have FEAT_MOPS.)
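
To make the per-page versus multi-page distinction concrete, here is a
minimal userspace sketch. It is not the code from this series: every name
in it (MODEL_PAGE_SIZE, MODEL_CHUNK_PAGES, model_cond_resched(),
zero_per_page(), zero_multi_page()) is hypothetical, memset() stands in
for the architecture's clearing primitive, and the chunk size is an
arbitrary assumption. It only shows how handing over an extent reduces
the number of calls and rescheduling points.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE   4096u /* assumption: 4 KiB base pages */
#define MODEL_CHUNK_PAGES 8u    /* assumption: arbitrary chunk size */

/* Counts how often the model would offer to reschedule. */
static unsigned long resched_points;

/* Hypothetical stand-in for a rescheduling point; it only counts calls. */
static void model_cond_resched(void)
{
	resched_points++;
}

/* Per-page approach: the primitive only ever sees one page at a time. */
static void zero_per_page(unsigned char *base, size_t npages)
{
	assert(base != NULL || npages == 0);

	for (size_t i = 0; i < npages; i++) {
		memset(base + i * MODEL_PAGE_SIZE, 0, MODEL_PAGE_SIZE);
		model_cond_resched();
	}
}

/*
 * Multi-page approach: hand the primitive an extent of up to
 * MODEL_CHUNK_PAGES pages, with one rescheduling point per chunk.
 */
static void zero_multi_page(unsigned char *base, size_t npages)
{
	assert(base != NULL || npages == 0);

	while (npages) {
		size_t n = npages < MODEL_CHUNK_PAGES ? npages : MODEL_CHUNK_PAGES;

		memset(base, 0, n * MODEL_PAGE_SIZE);
		base += n * MODEL_PAGE_SIZE;
		npages -= n;
		model_cond_resched();
	}
}
```

In this model, clearing 20 pages takes 20 primitive calls with the
per-page loop and 3 with the chunked loop (8 + 8 + 4 pages). Whether that
translates into a real speedup depends on the hardware primitive.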

With those two done we should be able to simplify the current
folio_zero_user(), copy_user_large_folio(), and process_huge_page()
logic, which is overcomplicated. Other archs that care about performance
could then switch to the multi-page approach.

3. Simplify the logic around process_huge_page().
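
One possible shape for the opt-in Dave suggests (an ARCH_HAS_FOO-style
bit) is sketched below in userspace C. The macro
MODEL_ARCH_HAS_CLEAR_PAGES and the model_clear_page() and
model_clear_pages() helpers are made-up names for illustration, not
existing kernel symbols, and memset() again stands in for the hardware
primitive. The idea is that common code always calls the multi-page
interface, and architectures without an extent-capable primitive get a
generic fallback built from the per-page one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB base pages */

/*
 * Hypothetical opt-in switch. An architecture with an extent-capable
 * primitive would define it; it is left undefined here, so the generic
 * fallback below is the one that gets compiled.
 */
/* #define MODEL_ARCH_HAS_CLEAR_PAGES 1 */

/* Per-page primitive that every architecture is assumed to provide. */
static void model_clear_page(void *addr)
{
	memset(addr, 0, MODEL_PAGE_SIZE);
}

#ifdef MODEL_ARCH_HAS_CLEAR_PAGES
/* Arch-provided version: clears the whole extent in one call. */
static void model_clear_pages(void *addr, size_t npages)
{
	assert(addr != NULL || npages == 0);
	memset(addr, 0, npages * MODEL_PAGE_SIZE);
}
#else
/* Generic fallback: same interface, built from the per-page primitive. */
static void model_clear_pages(void *addr, size_t npages)
{
	unsigned char *p = addr;

	assert(addr != NULL || npages == 0);
	for (size_t i = 0; i < npages; i++)
		model_clear_page(p + i * MODEL_PAGE_SIZE);
}
#endif
```

Callers see the same model_clear_pages() interface either way, so common
code does not need to know which variant was selected.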

None of these pieces are overly complex. I think the only question is
how to stage it.

Ideally I would like to stage them sequentially and not send out a
single unwieldy series that touches mm and has performance implications
for multiple architectures.

It would also be good to get wider testing for each part.

What do you think? I guess this is also a question for Andrew.

--
ankur
