linux-kernel - Re: [PATCH v8 5/7] x86/clear_page: Introduce clear_pages()

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <878qgtgu8w.fsf@oracle.com>
Date: Wed, 29 Oct 2025 16:31:43 -0700
From: Ankur Arora <ankur.a.arora@...cle.com>
To: Borislav Petkov <bp@...en8.de>
Cc: Ankur Arora <ankur.a.arora@...cle.com>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, x86@...nel.org, akpm@...ux-foundation.org,
        david@...hat.com, dave.hansen@...ux.intel.com, hpa@...or.com,
        mingo@...hat.com, mjguzik@...il.com, luto@...nel.org,
        peterz@...radead.org, acme@...nel.org, namhyung@...nel.org,
        tglx@...utronix.de, willy@...radead.org, raghavendra.kt@....com,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [PATCH v8 5/7] x86/clear_page: Introduce clear_pages()


Borislav Petkov <bp@...en8.de> writes:

> On Tue, Oct 28, 2025 at 11:51:39AM -0700, Ankur Arora wrote:
>> The intent was to use a value large enough to let uarchs that do
>> 'REP; STOS' optimizations benefit from them, but not so large that we
>> end up with high preemption latency.
>
> How is selecting that number tied to uarches which can do REP; STOSB? I assume
> you mean REP; STOSB where microcode magic glue aggregates larger moves than
> just u64 chunks but only under certain conditions and so on..., and not
> REP_GOOD where the microcode doesn't have problems with REP prefixes...

Yes to what you say below.

>> > Why isn't this thing determined dynamically during boot or so, instead of
>> > hardcoding it this way and then having to change it again later when bandwidth
>> > increases?
>>
>> I thought of doing that but given that the precise value doesn't matter
>> very much (and there's enough slack in it in either direction) it seemed
>> unnecessary to do at this point.
>>
>> Also, I'm not sure that a boot determined value would really help given
>> that the 'REP; STOS' bandwidth could be high or low based on how
>> saturated the bus is.
>>
>> Clearly some of this detail should have been in my commit message.
>
> So you want to have, say, 8MB of contiguous range - if possible - and let the
> CPU do larger clears. And it depends on the scheduling model. And it depends
> on what the CPU can do wrt length aggregation. Close?

Yeah, pretty much that. Just to restate:

 - be large enough that CPUs which can optimize are able to do so
 - be small enough that, even in the bad cases (CPUs that don't optimize
   and/or are generally slow at clearing), each chunk finishes quickly
   enough to keep preemption latency reasonable (which is an issue only
   for voluntary preemption etc.)

> Well, I would like, please, for this to be properly documented why it was
> selected this way and what *all* the aspects were to select it this way so
> that we can know why it is there and we can change it in the future if
> needed.
>
> It is very hard to do so if the reasoning behind it has disappeared in the
> bowels of lkml...

Ack. Yeah, I should have documented this much better.
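The documentation asked for above could spell out the latency arithmetic: the worst-case time between preemption points is roughly the chunk size divided by the slowest clearing bandwidth one expects to see. A small sketch of that arithmetic; the function names are hypothetical and any bandwidth or budget figures plugged in are assumptions, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time in microseconds to clear chunk_bytes at a sustained clearing
 * bandwidth of bytes_per_sec (truncated toward zero). No overflow
 * checks: fine for chunk sizes in the MB range.
 */
static uint64_t chunk_latency_us(uint64_t chunk_bytes, uint64_t bytes_per_sec)
{
	assert(bytes_per_sec > 0);
	return chunk_bytes * 1000000ULL / bytes_per_sec;
}

/*
 * Largest chunk, rounded down to whole pages, that can be cleared
 * within budget_us microseconds at bytes_per_sec.
 */
static uint64_t max_chunk_bytes(uint64_t budget_us, uint64_t bytes_per_sec,
				uint64_t page_size)
{
	uint64_t bytes = budget_us * bytes_per_sec / 1000000ULL;

	assert(page_size > 0);
	return bytes - bytes % page_size;
}
```

For example, an 8MB chunk at an assumed 1 GB/s works out to about 8388 us per chunk, while a 1 ms budget at the same rate allows 999424 bytes (244 pages of 4KB). Since the achievable bandwidth also depends on how saturated the bus is, a value computed once at boot would only be a snapshot, which is the concern raised earlier in the thread.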

Thanks
--
ankur