linux-kernel - Re: [RFC PATCH v1 00/57] Boot-time page size selection for arm64

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20241112104544.574dd733@mordecai.tesarici.cz>
Date: Tue, 12 Nov 2024 10:45:44 +0100
From: Petr Tesarik <ptesarik@...e.com>
To: Ryan Roberts <ryan.roberts@....com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Anshuman Khandual
 <anshuman.khandual@....com>, Ard Biesheuvel <ardb@...nel.org>, Catalin
 Marinas <catalin.marinas@....com>, David Hildenbrand <david@...hat.com>,
 Greg Marsden <greg.marsden@...cle.com>, Ivan Ivanov <ivan.ivanov@...e.com>,
 Kalesh Singh <kaleshsingh@...gle.com>, Marc Zyngier <maz@...nel.org>, Mark
 Rutland <mark.rutland@....com>, Matthias Brugger <mbrugger@...e.com>,
 Miroslav Benes <mbenes@...e.cz>, Will Deacon <will@...nel.org>,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org
Subject: Re: [RFC PATCH v1 00/57] Boot-time page size selection for arm64

On Mon, 11 Nov 2024 12:25:35 +0000
Ryan Roberts <ryan.roberts@....com> wrote:

> Hi Petr,
> 
> On 11/11/2024 12:14, Petr Tesarik wrote:
> > Hi Ryan,
> > 
> > On Thu, 17 Oct 2024 13:32:43 +0100
> > Ryan Roberts <ryan.roberts@....com> wrote:
>[...]
> > Third, a few micro-benchmarks saw a significant regression.
> > 
> > Most notably, getenv and getenvT2 tests from libMicro were 18% and 20%
> > slower with variable page size. I don't know why, but I'm looking into
> > it. The system() library call was also about 18% slower, but that might
> > be related.  
> 
> OK, ouch. I think there are some things we can try to optimize the
> implementation further. But I'll wait for your analysis before digging myself.

This turned out to be a false positive. The way this microbenchmark was
invoked did not get enough samples, so it was mostly dependent on
whether caches were hot or cold, and the timing on this specific system
with the specific sequence of bencnmarks in the suite happens to favour
my baseline kernel.

After increasing the batch count, I'm getting pretty much the same
performance for 6.11 vanilla and patched kernels:

                        prc thr   usecs/call      samples   errors cnt/samp 
getenv (baseline)         1   1      0.14975           99        0   100000 
getenv (patched)          1   1      0.14981           92        0   100000 

> You probably also saw the conversation with Catalin about the cost vs benefit of
> this series. Performance regressions will all need to be considered in the cost
> column, of course. So understanding the root cause and trying to reduce the
> regression as much as possible will increase chances of getting it accepted
> upstream.

Yes. Now that the biggest number is off the table, I'm going to:

 - look into the dup() slowdown
 - verify whether VMA split/merge operations are indeed slower

Petr T