[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5a041e51-a43b-4878-ab68-4757d3141889@arm.com>
Date: Tue, 12 Nov 2024 10:19:34 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: Petr Tesarik <ptesarik@...e.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Anshuman Khandual <anshuman.khandual@....com>,
Ard Biesheuvel <ardb@...nel.org>, Catalin Marinas <catalin.marinas@....com>,
David Hildenbrand <david@...hat.com>, Greg Marsden
<greg.marsden@...cle.com>, Ivan Ivanov <ivan.ivanov@...e.com>,
Kalesh Singh <kaleshsingh@...gle.com>, Marc Zyngier <maz@...nel.org>,
Mark Rutland <mark.rutland@....com>, Matthias Brugger <mbrugger@...e.com>,
Miroslav Benes <mbenes@...e.cz>, Will Deacon <will@...nel.org>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [RFC PATCH v1 00/57] Boot-time page size selection for arm64
On 12/11/2024 09:45, Petr Tesarik wrote:
> On Mon, 11 Nov 2024 12:25:35 +0000
> Ryan Roberts <ryan.roberts@....com> wrote:
>
>> Hi Petr,
>>
>> On 11/11/2024 12:14, Petr Tesarik wrote:
>>> Hi Ryan,
>>>
>>> On Thu, 17 Oct 2024 13:32:43 +0100
>>> Ryan Roberts <ryan.roberts@....com> wrote:
>> [...]
>>> Third, a few micro-benchmarks saw a significant regression.
>>>
>>> Most notably, getenv and getenvT2 tests from libMicro were 18% and 20%
>>> slower with variable page size. I don't know why, but I'm looking into
>>> it. The system() library call was also about 18% slower, but that might
>>> be related.
>>
>> OK, ouch. I think there are some things we can try to optimize the
>> implementation further. But I'll wait for your analysis before digging myself.
>
> This turned out to be a false positive. The way this microbenchmark was
> invoked did not get enough samples, so it was mostly dependent on
> whether caches were hot or cold, and the timing on this specific system
> with the specific sequence of bencnmarks in the suite happens to favour
> my baseline kernel.
>
> After increasing the batch count, I'm getting pretty much the same
> performance for 6.11 vanilla and patched kernels:
>
> prc thr usecs/call samples errors cnt/samp
> getenv (baseline) 1 1 0.14975 99 0 100000
> getenv (patched) 1 1 0.14981 92 0 100000
Oh that's good news! Does this account for all 3 of the above tests (getenv,
getenvT2 and system())?
>
>> You probably also saw the conversation with Catalin about the cost vs benefit of
>> this series. Performance regressions will all need to be considered in the cost
>> column, of course. So understanding the root cause and trying to reduce the
>> regression as much as possible will increase chances of getting it accepted
>> upstream.
>
> Yes. Now that the biggest number is off the table, I'm going to:
>
> - look into the dup() slowdown
> - verify whether VMA split/merge operations are indeed slower
>
> Petr T
Powered by blists - more mailing lists