Message-ID: <4fe2408b-7435-41c2-a6b8-82cefeea50ed@arm.com>
Date: Wed, 16 Oct 2024 17:08:01 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: David Hildenbrand <david@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Anshuman Khandual <anshuman.khandual@....com>,
Ard Biesheuvel <ardb@...nel.org>, Catalin Marinas <catalin.marinas@....com>,
Greg Marsden <greg.marsden@...cle.com>, Ivan Ivanov <ivan.ivanov@...e.com>,
Kalesh Singh <kaleshsingh@...gle.com>, Marc Zyngier <maz@...nel.org>,
Mark Rutland <mark.rutland@....com>, Matthias Brugger <mbrugger@...e.com>,
Miroslav Benes <mbenes@...e.cz>, Will Deacon <will@...nel.org>,
Donald Dutile <ddutile@...hat.com>
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [RFC PATCH v1 00/57] Boot-time page size selection for arm64
On 16/10/2024 16:16, David Hildenbrand wrote:
>> Performance Testing
>> ===================
>>
>> I've run some limited performance benchmarks:
>>
>> First, a real-world benchmark that causes a lot of page table manipulation (so
>> we would expect to see a regression here if we are going to see one anywhere):
>> kernel compilation. It barely registers a change. Values are times, so smaller
>> is better. All relative to base-4k:
>>
>> |             |  kern   |  kern   |  user   |  user   |  real   |  real   |
>> | config      |  mean   |  stdev  |  mean   |  stdev  |  mean   |  stdev  |
>> |-------------|---------|---------|---------|---------|---------|---------|
>> | base-4k     |    0.0% |    1.1% |    0.0% |    0.3% |    0.0% |    0.3% |
>> | compile-4k  |   -0.2% |    1.1% |   -0.2% |    0.3% |   -0.1% |    0.3% |
>> | boot-4k     |    0.1% |    1.0% |   -0.3% |    0.2% |   -0.2% |    0.2% |
>>
>> The Speedometer JavaScript benchmark also shows no change. Values are runs per
>> min, so bigger is better. All relative to base-4k:
>>
>> | config      |  mean   |  stdev  |
>> |-------------|---------|---------|
>> | base-4k     |    0.0% |    0.8% |
>> | compile-4k  |    0.4% |    0.8% |
>> | boot-4k     |    0.0% |    0.9% |
>>
>> Finally, I've run some microbenchmarks known to stress page table manipulations
>> (originally from David Hildenbrand). The fork test maps/allocs 1G of anon
>> memory, then measures the cost of fork(). The munmap test maps/allocs 1G of anon
>> memory then measures the cost of munmap()ing it. The fork test is known to be
>> extremely sensitive to any changes that cause instructions to be aligned
>> differently in cachelines. When using this test for other changes, I've seen
>> double digit regressions for the slightest thing, so 12% regression on this test
>> is actually fairly good. This likely represents the extreme worst case for
>> regressions that will be observed across other microbenchmarks (famous last
>> words). Values are times, so smaller is better. All relative to base-4k:
>>
>
> ... and here I am, worrying about much smaller degradation in these
> micro-benchmarks ;) You're right, these are pure micro-benchmarks, and while
> 12% does sound like "much", even stupid compiler code movement can result in
> such changes in the fork() micro-benchmark.
>
> So I think this is just fine, and actually "surprisingly" small. And, there is
> even a way to statically compile a page size and not worry about that at all.
>
> As discussed ahead of time, I consider this change very valuable. In RHEL, the
> biggest issue is actually the test matrix, which cannot really be reduced
> significantly ... but it will make shipping/packaging easier.
>
> CCing Don, who did the separate 64k RHEL flavor kernel.
>
Thanks, David! I'm planning to investigate and see if I can improve even on that
12%. I have a couple of ideas. But like you say, I don't think this should be a
blocker to moving forwards.
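
For anyone who wants to get a feel for the fork and munmap microbenchmarks
described in the quoted text above, here is a minimal sketch of their shape.
It is not the actual test code used to produce the numbers in this thread: the
function names (now_ns, map_and_populate, bench_fork, bench_munmap), the use of
memset() to fault the pages in, and the choice of CLOCK_MONOTONIC are all
assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Monotonic timestamp in nanoseconds (clock choice is an assumption). */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Map anonymous memory and write to it so every page is really allocated. */
static void *map_and_populate(size_t size)
{
	void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (mem == MAP_FAILED)
		return NULL;
	memset(mem, 1, size);
	return mem;
}

/* Time a single fork() as seen by the parent; returns ns, or -1 on error. */
long long bench_fork(size_t size)
{
	void *mem = map_and_populate(size);
	long long start, end;
	pid_t pid;

	if (!mem)
		return -1;

	start = now_ns();
	pid = fork();
	end = now_ns();

	if (pid == 0)
		_exit(0);	/* child has nothing to do */
	if (pid > 0)
		waitpid(pid, NULL, 0);
	munmap(mem, size);
	return pid < 0 ? -1 : end - start;
}

/* Time munmap() of a fully populated mapping; returns ns, or -1 on error. */
long long bench_munmap(size_t size)
{
	void *mem = map_and_populate(size);
	long long start, end;
	int ret;

	if (!mem)
		return -1;

	start = now_ns();
	ret = munmap(mem, size);
	end = now_ns();
	return ret ? -1 : end - start;
}
```

Calling bench_fork((size_t)1 << 30) and bench_munmap((size_t)1 << 30) matches
the 1G setup described above. A single call is noisy, so a real run would
repeat each measurement many times and report mean and stdev, as in the tables.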