Message-ID: <4dc1d2e1-d5d7-2812-aa8b-f8ba6b9fb207@arm.com>
Date: Mon, 10 Jul 2023 14:28:15 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: Barry Song <21cnbao@...il.com>
Cc: Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
Ard Biesheuvel <ardb@...nel.org>,
Marc Zyngier <maz@...nel.org>,
Oliver Upton <oliver.upton@...ux.dev>,
James Morse <james.morse@....com>,
Suzuki K Poulose <suzuki.poulose@....com>,
Zenghui Yu <yuzenghui@...wei.com>,
Andrey Ryabinin <ryabinin.a.a@...il.com>,
Alexander Potapenko <glider@...gle.com>,
Andrey Konovalov <andreyknvl@...il.com>,
Dmitry Vyukov <dvyukov@...gle.com>,
Vincenzo Frascino <vincenzo.frascino@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Anshuman Khandual <anshuman.khandual@....com>,
Matthew Wilcox <willy@...radead.org>,
Yu Zhao <yuzhao@...gle.com>,
Mark Rutland <mark.rutland@....com>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [PATCH v1 00/14] Transparent Contiguous PTEs for User Mappings
On 10/07/2023 13:05, Barry Song wrote:
> On Thu, Jun 22, 2023 at 11:00 PM Ryan Roberts <ryan.roberts@....com> wrote:
>>
>> Hi All,
>>
[...]
>>
>> Performance
>> -----------
>>
>> The results below show two benchmarks: kernel compilation and Speedometer 2.0
>> (a JavaScript benchmark running in Chromium). Both run on Ampere Altra with
>> 1 NUMA node enabled, Ubuntu 22.04 and an XFS filesystem. Each benchmark is
>> repeated 15 times over 5 reboots and averaged.
>>
>> All improvements are relative to baseline-4k. anonfolio and exefolio are as
>> described above. contpte is this series. (Note that exefolio only gives an
>> improvement because contpte is already in place).
>>
>> Kernel Compilation (smaller is better):
>>
>> | kernel | real-time | kern-time | user-time |
>> |:-------------|------------:|------------:|------------:|
>> | baseline-4k | 0.0% | 0.0% | 0.0% |
>> | anonfolio | -5.4% | -46.0% | -0.3% |
>> | contpte | -6.8% | -45.7% | -2.1% |
>> | exefolio | -8.4% | -46.4% | -3.7% |
>
> sorry i am a bit confused. in exefolio case, is anonfolio included?
> or it only has large cont-pte folios on exe code? in the other words,
> Does the 8.4% improvement come from iTLB miss reduction only,
> or from both dTLB and iTLB miss reduction?
The anonfolio -> contpte -> exefolio results are incremental. So:
anonfolio: baseline-4k + anonfolio changes
contpte: anonfolio + contpte changes
exefolio: contpte + exefolio changes
So yes, exefolio includes both anonfolio and contpte. Sorry for the confusion.
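Because each row builds on the one before and every figure is relative to baseline-4k, the contribution of a single step can be read off as the difference between adjacent rows, in percentage points of baseline-4k. A minimal Python sketch of that arithmetic follows; the helper `incremental_deltas` and the dictionary names are hypothetical illustrations, not part of the series, and the numbers are copied from the tables in this thread.

```python
# Hypothetical helper: split cumulative results into per-step contributions.
# All figures are % change relative to baseline-4k, copied from the tables above.

STEPS = ["anonfolio", "contpte", "exefolio"]  # each row builds on the previous one

KERNEL_COMPILE = {  # smaller is better
    "anonfolio": {"real": -5.4, "kern": -46.0, "user": -0.3},
    "contpte":   {"real": -6.8, "kern": -45.7, "user": -2.1},
    "exefolio":  {"real": -8.4, "kern": -46.4, "user": -3.7},
}

SPEEDOMETER = {  # bigger is better
    "anonfolio": {"runs_per_min": 1.2},
    "contpte":   {"runs_per_min": 3.1},
    "exefolio":  {"runs_per_min": 4.2},
}


def incremental_deltas(results, steps):
    """Return, for each step, the percentage-point change it adds on top of
    the previous step. The implicit starting point is baseline-4k (0.0%)."""
    deltas = {}
    prev = {metric: 0.0 for metric in results[steps[0]]}
    for step in steps:
        cur = results[step]
        # Round to one decimal place to match the precision of the tables.
        deltas[step] = {m: round(cur[m] - prev[m], 1) for m in cur}
        prev = cur
    return deltas


if __name__ == "__main__":
    for name, table in (("kernel compilation", KERNEL_COMPILE),
                        ("speedometer", SPEEDOMETER)):
        print(name)
        for step, d in incremental_deltas(table, STEPS).items():
            print(f"  {step:<10} {d}")
```

On these numbers the contpte step adds roughly -1.4 points of real time to kernel compilation and the exefolio step roughly -1.6, while kern-time moves by less than a point after anonfolio. The tables report only averages, so they do not show whether per-step differences this small exceed run-to-run noise.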
>
>> | baseline-16k | -8.7% | -49.2% | -3.7% |
>> | baseline-64k | -10.5% | -66.0% | -3.5% |
>>
>> Speedometer 2.0 (bigger is better):
>>
>> | kernel | runs_per_min |
>> |:-------------|---------------:|
>> | baseline-4k | 0.0% |
>> | anonfolio | 1.2% |
>> | contpte | 3.1% |
>> | exefolio | 4.2% |
>
> same question as above.
same answer as above.
Thanks,
Ryan
>
>> | baseline-16k | 5.3% |
>>