Message-Id: <20231122092855.4440-1-shijie@os.amperecomputing.com>
Date: Wed, 22 Nov 2023 17:28:51 +0800
From: Huang Shijie <shijie@...amperecomputing.com>
To: catalin.marinas@....com
Cc: will@...nel.org, mark.rutland@....com, suzuki.poulose@....com,
broonie@...nel.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, anshuman.khandual@....com,
robh@...nel.org, oliver.upton@...ux.dev, maz@...nel.org,
patches@...erecomputing.com,
Huang Shijie <shijie@...amperecomputing.com>
Subject: [PATCH 0/4] arm64: an optimization for AmpereOne
0) Background:
We found that AmpereOne benefits from aggressive prefetches when
using 4K page size.
1) This patch:
1.1) adds a new WORKAROUND_AMPERE_AC03_PREFETCH capability.
1.2) uses MIDR_AMPERE1 to filter the processor.
1.3) uses alternative_if to patch in the AmpereOne-specific
code at boot.
1.4) adds software prefetches to the 128-byte copy loop in
copy_template.S. Also adds a macro, add_prefetch.
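For readers who do not want to read the assembly, the idea behind 1.4
can be sketched in plain C. This is only an illustration, not the code
in the series: the function name copy_with_prefetch and the
PREFETCH_AHEAD distance are made up for the example, and
__builtin_prefetch (a GCC/Clang builtin) stands in for the prefetch
instructions the patch adds to copy_template.S.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK          128   /* bytes copied per loop iteration */
#define PREFETCH_AHEAD 1024  /* hypothetical distance, not taken from the patch */

/*
 * Copy n bytes from src to dst, issuing a read prefetch a fixed
 * distance ahead of the current source position on every iteration.
 */
static void copy_with_prefetch(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n >= CHUNK) {
		/*
		 * Prefetch is only a hint and does not fault, so the
		 * address may lie past the end of the source buffer.
		 * The address is computed via uintptr_t to avoid forming
		 * an out-of-bounds pointer with pointer arithmetic.
		 */
		__builtin_prefetch((const void *)((uintptr_t)s + PREFETCH_AHEAD), 0, 0);

		memcpy(d, s, CHUNK);
		d += CHUNK;
		s += CHUNK;
		n -= CHUNK;
	}

	/* Copy the tail that does not fill a whole chunk. */
	if (n)
		memcpy(d, s, n);
}
```

The prefetch distance is a tuning knob in a sketch like this; a good
value depends on the core and the memory system, and the number above
is a placeholder.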
2) Test result:
In hugetlb or tmpfs, we see a large sequential read performance
improvement, up to 1.3x ~ 1.4x.
Huang Shijie (4):
  extable: add __sort_main_extable
  arm64: alternative: handle the kernel exception table
  arm64: copy_template.S: add loop_for_copy_128_bytes macro
  arm64: add software prefetches for AmpereOne
arch/arm64/Kconfig.platforms | 7 +++
arch/arm64/kernel/alternative.c | 18 +++++++
arch/arm64/kernel/cpu_errata.c | 9 ++++
arch/arm64/lib/copy_template.S | 87 +++++++++++++++++++++++----------
arch/arm64/tools/cpucaps | 1 +
include/linux/extable.h | 2 +
kernel/extable.c | 8 ++-
7 files changed, 105 insertions(+), 27 deletions(-)
--
2.40.1