Message-Id: <20251030-mte-tighten-tco-v1-0-88c92e7529d9@os.amperecomputing.com>
Date: Thu, 30 Oct 2025 20:49:30 -0700
From: Carl Worth <carl@...amperecomputing.com>
To: Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
Taehyun Noh <taehyun@...xas.edu>, Carl Worth <carl@...amperecomputing.com>
Subject: [PATCH 0/2] arm64: mte: Improve performance by tightening handling
of PSTATE.TCO
[Thanks to Taehyun Noh from UT Austin for originally reporting this
bug. In this cover letter, "we" refers to a collaborative effort
between individuals at both Ampere Computing and UT Austin.]
We measured severe performance overhead (30-50%) when enabling
userspace MTE and running memcached on an AmpereOne machine (detailed
benchmark results are provided below).
We identified excessive tag checking taking place in the kernel
(though only userspace tag checking was requested) as the culprit for
the performance slowdown. The existing code enables tag checking (by
_disabling_ PSTATE.TCO, the "tag check override") at kernel entry
regardless of whether it's kernel-side MTE (via KASAN_HW_TAGS) or
userspace MTE that is being requested.
This patch series addresses the slowdown (in the case that only
userspace MTE is requested) by deferring the enabling of tag checking
until the kernel is about to access userspace memory, that is, by
enabling tag checking in user_access_begin and then disabling it again
in user_access_end.
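The difference between the two policies can be illustrated with a toy
model. This is only a sketch, not the kernel implementation: ToyCpu,
entry_policy and deferred_policy are hypothetical names, the model
assumes PSTATE.TCO is set when the kernel is entered, and it simply
counts accesses made while TCO is clear.

```python
class ToyCpu:
    """Toy model that counts accesses made while PSTATE.TCO is clear."""

    def __init__(self):
        # Assumption of this model: TCO is set on entry to the kernel,
        # so accesses are unchecked until TCO is cleared.
        self.tco = True
        self.checked = 0

    def access(self, count):
        # An access is only tag-checked when TCO is clear.
        if not self.tco:
            self.checked += count


def entry_policy(kernel_accesses, user_accesses):
    """Existing behaviour: clear TCO once, at kernel entry."""
    cpu = ToyCpu()
    cpu.tco = False                  # tag checking on for the whole syscall
    cpu.access(kernel_accesses)
    cpu.access(user_accesses)
    return cpu.checked


def deferred_policy(kernel_accesses, user_accesses):
    """Patched behaviour: clear TCO only around userspace accesses."""
    cpu = ToyCpu()
    cpu.access(kernel_accesses)      # TCO still set, nothing is checked
    cpu.tco = False                  # models user_access_begin
    cpu.access(user_accesses)
    cpu.tco = True                   # models user_access_end
    return cpu.checked


if __name__ == "__main__":
    # Illustrative access counts only, not measurements.
    print(entry_policy(1000, 10))
    print(deferred_policy(1000, 10))
```

In this model the deferred policy pays for two TCO writes per
user-access window, but only the userspace accesses are checked.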
The effect of this patch series is most readily seen by using perf to
count tag-checked accesses in both kernel and userspace, for example
while running "perf bench futex hash" with MTE enabled.
Prior to the patch series, we see:
# GLIBC_TUNABLES=glibc.mem.tagging=3 perf stat -e mem_access_checked_rd:u,mem_access_checked_wr:u,mem_access_checked_rd:k,mem_access_checked_wr:k perf bench futex hash
...
Performance counter stats for 'perf bench futex hash':
4,046,872,020 mem_access_checked_rd:u
23,580 mem_access_checked_wr:u
251,248,813,102 mem_access_checked_rd:k
87,256,021,241 mem_access_checked_wr:k
And after the patch series we see (for the same command):
Performance counter stats for 'perf bench futex hash':
3,866,346,822 mem_access_checked_rd:u
23,499 mem_access_checked_wr:u
7,725,072,314 mem_access_checked_rd:k
424 mem_access_checked_wr:k
As can be seen above, with roughly equivalent counts of userspace
tag-checked accesses, over 97% of the kernel-space tag-checked
accesses are eliminated.
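The "over 97%" figure can be rechecked from the counters quoted above.
The following is a small sketch of that arithmetic; the variable and
function names are chosen here for illustration only.

```python
# Counter values copied from the two perf stat runs quoted above.
kernel_before = {"rd": 251_248_813_102, "wr": 87_256_021_241}
kernel_after = {"rd": 7_725_072_314, "wr": 424}
user_before = {"rd": 4_046_872_020, "wr": 23_580}
user_after = {"rd": 3_866_346_822, "wr": 23_499}


def total(counters):
    """Sum the read and write counts of one measurement."""
    return sum(counters.values())


def reduction(before, after):
    """Fraction of tag-checked accesses eliminated between two runs."""
    return 1.0 - total(after) / total(before)


kernel_reduction = reduction(kernel_before, kernel_after)
user_ratio = total(user_after) / total(user_before)

if __name__ == "__main__":
    print(f"kernel tag-checked accesses eliminated: {kernel_reduction:.2%}")
    print(f"userspace accesses, after/before: {user_ratio:.3f}")
```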
As to performance, we observed no impact from the patch series for
workloads with MTE disabled.
For workloads with MTE enabled, we measured the series causing a 5-8%
slowdown for "perf bench futex hash". Presumably, this results from
code paths that now include 2 writes to PSTATE.TCO where previously
there was only 1. Since this is a synthetic micro-benchmark, we
argue that this slowdown is acceptable given the results with more
realistic workloads as described below.
We used the Phoronix Test Suite pts/memcached benchmark with a
get-heavy workload (1:10 Set:Get ratio) which is where the slowdown
appears most clearly. The slowdown worsens with increased core count,
levelling out above 32 cores. The numbers below are based on averages
from 50 runs each, with 96 cores on each run. For "MTE on",
GLIBC_TUNABLES was set to "glibc.mem.tagging=3". For "MTE off",
GLIBC_TUNABLES was unset.
The numbers below are ops./sec. (higher is better), normalized to
the baseline case (unpatched kernel, MTE off).
Before the patch series (unpatched v6.18-rc1):
MTE off: 1.000
MTE on: 0.455
MTE overhead: 54.5% +/- 2.3%
After applying this patch series:
MTE off: 0.997
MTE on: 1.002
MTE overhead: No difference proven at 95.0% confidence
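For reference, the point estimates of the overhead can be recomputed
from the normalized throughput numbers above. This sketch assumes that
overhead is defined as the relative throughput loss of "MTE on" versus
"MTE off" on the same kernel; mte_overhead is a name chosen here. It
does not reproduce the confidence intervals, which require the
per-run data.

```python
def mte_overhead(mte_off, mte_on):
    """Relative throughput loss when MTE is enabled (assumed definition)."""
    return 1.0 - mte_on / mte_off


# Normalized ops./sec. copied from the results above.
before = mte_overhead(mte_off=1.000, mte_on=0.455)
after = mte_overhead(mte_off=0.997, mte_on=1.002)

if __name__ == "__main__":
    print(f"unpatched: {before:.1%}")
    print(f"patched:   {after:.1%}")
```

A slightly negative value for the patched kernel means "MTE on" was
marginally faster in the averages, which is consistent with the
reported result that no difference was proven.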
Changes since v1:
* Reordered patches to put cleanup patch before performance fix.
Signed-off-by: Carl Worth <carl@...amperecomputing.com>
---
Carl Worth (1):
arm64: mte: Defer disabling of TCO until user_access_begin/end
Taehyun Noh (1):
arm64: mte: Unify kernel MTE policy and manipulation of TCO
arch/arm64/include/asm/mte.h | 53 +++++++++++++++++++++++++++++++---------
arch/arm64/include/asm/uaccess.h | 32 +++++++++++++++++++++++-
arch/arm64/kernel/entry-common.c | 4 +--
arch/arm64/kernel/mte.c | 2 +-
4 files changed, 76 insertions(+), 15 deletions(-)
---
base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787