Message-ID: <20251124213227.123779-1-chang.seok.bae@intel.com>
Date: Mon, 24 Nov 2025 21:32:23 +0000
From: "Chang S. Bae" <chang.seok.bae@...el.com>
To: linux-kernel@...r.kernel.org
Cc: x86@...nel.org,
tglx@...utronix.de,
mingo@...hat.com,
bp@...en8.de,
dave.hansen@...ux.intel.com,
chang.seok.bae@...el.com
Subject: [DISCUSSION] x86: In-Kernel Use of Extended General-Purpose Registers
Hi all,
I’d like to initiate a discussion on this topic. The attached patchset
is *not* intended for upstream at this point. Its purpose is simply to
serve as an example of how the kernel might use these registers. A
quick look is enough; a deep review of the attached patches would
likely be a waste of your time.
== Background ==
Advanced Performance Extensions (APX) introduces additional GPRs: R16–R31
(EGPRs) [1]. These EGPRs are accessible via new prefix encodings on
legacy instructions. Their state is handled through XSAVE, and support
for this new XSTATE component was merged in v6.16 [2]. So far, APX is
primarily targeted toward userspace enablement.
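As a rough illustration of what "enumerated and enabled through XSAVE"
means in practice, the sketch below decodes the two bits involved. The
bit positions (CPUID.(EAX=7,ECX=1):EDX[21] for APX_F and XCR0 bit 19
for the APX state component) reflect my reading of the specification
and should be checked against [1]. The helper names are made up for
this example and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; verify against the APX specification [1]. */
#define APX_F_EDX_BIT	21	/* CPUID.(EAX=7,ECX=1):EDX */
#define XCR0_APX_BIT	19	/* XSAVE state component holding R16-R31 */

/* Hypothetical helper: does CPUID enumerate APX? */
static bool apx_enumerated(uint32_t leaf7_sub1_edx)
{
	return (leaf7_sub1_edx >> APX_F_EDX_BIT) & 1;
}

/* Hypothetical helper: has the OS enabled the APX state in XCR0? */
static bool apx_state_enabled(uint64_t xcr0)
{
	return (xcr0 >> XCR0_APX_BIT) & 1;
}

/* EGPRs are only usable when both conditions hold. */
static bool apx_usable(uint32_t leaf7_sub1_edx, uint64_t xcr0)
{
	return apx_enumerated(leaf7_sub1_edx) && apx_state_enabled(xcr0);
}
```

The helpers take raw register values rather than executing CPUID or
XGETBV, so the decoding logic can be exercised on any machine.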
However, in-kernel use still needs to be explored. Ingo previously noted
that EGPRs may help reduce kernel stack pressure [3], and this topic is
coming up at the x86 microconference at LPC [4]. I hope this posting can
circulate some thoughts, along with an example, ahead of that session.
== Possible Approaches ==
(1) Selective and Limited Use
This follows how vector registers are used today in places like crypto
routines. AVX state usage is bracketed by kernel_fpu_begin() /
kernel_fpu_end(). EGPRs could be similarly used in a small bounded
region.
Under this model:
* No changes are needed to the existing XSTATE management API.
* Preemption and softirqs would be disabled while EGPRs are live,
which in turn limits usage to small regions.
* This lends itself mostly to hand-written assembly, which is less
scalable for broader adoption.
PATCH3 in the attached set shows an example of this kind of usage.
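For readers who have not seen the bracketing pattern, here is a minimal
userspace model of it. This is not the code in PATCH3. The model_*
functions are stand-ins for kernel_fpu_begin() / kernel_fpu_end(), the
model_cpu_has_apx flag is hypothetical, and the "EGPR" routine is plain
C standing in for a hand-written assembly loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-ins for kernel_fpu_begin()/kernel_fpu_end(). The real
 * helpers also disable preemption and softirqs; here we only track
 * whether we are inside the bracketed region.
 */
static int model_fpu_depth;

static void model_kernel_fpu_begin(void)
{
	model_fpu_depth++;
}

static void model_kernel_fpu_end(void)
{
	assert(model_fpu_depth > 0);
	model_fpu_depth--;
}

/* Hypothetical flag: set when APX is enumerated and enabled. */
static bool model_cpu_has_apx;

/* Baseline path that touches only the legacy GPRs. */
static uint64_t sum32_legacy(const uint32_t *p, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += p[i];
	return sum;
}

/*
 * Stand-in for a hand-written assembly loop that keeps its accumulators
 * in R16-R31. It must only run inside the bracketed region.
 */
static uint64_t sum32_egpr(const uint32_t *p, size_t n)
{
	assert(model_fpu_depth > 0);
	return sum32_legacy(p, n);
}

/* Caller-visible entry point: pick a path, bracket the EGPR one. */
static uint64_t sum32(const uint32_t *p, size_t n)
{
	uint64_t sum;

	if (!model_cpu_has_apx)
		return sum32_legacy(p, n);

	model_kernel_fpu_begin();
	sum = sum32_egpr(p, n);
	model_kernel_fpu_end();
	return sum;
}
```

The point of the model is the shape: callers see one entry point, the
legacy path stays available, and the EGPR path is confined to a short
region between the begin/end calls.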
(2) Broader or Tree-wide Adoption
If the goal is to substantially reduce stack pressure or improve
performance more broadly, EGPR usage would need to expand to larger
regions. This raises some considerations:
* The usage window would become too large to keep preemption disabled.
In that case, the wrapper-based approach becomes infeasible.
* The EGPR state would then need to be switched on entry to ensure a
clean separation as APX usage becomes more pervasive. This could be
handled by extending struct pt_regs or another structure.
* The kernel must be able to select between legacy mode and APX,
since APX remains optional for backward compatibility. An APX-only
kernel image will not be distributed.
* This suggests some level of code duplication or alternate code paths
as an unavoidable trade-off. As the usage grows, so does image size,
which raises the bar for demonstrating a measurable benefit.
* At that scale, adoption will likely rely on compiler support.
Compiler code generation and optimization behavior need to be examined
and validated in advance.
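To put rough numbers and shape on two of the points above, the sketch
below shows (a) how much extra state an entry-time save area for
R16-R31 would add, and (b) the kind of duplicated code path a
legacy/APX selection implies. The struct and function names are
hypothetical. A real implementation would more likely patch call sites
with alternatives or static calls than use a function pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_EGPRS	16	/* R16..R31 */

/* Hypothetical save area for the extended registers. */
struct egpr_save_area {
	uint64_t r[NR_EGPRS];
};

/* Extra bytes each kernel entry would have to save and restore. */
static size_t egpr_entry_cost_bytes(void)
{
	return sizeof(struct egpr_save_area);
}

typedef uint64_t (*sum_fn)(const uint32_t *p, size_t n);

/* Path built for the legacy register set. */
static uint64_t sum_legacy(const uint32_t *p, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += p[i];
	return sum;
}

/*
 * Stand-in for the same routine compiled with EGPRs enabled. Both
 * copies live in the image, which is the size trade-off noted above.
 */
static uint64_t sum_apx(const uint32_t *p, size_t n)
{
	return sum_legacy(p, n);
}

/* Select the implementation once, based on a boot-time feature check. */
static sum_fn select_sum(bool has_apx)
{
	return has_apx ? sum_apx : sum_legacy;
}
```

With 16 registers of 8 bytes each, the save area comes to 128 bytes per
entry frame, on top of whatever struct pt_regs already holds.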
== Discussions ==
Given the above, a staged adoption may make sense. EGPR usage could
begin in self-contained libraries or performance-critical paths, and be
evaluated incrementally as hardware becomes more broadly available.
Here are some questions for preliminary discussion:
* Does this overall framing make sense?
* Are there alternative or more pragmatic approaches for adoption?
* Which kernel subsystems or hot paths might benefit most from early
experimentation with EGPRs?
Thanks,
Chang
[1] https://cdrdv2.intel.com/v1/dl/getContent/784266
[2] https://lore.kernel.org/lkml/aDL35MA4vH0wQ6Gb@gmail.com/
[3] https://lore.kernel.org/lkml/Z8C57rzRt90obAFg@gmail.com/
[4] https://lpc.events/event/19/contributions/2028/
Chang S. Bae (3):
x86/lib: Refactor csum_partial_copy_generic() into a macro
x86/lib: Convert repeated asm sequences in checksum copy into macros
x86/lib: Use EGPRs in 64-bit checksum copy loop
arch/x86/Kconfig | 6 +
arch/x86/Kconfig.assembler | 6 +
arch/x86/include/asm/checksum_64.h | 24 ++-
arch/x86/lib/csum-copy_64.S | 282 +++++++++++++++++------------
4 files changed, 206 insertions(+), 112 deletions(-)
base-commit: ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d
--
2.51.0