linux-kernel - [DISCUSSION] x86: In-Kernel Use of Extended General-Purpose Registers

Message-ID: <20251124213227.123779-1-chang.seok.bae@intel.com>
Date: Mon, 24 Nov 2025 21:32:23 +0000
From: "Chang S. Bae" <chang.seok.bae@...el.com>
To: linux-kernel@...r.kernel.org
Cc: x86@...nel.org,
	tglx@...utronix.de,
	mingo@...hat.com,
	bp@...en8.de,
	dave.hansen@...ux.intel.com,
	chang.seok.bae@...el.com
Subject: [DISCUSSION] x86: In-Kernel Use of Extended General-Purpose Registers

Hi all,

I’d like to initiate a discussion on this topic. The attached patchset
is *not* intended for upstream at this point. Its purpose is simply to
serve as an example of how the kernel might use these registers. Beyond
a quick look, a deep review of the attached patches is likely not worth
your time.

== Background ==

Advanced Performance Extensions (APX) introduces 16 additional GPRs,
R16–R31 (EGPRs) [1]. These EGPRs are accessible via new prefix encodings
on legacy instructions. Their state is handled through XSAVE, and support
for this new XSTATE component was merged in v6.16 [2]. So far, APX
enablement has primarily targeted userspace.

However, in-kernel use still needs to be explored. Ingo previously noted
that EGPRs may help reduce kernel stack pressure [3], and this topic is
slated to come up in the x86 microconference at LPC [4]. I hope this
posting gets some ideas circulating, along with an example, ahead of that.

== Possible Approaches ==

(1) Selective and Limited Use

This follows how vector registers are used today in places like crypto
routines, where AVX state usage is bracketed by kernel_fpu_begin() /
kernel_fpu_end(). EGPRs could similarly be used within a small, bounded
region.

Under this model:

  * No changes are needed to the existing XSTATE management API.

  * Preemption and softirqs would be disabled while EGPRs are live,
    which limits usage to small regions.

  * This lends itself mostly to hand-written assembly, which scales
    poorly to broader adoption.

PATCH3 in the attached set shows an example of this kind of usage.
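For a concrete picture of what a 64-bit checksum copy loop computes, here
is a portable C sketch. It is not the kernel's csum-copy_64.S and does not
reproduce PATCH3: it ignores alignment and user-copy fault handling, starts
the sum at 0, and folds the result to 16 bits instead of returning the
32-bit partial __wsum that csum_partial_copy_generic() returns. The names
csum_copy_sketch(), add64_eac() and NACC are hypothetical. The NACC
independent accumulators are an assumption meant to show where register
count matters: in hand-written assembly, each live accumulator or
in-flight quadword occupies its own register, which is where EGPRs could
help avoid spills.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NACC 4	/* hypothetical number of independent accumulators */

/* Load 8 bytes as a little-endian 64-bit word, independent of host order. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int b = 7; b >= 0; b--)
		v = (v << 8) | p[b];
	return v;
}

/* Ones'-complement add: fold the carry-out back in, as an ADC chain does. */
static uint64_t add64_eac(uint64_t a, uint64_t b)
{
	uint64_t s = a + b;

	return s + (s < a);
}

/* Fold a 64-bit ones'-complement sum down to 16 bits. */
static uint16_t fold64_to16(uint64_t s)
{
	while (s >> 16)
		s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/*
 * Copy len bytes from src to dst and return the 16-bit ones'-complement
 * sum of the copied data. Sketch only: no alignment handling, no fault
 * handling, and the sum starts at 0.
 */
static uint16_t csum_copy_sketch(const void *src, void *dst, size_t len)
{
	const uint8_t *s = src;
	uint8_t *d = dst;
	uint64_t acc[NACC] = {0};
	size_t i = 0;

	/* Main loop: NACC quadwords per iteration, one accumulator each. */
	while (len - i >= 8 * NACC) {
		for (int k = 0; k < NACC; k++, i += 8) {
			acc[k] = add64_eac(acc[k], load_le64(s + i));
			memcpy(d + i, s + i, 8);
		}
	}

	/* Remaining whole quadwords. */
	for (; len - i >= 8; i += 8) {
		acc[0] = add64_eac(acc[0], load_le64(s + i));
		memcpy(d + i, s + i, 8);
	}

	/* Tail: zero-pad the last 1..7 bytes to a full quadword. */
	if (i < len) {
		uint8_t tail[8] = {0};

		memcpy(tail, s + i, len - i);
		memcpy(d + i, s + i, len - i);
		acc[0] = add64_eac(acc[0], load_le64(tail));
	}

	/* Combine the accumulators, then fold. */
	uint64_t sum = 0;
	for (int k = 0; k < NACC; k++)
		sum = add64_eac(sum, acc[k]);
	return fold64_to16(sum);
}
```

Folding the 64-bit ones'-complement sum down to 16 bits gives the same
result as summing 16-bit words directly, because 2^64 - 1 is a multiple of
2^16 - 1. That is what makes the wide-register loop valid in the first
place.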

(2) Broader or Tree-wide Adoption

If the goal is to substantially reduce stack pressure or improve
performance more broadly, EGPR usage would need to expand to larger
regions. This raises some considerations:

  * The usage window would become too large to keep preemption disabled.
    In that case, the wrapper-based approach becomes infeasible.

  * The EGPR state would then need to be switched on entry to ensure a
    clean separation as APX usage becomes more pervasive. This could be
    handled by extending struct pt_regs or another structure.

  * The kernel must be able to select between legacy and APX code paths,
    since APX remains optional and backward compatibility must be
    preserved. In other words, an APX-only kernel image won't be
    distributed.

  * This implies some level of code duplication or alternate code paths
    as an unavoidable trade-off. As usage grows, so does image size,
    which raises the bar for demonstrating a measurable benefit.

  * At that scale, adoption will likely rely on compiler support.
    Compiler code-generation and optimization behavior would need to be
    examined and validated in advance.
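The state-switching and code-selection points above can be illustrated
with a small C sketch. Everything in it is hypothetical: struct egpr_regs
stands in for whatever structure (an extended struct pt_regs or something
else) would hold R16–R31 on entry, and select_sum() uses a plain function
pointer where the kernel would more likely use static_call() or the
alternatives mechanism to bind once at boot. DEFINE_SUM() is similar in
spirit to the macro refactoring named in PATCH1's title: one body,
instantiated once for legacy code and once for APX. In a real build the
second copy would have to be compiled with APX code generation enabled,
which this sketch does not do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical save area for the 16 extended GPRs (R16-R31) that would
 * have to be preserved on kernel entry if EGPRs were used broadly.
 */
struct egpr_regs {
	uint64_t r[16];		/* r[0] = R16 ... r[15] = R31 */
};

/* 16 registers x 8 bytes = 128 bytes of extra state per entry. */
_Static_assert(sizeof(struct egpr_regs) == 16 * sizeof(uint64_t),
	       "unexpected EGPR save-area size");

/*
 * One routine body, instantiated twice. In a real build the _apx copy
 * would be compiled with APX code generation enabled (an assumption about
 * build plumbing); here both copies are ordinary C.
 */
#define DEFINE_SUM(name)					\
	static uint64_t name(const uint64_t *v, size_t n)	\
	{							\
		uint64_t s = 0;					\
		for (size_t i = 0; i < n; i++)			\
			s += v[i];				\
		return s;					\
	}

DEFINE_SUM(sum_legacy)	/* legacy GPRs only: runs on any x86-64 CPU */
DEFINE_SUM(sum_apx)	/* hypothetical APX build of the same body */

typedef uint64_t (*sum_fn)(const uint64_t *v, size_t n);

/*
 * Pick the variant once, e.g. at boot. cpu_has_apx stands in for a real
 * CPU feature check; the kernel would more likely patch the call site.
 */
static sum_fn select_sum(bool cpu_has_apx)
{
	return cpu_has_apx ? sum_apx : sum_legacy;
}
```

Both copies end up in the same image, which is the size trade-off noted
above, and the save area puts a number on the entry cost: 16 registers x
8 bytes = 128 bytes of additional state per entry.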

== Discussions ==

Given the above, a staged adoption may make sense. EGPR usage could
begin in self-contained libraries or performance-critical paths and be
evaluated incrementally as hardware becomes more broadly available.

Here are some preliminary questions for discussion:

  * Does this overall framing make sense?
  * Are there alternative or more pragmatic approaches for adoption?
  * Which kernel subsystems or hot paths might benefit most from early
    experimentation with EGPRs?

Thanks,
Chang

[1] https://cdrdv2.intel.com/v1/dl/getContent/784266
[2] https://lore.kernel.org/lkml/aDL35MA4vH0wQ6Gb@gmail.com/
[3] https://lore.kernel.org/lkml/Z8C57rzRt90obAFg@gmail.com/
[4] https://lpc.events/event/19/contributions/2028/

Chang S. Bae (3):
  x86/lib: Refactor csum_partial_copy_generic() into a macro
  x86/lib: Convert repeated asm sequences in checksum copy into macros
  x86/lib: Use EGPRs in 64-bit checksum copy loop

 arch/x86/Kconfig                   |   6 +
 arch/x86/Kconfig.assembler         |   6 +
 arch/x86/include/asm/checksum_64.h |  24 ++-
 arch/x86/lib/csum-copy_64.S        | 282 +++++++++++++++++------------
 4 files changed, 206 insertions(+), 112 deletions(-)

base-commit: ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d
-- 
2.51.0