lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250320234301.8342-1-chang.seok.bae@intel.com>
Date: Thu, 20 Mar 2025 16:42:51 -0700
From: "Chang S. Bae" <chang.seok.bae@...el.com>
To: linux-kernel@...r.kernel.org
Cc: x86@...nel.org,
	tglx@...utronix.de,
	mingo@...hat.com,
	bp@...en8.de,
	dave.hansen@...ux.intel.com,
	chang.seok.bae@...el.com
Subject: [PATCH RFC v2 0/9] x86: Support Intel Advanced Performance Extensions

Hi all,

This is a revision of the previous posting [1]. The primary goal at this
stage is to present the scope of the code changes and still gather early
feedback on the approach to enabling this feature for userspace.

== Changes in this revision ==

A major update in this version is that the xstate ordering table is now
dynamically populated at boot time, thanks to Dave’s suggestion. Each
system will now generate its own table based on the available xfeature
set.

As a result, the MPX removal patch has been dropped from this series.
Previously, a conflict arose when listing both MPX and APX in a static
table. With the new dynamic approach, each feature can be enabled
exclusively without conflict. Patch 7 ensures their mutual exclusivity,
meaning the table will now reflect either one or neither of them.

Removing MPX entirely also seemed premature, given previous discussions
on the list [4]. For guest compatibility, setting MPX bits in XCR0 is
still expected on older systems.

Additional changes since the last posting:
*  Complete MPX feature bits for enforcing APX exclusivity (Dave).
*  Remove the first patch, as merged into the tip tree:
   69a2fdf44604 ("x86/fpu/xstate: Simplify print_xstate_features()")

Below is the original cover letter with minor updates:

== Introduction ==

APX introduces a new set of general-purpose registers designed to improve
performance. More details on its use cases can be found in the published
documentation [2].

This series primarily assumes the userspace usage only, though in-kernel
use cases may remain open for considerations.

== Points to Consider ==

Considering APX enabling by the kernel,

* New Register State

  APX register state is managed by the XSAVE instruction set, with its
  state component number assigned to 19, following XTILEDATA.

* XSAVE Buffer Offset

  - In the compacted format used (for the in-kernel buffer), the APX
    state will appear at a later position in the buffer.

  - In the non-compacted format (used for signal, ptrace, and KVM ABIs),
    APX is assigned a lower offset, occupying the space previously
    reserved for the deprecated MPX state.

    This should not bring any ABI changes. In the extended register state
    area, each state's size and offset are determined dynamically via
    CPUID.

* Kernel Assumptions:

  The kernel generally assumes that higher-numbered components have
  higher offsets, as discussed in the following section.

== Areas for Adjustment ==

With that in mind, here are a few key areas that need to be addressed to
properly handle the APX state. While these are thought to be pretty much
at this stage, I'd say it is still open to discover possible hindsights
in other areas.

1. Feature Conflict

   While a valid CPU should not expose both MPX and APX, a broken or
   misconfigured CPU could erroneously expose support for both. This hard
   conflict should be avoided up front.

2. XSAVE Format Conversion

   APX introduces an offset anomaly that requires special handling when
   converting between compacted and non-compacted layouts. The kernel
   relies on XSAVE instructions for context switching and signal
   handling. However, xstate copy functions must correctly translate the
   XSAVE format while ensuring sequential memory access, as enforced by
   struct membuf.

3. XSAVE Size Calculation

   The kernel calculates XSAVE buffer sizes based on the assumption that
   the highest-numbered feature appears at the end. If APX is the last
   bit set in the feature mask, the existing logic in
   xstate_calculate_size() miscalculates the buffer size.

4. Offset Sanity Check

   The kernel's boot-time sanity check in setup_xstate_cache() assumes
   offsets increase with feature numbers. The APX state will conflict
   with this assumption.

== Approaches ==

These two approaches are considerable:

Option 1, Consider APX as a one-off exception

  Initially, I thought this approach would result in fewer changes to the
  xstate code. However, it makes the code less comprehensive (or more
  complicated) and introduces a risk. If another feature similar to APX
  comes up, adding another exception seems to be inefficient and messy.

Option 2, Handle out-of-order offsets in general

  Rather than treating APX as an exception, this approach adapts the
  kernel to accommodate out-of-order offsets. By introducing a feature
  order table and an accompanying macro, the traversal logic is
  encapsulated cleanly, simplifying related code. This makes the kernel
  more resilient to future features with non-sequential offsets.

== Series Summary ==

This patch set addresses the above issues before enabling APX support
The chosen approach (Option 2) adjusts the kernel to handle out-of-order
offsets. Here’s a breakdown of the patches:

* PART1: XSTATE Code Adjustment

  - PATCH1:  Remove the offset sanity check.
  - PATCH2:  Introduce the feature order table.
  - PATCH3:  Adjust XSAVE size calculation.
  - PATCH4:  Modify the xstate copy function.

* PART2: APX Enabling

  - PATCH5:  Enumerate the APX CPUID feature bit.
  - PATCH6:  Update xstate definitions to include APX.
  - PATCH7:  Ensure MPX and APX are mutually exclusive.
  - PATCH8:  Register APX in the supported xstate list.
  - PATCH9:  Add a self-test case for APX.

== Testing ==

The first part is agnostic to APX and should preserve the same
functionality for the currently supported features. The xstate tests --
covering context switching and ABI compatibility for signal and ptrace --
were executed to ensure there is no regression.

Now, with dynamic population of the ordering table, MPX and APX states
can be separately enabled in different systems:

* APX: PATCH9 applies to the same test set and builds on the xstate
  selftest rework [3]. Since no hardware implementation is available at
  this time, an internal Intel emulator was primarily used to verify the
  test cases.

* MPX exposure in KVM guest was also validated from a Skylake machine.

---

The patches are based on the tip/master branch. The series can be found
here:
    git://github.com/intel/apx.git apx_rfc-v2

Thanks,
Chang

[1] https://lore.kernel.org/lkml/20250227184502.10288-1-chang.seok.bae@intel.com/
[2] https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
[3] https://lore.kernel.org/lkml/20250226010731.2456-1-chang.seok.bae@intel.com/
[4] https://lore.kernel.org/lkml/547a1203-0339-7ad2-9505-eab027046298@intel.com/

Chang S. Bae (9):
  x86/fpu/xstate: Remove xstate offset check
  x86/fpu/xstate: Introduce xfeature order table and accessor macro
  x86/fpu/xstate: Adjust XSAVE buffer size calculation
  x86/fpu/xstate: Adjust xstate copying logic for user ABI
  x86/cpufeatures: Add X86_FEATURE_APX
  x86/fpu/apx: Define APX state component
  x86/fpu/apx: Disallow conflicting MPX presence
  x86/fpu/apx: Enable APX state support
  selftests/x86/apx: Add APX test

 arch/x86/include/asm/cpufeatures.h   |   1 +
 arch/x86/include/asm/fpu/types.h     |   9 +++
 arch/x86/include/asm/fpu/xstate.h    |   3 +-
 arch/x86/kernel/cpu/cpuid-deps.c     |   1 +
 arch/x86/kernel/cpu/scattered.c      |   1 +
 arch/x86/kernel/fpu/xstate.c         | 115 +++++++++++++++++++--------
 tools/testing/selftests/x86/Makefile |   3 +-
 tools/testing/selftests/x86/apx.c    |  10 +++
 tools/testing/selftests/x86/xstate.c |   3 +-
 tools/testing/selftests/x86/xstate.h |   2 +
 10 files changed, 113 insertions(+), 35 deletions(-)
 create mode 100644 tools/testing/selftests/x86/apx.c


base-commit: 758ea9705c51865858c612f591c6e6950dcafccf
-- 
2.45.2


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ