linux-kernel - [PATCH 0/6] x86/microcode: Support for Intel Staging Feature

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241211014213.3671-1-chang.seok.bae@intel.com>
Date: Tue, 10 Dec 2024 17:42:06 -0800
From: "Chang S. Bae" <chang.seok.bae@...el.com>
To: linux-kernel@...r.kernel.org
Cc: x86@...nel.org,
	tglx@...utronix.de,
	mingo@...hat.com,
	bp@...en8.de,
	dave.hansen@...ux.intel.com,
	chang.seok.bae@...el.com
Subject: [PATCH 0/6] x86/microcode: Support for Intel Staging Feature

Hi all,

Changes since the RFC posting [1]:

* Simplified the staging address discovery code. Leveraging the staging
  topology, stage only if package id changes (Thomas).

* Cleaned up the MSR read logic (Boris and Dave).

* Renamed functions to align with the do_something naming convention
  (Boris).

* Polished staging result messages (Boris).

* Dropped the WBINVD removal as mainlined now.

This series is based on 6.13-rc2. You can also find it from this repo:
    git://github.com/intel-staging/microcode.git staging_v1

I would appreciate further reviews and feedback.

Thanks,
Chang

---

Here is the original cover letter with minor updates -- removing the
WBINVD story and updating the function names:

== Latency Spike Issue ==

As microcode images have increased in size, a corresponding rise in load
latency has become inevitable. This latency spike significantly impacts
late loading, which remains in use despite the cautions highlighted in
the documentation [2]. The issue is especially critical for continuously
running workloads and virtual machines, where excessive delays can lead
to timeouts.

== Staging for Latency Reduction ==

Currently, writing to MSR_IA32_UCODE_WRITE triggers the entire update
process -- loading, validating, and activation -- all of which contribute
to the latency during CPU halt. The staging feature mitigates this by
refactoring all but the activation step out of the critical path,
allowing CPUs to continue serving workloads while staging takes place.

== Validation ==

We internally established pseudocode to clearly define all essential
steps for interacting with the firmware. Any firmware implementation
supporting staging should adhere to this contract. This patch set
incorporates that staging logic, which I successfully tested on one
firmware implementation. Multiple teams at Intel have also validated the
feature across different implementations.

Preliminary results from a pre-production system show a significant
reduction in latency (about 40%) with the staging approach alone.
Further improvements are possible with additional optimizations [*].

== Call for Review ==

Here are several key points to highlight for feedback:

  1. Staging Integration Approach:

     In the core code, the high-level sequence for late loading is:

     (1) request_microcode_fw(), and
     (2) load_late_stop_cpus()->apply_microcode()

     Staging doesn't fit neatly into either steps, as it involves the
     loading process but not the activation. Therefore, a new callback is
     introduced:

       core::load_late_locked()
       -> intel::staging_microcode()
          -> intel_staging::do_stage()

  2. Code Abstraction:

     The newly added intel_staging.c file contains all staging-related
     code to keep it self-contained. Ideally, the entire firmware
     interaction could eventually be abstracted into a single MSR write,
     which remains a long-term goal. Fortunately, recent protocol
     simplifications have made this more feasible.

  3. Staging Policy (TODO):

     While staging is always attempted, the system will fall back to the
     legacy update method if staging fails. There is an open question
     regarding staging policy: should it be mandatory, without fallback,
     in certain usage scenarios? This could lead further refinements in
     the flow depending on feedback and use cases.

  4. Specification Updates

     Recent specification updates have simplified the staging protocol
     and clarified the behavior of MSR_IA32_UCODE_WRITE in conjunction
     with staging:

     4.1. Protocol Simplification

     The specification update [3] has significantly reduced the
     complexity of staging code, trimming the kernel code from ~1K lines
     in preliminary implementations. Thanks to Dave for guiding this
     redesign effort.

     4.2. Clarification of Legacy Update Behavior

     Chapter 5 of the specification adds further clarification on
     MSR_IA32_UCODE_WRITE. Key points are summarized below:

     (a) When staging is not performed or failed, a WRMSR will still load
     the patch image, but with higher latency.

     (b) During an active staging process, MSR_IA32_UCODE_WRITE can
     load a new microcode image, again with higher latency.

     (c) If the versions differ between the staged microcode and the
     version loaded via MSR_IA32_UCODE_WRITE, the version loaded through
     the MSR takes precedence.

     I'd also make sure there is no further ambiguity in this documentation
     [3]. Feel free to provide feedback if anything seems unclear or
     unreasonable.

As noted [*], an additional series focused on further latency
optimizations will follow. However, the staging approach was prioritized
due to its significant first-order impact on latency.

[1]: https://lore.kernel.org/all/20241001161042.465584-1-chang.seok.bae@intel.com/
[2]: https://docs.kernel.org/arch/x86/microcode.html#why-is-late-loading-dangerous
[3]: https://cdrdv2.intel.com/v1/dl/getContent/782715
[*]: Further latency improvements will be addressed in the upcoming
     ‘Uniform’ feature series.

Chang S. Bae (6):
  x86/microcode: Introduce staging option to reduce late-loading latency
  x86/msr-index: Define MSR index and bit for the microcode staging
    feature
  x86/microcode/intel: Prepare for microcode staging
  x86/microcode/intel_staging: Implement staging logic
  x86/microcode/intel_staging: Support mailbox data transfer
  x86/microcode/intel: Enable staging when available

 arch/x86/include/asm/msr-index.h              |   9 ++
 arch/x86/kernel/cpu/microcode/Makefile        |   2 +-
 arch/x86/kernel/cpu/microcode/core.c          |  12 +-
 arch/x86/kernel/cpu/microcode/intel.c         |  53 +++++++
 arch/x86/kernel/cpu/microcode/intel_staging.c | 149 ++++++++++++++++++
 arch/x86/kernel/cpu/microcode/internal.h      |   7 +-
 6 files changed, 228 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/microcode/intel_staging.c

-- 
2.45.2