Message-Id: <20241001161042.465584-1-chang.seok.bae@intel.com>
Date: Tue,  1 Oct 2024 09:10:35 -0700
From: "Chang S. Bae" <chang.seok.bae@...el.com>
To: linux-kernel@...r.kernel.org
Cc: x86@...nel.org,
	tglx@...utronix.de,
	mingo@...hat.com,
	bp@...en8.de,
	dave.hansen@...ux.intel.com,
	chang.seok.bae@...el.com
Subject: [PATCH RFC 0/7] x86/microcode: Support for Intel Staging Feature

Hi all,

I'd like to ask for initial feedback on this series, which enables the
staging feature. Thanks!

== Latency Spike Issue ==

As microcode images have increased in size, a corresponding rise in load
latency has become inevitable. This latency spike significantly impacts
late loading, which remains in use despite the cautions highlighted in
the documentation [1]. The issue is especially critical for continuously
running workloads and virtual machines, where excessive delays can lead
to timeouts.

== Staging for Latency Reduction ==

Currently, writing to MSR_IA32_UCODE_WRITE triggers the entire update
process (loading, validation, and activation), all of which contributes
to latency while the CPUs are halted. The staging feature mitigates this
by moving all but the activation step out of the critical path, allowing
CPUs to continue serving workloads while staging takes place.

== Cache Flush Removal ==

Before the latency spike caused by larger images can be resolved, another
major latency source, cache invalidation [2], must be addressed first.
Originally introduced to work around a specific erratum, this cache
invalidation is now unnecessary because the affected microcode images
have been banned. The cache flush was also found to negate the benefits
of staging, so this series begins by removing the WBINVD instruction.

== Validation ==

We established internal pseudocode that clearly defines all essential
steps for interacting with the firmware. Any firmware implementation
supporting staging should adhere to this contract. This patch set
incorporates that staging logic, which I successfully tested on one
firmware implementation. Multiple teams at Intel have also validated the
feature across different implementations.

Preliminary results from a pre-production system show a significant
reduction in latency (about 40%) with the staging approach alone.
Further improvements are possible with additional optimizations [*].

== Call for Review ==

This RFC series aims to present the proposed approach for community
review, to assess its soundness, and to discuss potential alternatives
if necessary. There are several key points to highlight for feedback:

  1. Staging Integration Approach

     In the core code, the high-level sequence for late loading is:

     (1) request_microcode_fw(), and
     (2) load_late_stop_cpus()->apply_microcode()

     Staging doesn't fit neatly into either step, as it involves the
     loading process but not the activation. Therefore, a new callback is
     introduced:

       core::load_late_locked()
       -> intel::staging_microcode()
          -> intel_staging::staging_work()
             -> intel_staging::...

  2. Code Abstraction

     The newly added intel_staging.c file contains all staging-related
     code to keep it self-contained. Ideally, the entire firmware
     interaction could eventually be abstracted into a single MSR write,
     which remains a long-term goal. Fortunately, recent protocol
     simplifications have made this more feasible.

  3. Staging Policy (TODO)

     While staging is always attempted, the system will fall back to the
     legacy update method if staging fails. There is an open question
     regarding staging policy: should it be mandatory, without fallback,
     in certain usage scenarios? This could lead to further refinements of
     the flow, depending on feedback and use cases.

  4. Specification Updates

     Recent specification updates have simplified the staging protocol
     and clarified the behavior of MSR_IA32_UCODE_WRITE in conjunction
     with staging:

     4.1. Protocol Simplification

     The specification update [3] has significantly reduced the
     complexity of the staging code, trimming the kernel code down from
     the ~1K lines of preliminary implementations. Thanks to Dave for
     guiding this redesign effort.

     4.2. Clarification of Legacy Update Behavior

     Chapter 5 of the specification adds further clarification on
     MSR_IA32_UCODE_WRITE. Key points are summarized below:

     (a) When staging is not performed or has failed, a WRMSR still loads
     the patch image, but with higher latency.

     (b) During an active staging process, MSR_IA32_UCODE_WRITE can
     load a new microcode image, again with higher latency.

     (c) If the versions differ between the staged microcode and the
     version loaded via MSR_IA32_UCODE_WRITE, the version loaded through
     the MSR takes precedence.

     I'd also like to make sure there is no remaining ambiguity in this
     documentation [3]. Feel free to provide feedback if anything seems
     unclear or unreasonable.

As noted [*], an additional series focused on further latency
optimizations will follow. However, the staging approach was prioritized
due to its significant first-order impact on latency.

This series is based on 6.12-rc1. You can also find it in this repo:
    git://github.com/intel-staging/microcode.git staging_rfc-v1

Thanks,
Chang

[1]: https://docs.kernel.org/arch/x86/microcode.html#why-is-late-loading-dangerous
[2]: https://lore.kernel.org/all/20240701212012.21499-1-chang.seok.bae@intel.com/
[3]: https://cdrdv2.intel.com/v1/dl/getContent/782715
[*]: Further latency improvements will be addressed in the upcoming
     ‘Uniform’ feature series.

Chang S. Bae (7):
  x86/microcode/intel: Remove unnecessary cache writeback and
    invalidation
  x86/microcode: Introduce staging option to reduce late-loading latency
  x86/msr-index: Define MSR index and bit for the microcode staging
    feature
  x86/microcode/intel: Prepare for microcode staging
  x86/microcode/intel_staging: Implement staging logic
  x86/microcode/intel_staging: Support mailbox data transfer
  x86/microcode/intel: Enable staging when available

 arch/x86/include/asm/msr-index.h              |   9 +
 arch/x86/kernel/cpu/microcode/Makefile        |   2 +-
 arch/x86/kernel/cpu/microcode/core.c          |  12 +-
 arch/x86/kernel/cpu/microcode/intel.c         |  77 ++++++++-
 arch/x86/kernel/cpu/microcode/intel_staging.c | 154 ++++++++++++++++++
 arch/x86/kernel/cpu/microcode/internal.h      |   5 +-
 6 files changed, 247 insertions(+), 12 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/microcode/intel_staging.c

-- 
2.43.0

