Message-ID: <20241211014213.3671-1-chang.seok.bae@intel.com>
Date: Tue, 10 Dec 2024 17:42:06 -0800
From: "Chang S. Bae" <chang.seok.bae@...el.com>
To: linux-kernel@...r.kernel.org
Cc: x86@...nel.org,
tglx@...utronix.de,
mingo@...hat.com,
bp@...en8.de,
dave.hansen@...ux.intel.com,
chang.seok.bae@...el.com
Subject: [PATCH 0/6] x86/microcode: Support for Intel Staging Feature
Hi all,
Changes since the RFC posting [1]:
* Simplified the staging address discovery code. Leveraging the staging
topology, stage only if package id changes (Thomas).
* Cleaned up the MSR read logic (Boris and Dave).
* Renamed functions to align with the do_something naming convention
(Boris).
* Polished staging result messages (Boris).
* Dropped the WBINVD removal, as it is now in mainline.
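To illustrate the per-package rule in the first item above, here is a
minimal userspace sketch, not the code in this series. It walks a
CPU-to-package map and stages once each time the package id changes. The
map, stage_on_cpu() and stage_all_packages() are hypothetical names, and
the sketch assumes that the CPUs of a package are enumerated contiguously.

```c
#include <stddef.h>

#define NO_PACKAGE (-1)

/* Hypothetical stand-in for the per-package staging work; counts calls. */
static int staged_count;

static int stage_on_cpu(size_t cpu)
{
	(void)cpu;
	staged_count++;
	return 0;	/* 0 on success, negative on failure */
}

/*
 * Walk the CPUs in order and stage once each time the package id changes.
 * Assumes package ids are non-negative and that CPUs of the same package
 * are adjacent in the map. Returns the number of packages staged, or the
 * first negative error.
 */
static int stage_all_packages(const int *pkg_of_cpu, size_t nr_cpus)
{
	int last_pkg = NO_PACKAGE;
	int staged = 0;

	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		int ret;

		/* Same package as the previous CPU: already staged. */
		if (pkg_of_cpu[cpu] == last_pkg)
			continue;

		ret = stage_on_cpu(cpu);
		if (ret < 0)
			return ret;

		last_pkg = pkg_of_cpu[cpu];
		staged++;
	}

	return staged;
}
```

With a map such as {0, 0, 0, 0, 1, 1, 1, 1}, only the first CPU of each
package triggers staging.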
This series is based on 6.13-rc2. It is also available in this repo:
git://github.com/intel-staging/microcode.git staging_v1
I would appreciate further reviews and feedback.
Thanks,
Chang
---
Here is the original cover letter with minor updates -- removing the
WBINVD story and updating the function names:
== Latency Spike Issue ==
As microcode images have increased in size, a corresponding rise in load
latency has become inevitable. This latency spike significantly impacts
late loading, which remains in use despite the cautions highlighted in
the documentation [2]. The issue is especially critical for continuously
running workloads and virtual machines, where excessive delays can lead
to timeouts.
== Staging for Latency Reduction ==
Currently, writing to MSR_IA32_UCODE_WRITE triggers the entire update
process -- loading, validation, and activation -- all of which contribute
to the latency while the CPUs are halted. The staging feature mitigates
this by moving everything except the activation step out of the critical
path, allowing CPUs to continue serving workloads while staging takes place.
== Validation ==
We internally established pseudocode to clearly define all essential
steps for interacting with the firmware. Any firmware implementation
supporting staging should adhere to this contract. This patch set
incorporates that staging logic, which I successfully tested on one
firmware implementation. Multiple teams at Intel have also validated the
feature across different implementations.
Preliminary results from a pre-production system show a significant
reduction in latency (about 40%) with the staging approach alone.
Further improvements are possible with additional optimizations [*].
== Call for Review ==
Here are several key points to highlight for feedback:
1. Staging Integration Approach:
In the core code, the high-level sequence for late loading is:
(1) request_microcode_fw(), and
(2) load_late_stop_cpus()->apply_microcode()
Staging doesn't fit neatly into either step, as it involves the
loading process but not the activation. Therefore, a new callback is
introduced:
core::load_late_locked()
-> intel::staging_microcode()
-> intel_staging::do_stage()
2. Code Abstraction:
The newly added intel_staging.c file contains all staging-related
code to keep it self-contained. Ideally, the entire firmware
interaction could eventually be abstracted into a single MSR write,
which remains a long-term goal. Fortunately, recent protocol
simplifications have made this more feasible.
3. Staging Policy (TODO):
While staging is always attempted, the system will fall back to the
legacy update method if staging fails. There is an open question
regarding staging policy: should it be mandatory, without fallback,
in certain usage scenarios? This could lead to further refinements in
the flow depending on feedback and use cases.
4. Specification Updates:
Recent specification updates have simplified the staging protocol
and clarified the behavior of MSR_IA32_UCODE_WRITE in conjunction
with staging:
4.1. Protocol Simplification
The specification update [3] has significantly reduced the
complexity of the staging code, trimming the kernel code down from
the ~1K lines of preliminary implementations. Thanks to Dave for
guiding this redesign effort.
4.2. Clarification of Legacy Update Behavior
Chapter 5 of the specification adds further clarification on
MSR_IA32_UCODE_WRITE. Key points are summarized below:
(a) When staging has not been performed or has failed, a WRMSR will still load
the patch image, but with higher latency.
(b) During an active staging process, MSR_IA32_UCODE_WRITE can
load a new microcode image, again with higher latency.
(c) If the versions differ between the staged microcode and the
version loaded via MSR_IA32_UCODE_WRITE, the version loaded through
the MSR takes precedence.
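As a reading aid for points 3 and 4.2, here is a toy userspace model of
the described behavior. It is a sketch under stated assumptions, not
kernel code and not the specification's pseudocode: every name is
hypothetical, revisions are assumed non-zero, "fast" versus "slow" only
stands for the lower and higher latency cases, and dropping the staged
image after a write is an assumption of the model. Point (b), a write
during an active staging process, is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one package; every field and name here is hypothetical. */
struct pkg_model {
	uint32_t running_rev;		/* revision currently active */
	uint32_t staged_rev;		/* revision held by staging, 0 if none */
	bool staging_supported;
};

enum load_path { LOAD_FAST, LOAD_SLOW };

/* Attempt staging; returns false when staging is unavailable or fails. */
static bool model_stage(struct pkg_model *p, uint32_t rev, bool inject_failure)
{
	if (!p->staging_supported || inject_failure)
		return false;

	p->staged_rev = rev;
	return true;
}

/*
 * Model of the MSR_IA32_UCODE_WRITE trigger. The written revision always
 * becomes the running one, matching points (a) and (c); only a matching
 * staged image avoids the higher-latency path.
 */
static enum load_path model_wrmsr(struct pkg_model *p, uint32_t rev)
{
	enum load_path path = (p->staged_rev == rev) ? LOAD_FAST : LOAD_SLOW;

	p->running_rev = rev;
	p->staged_rev = 0;	/* assumption: staged image is consumed or dropped */
	return path;
}

/* Late load: staging is always attempted, the legacy path is the fallback. */
static enum load_path model_late_load(struct pkg_model *p, uint32_t rev,
				      bool inject_failure)
{
	(void)model_stage(p, rev, inject_failure);
	return model_wrmsr(p, rev);
}
```

In this model a failed staging attempt still ends with the requested
revision running, only via the slower path, which is the fallback
behavior that the policy question in point 3 is about.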
I'd also like to make sure there is no further ambiguity in this documentation
[3]. Feel free to provide feedback if anything seems unclear or
unreasonable.
As noted [*], an additional series focused on further latency
optimizations will follow. However, the staging approach was prioritized
due to its significant first-order impact on latency.
[1]: https://lore.kernel.org/all/20241001161042.465584-1-chang.seok.bae@intel.com/
[2]: https://docs.kernel.org/arch/x86/microcode.html#why-is-late-loading-dangerous
[3]: https://cdrdv2.intel.com/v1/dl/getContent/782715
[*]: Further latency improvements will be addressed in the upcoming
‘Uniform’ feature series.
Chang S. Bae (6):
  x86/microcode: Introduce staging option to reduce late-loading latency
  x86/msr-index: Define MSR index and bit for the microcode staging
    feature
  x86/microcode/intel: Prepare for microcode staging
  x86/microcode/intel_staging: Implement staging logic
  x86/microcode/intel_staging: Support mailbox data transfer
  x86/microcode/intel: Enable staging when available

 arch/x86/include/asm/msr-index.h              |   9 ++
 arch/x86/kernel/cpu/microcode/Makefile        |   2 +-
 arch/x86/kernel/cpu/microcode/core.c          |  12 +-
 arch/x86/kernel/cpu/microcode/intel.c         |  53 +++++++
 arch/x86/kernel/cpu/microcode/intel_staging.c | 149 ++++++++++++++++++
 arch/x86/kernel/cpu/microcode/internal.h      |   7 +-
 6 files changed, 228 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/microcode/intel_staging.c
--
2.45.2