linux-kernel - Re: [PATCH v2 0/4] x86/mtrr: Allow MTRR updates on multiple CPUs in parallel


Message-ID: <50f830e0-f69a-4f8c-a041-6ab6a8ee1cd8@suse.com>
Date: Tue, 10 Feb 2026 07:51:04 +0100
From: Jürgen Groß <jgross@...e.com>
To: "H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org,
 x86@...nel.org
Cc: Thomas Gleixner <tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>,
 Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH v2 0/4] x86/mtrr: Allow MTRR updates on multiple CPUs in
 parallel

On 09.02.26 19:37, H. Peter Anvin wrote:
> On February 9, 2026 1:12:59 AM PST, Juergen Gross <jgross@...e.com> wrote:
>> Ping?
>>
>> I'd really like to have this in 7.0, as it is fixing a real issue on
>> some machines ...
>>
>>
>> Juergen
>>
>> On 30.01.26 12:36, Juergen Gross wrote:
>>> Today MTRR updates are serialized so that they do not happen on multiple
>>> CPUs at the same time, as the related code uses global variables.
>>>
>>> On huge machines with lots of CPUs this can result in problems, as such
>>> updates are happening through stop_machine(), which will call the MTRR
>>> update function with interrupts off on all CPUs at the same time. The
>>> interrupts will be switched on only after the last CPU has finished
>>> the MTRR update. As the update is required to run in uncached mode, it
>>> can easily take several milliseconds on each CPU, so that the whole
>>> process needs several seconds. This in turn can cause the watchdog to
>>> trigger and to detect a hard system lockup.
>>>
>>> This series is changing the behavior by allowing the MTRR update to
>>> happen on all CPUs in parallel.
>>>
>>> Changes in V2:
>>> - fix a function comment header in patch 2
>>>
>>> Juergen Gross (4):
>>>     x86/mtrr: Move cache_enable() and cache_disable() to mtrr/generic.c
>>>     x86/mtrr: Introduce MTRR work state structure
>>>     x86/mtrr: Add a prepare_set hook to mtrr_ops
>>>     x86/mtrr: Drop cache_disable_lock
>>>
>>>    arch/x86/include/asm/cacheinfo.h   |   2 -
>>>    arch/x86/include/asm/mtrr.h        |   2 -
>>>    arch/x86/kernel/cpu/cacheinfo.c    |  80 +----------------
>>>    arch/x86/kernel/cpu/mtrr/generic.c | 139 ++++++++++++++++++++++++-----
>>>    arch/x86/kernel/cpu/mtrr/mtrr.c    |   3 +
>>>    arch/x86/kernel/cpu/mtrr/mtrr.h    |   2 +
>>>    6 files changed, 122 insertions(+), 106 deletions(-)
>>>
>>
> 
> First of all, what machines are even needing MTRR updates these days?

I'm not aware this machine really needed an update.

> This isn't a rhetorical question. It is important to understand what the underlying problem is.

It just took several seconds for all CPUs to check whether an update was
needed. It might be an issue with firmware, topology, whatever. It happened
in a test doing 300 cold boots in a row, after roughly 70 loop iterations,
always on one of the last CPUs.

The issue shows that there IS a potential problem with doing the MTRR
update one CPU after the other, instead of just doing it in parallel (which
is the "official" recommendation anyway). See the comment in
cache_disable(). And it isn't as if the fix would be very complicated.

Juergen
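
To put rough numbers on the serial-versus-parallel argument in this thread,
here is a back-of-the-envelope model. It is only a sketch: the CPU counts and
the 3 ms per-CPU cost are made-up illustration values (the thread only says
"several milliseconds" per CPU and "several seconds" overall), the function
names are hypothetical, and the model ignores rendezvous overhead and any
contention between CPUs.

```python
def serialized_window_ms(num_cpus: int, per_cpu_ms: float) -> float:
    """Interrupts-off window when CPUs do the MTRR update one after the other.

    All CPUs sit in stop_machine() with interrupts off until the last CPU
    has finished, so the window is the sum of the per-CPU costs.
    """
    return num_cpus * per_cpu_ms


def parallel_window_ms(num_cpus: int, per_cpu_ms: float) -> float:
    """Interrupts-off window when all CPUs do the update at the same time.

    Assumption: updates do not slow each other down, so the window is just
    the cost of a single update (or zero if there are no CPUs).
    """
    return per_cpu_ms if num_cpus > 0 else 0.0


if __name__ == "__main__":
    PER_CPU_MS = 3.0  # assumed value, standing in for "several milliseconds"
    for cpus in (8, 512, 2048):  # assumed machine sizes
        serial = serialized_window_ms(cpus, PER_CPU_MS)
        parallel = parallel_window_ms(cpus, PER_CPU_MS)
        print(f"{cpus:5d} CPUs: serialized {serial:8.1f} ms, "
              f"parallel {parallel:4.1f} ms")
```

With these assumed inputs the serialized window grows linearly with the CPU
count, while the parallel window stays at the per-CPU cost. Whether a given
window trips the watchdog depends on its configured threshold, which this
sketch does not model.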
