Message-ID: <04FA5DE3-CD35-4D22-8300-8ADADF8E35C9@zytor.com>
Date: Mon, 09 Feb 2026 23:28:06 -0800
From: "H. Peter Anvin" <hpa@...or.com>
To: Jürgen Groß <jgross@...e.com>,
        linux-kernel@...r.kernel.org, x86@...nel.org
CC: Thomas Gleixner <tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH v2 0/4] x86/mtrr: Allow MTRR updates on multiple CPUs in parallel

On February 9, 2026 10:51:04 PM PST, "Jürgen Groß" <jgross@...e.com> wrote:
>On 09.02.26 19:37, H. Peter Anvin wrote:
>> On February 9, 2026 1:12:59 AM PST, Juergen Gross <jgross@...e.com> wrote:
>>> Ping?
>>> 
>>> I'd really like to have this in 7.0, as it is fixing a real issue on
>>> some machines ...
>>> 
>>> 
>>> Juergen
>>> 
>>> On 30.01.26 12:36, Juergen Gross wrote:
>>>> Today MTRR updates are serialized so that they do not happen on multiple
>>>> CPUs at the same time, as the related code uses global variables.
>>>> 
>>>> On huge machines with lots of CPUs this can result in problems, as such
>>>> updates are happening through stop_machine(), which will call the MTRR
>>>> update function with interrupts off on all CPUs at the same time. The
>>>> interrupts will be switched on only after the last CPU has finished
>>>> the MTRR update. As the update is required to run in uncached mode, it
>>>> can easily take several milliseconds on each CPU, so that the
>>>> whole process needs several seconds. This in turn can cause the
>>>> watchdog to trigger and report a hard system lockup.
>>>> 
>>>> This series is changing the behavior by allowing the MTRR update to
>>>> happen on all CPUs in parallel.
>>>> 
>>>> Changes in V2:
>>>> - fix a function comment header in patch 2
>>>> 
>>>> Juergen Gross (4):
>>>>     x86/mtrr: Move cache_enable() and cache_disable() to mtrr/generic.c
>>>>     x86/mtrr: Introduce MTRR work state structure
>>>>     x86/mtrr: Add a prepare_set hook to mtrr_ops
>>>>     x86/mtrr: Drop cache_disable_lock
>>>> 
>>>>    arch/x86/include/asm/cacheinfo.h   |   2 -
>>>>    arch/x86/include/asm/mtrr.h        |   2 -
>>>>    arch/x86/kernel/cpu/cacheinfo.c    |  80 +----------------
>>>>    arch/x86/kernel/cpu/mtrr/generic.c | 139 ++++++++++++++++++++++++-----
>>>>    arch/x86/kernel/cpu/mtrr/mtrr.c    |   3 +
>>>>    arch/x86/kernel/cpu/mtrr/mtrr.h    |   2 +
>>>>    6 files changed, 122 insertions(+), 106 deletions(-)
>>>> 
>>> 
>> 
>> First of all, what machines even need MTRR updates these days?
>
>I'm not aware that this machine really needed an update.
>
>> This isn't a rhetorical question. It is important to understand what the underlying problem is.
>
>It just took several seconds for all CPUs to check if there is an update
>needed. It might be an issue with firmware, topology, whatever. It happened
>in a test doing 300 cold boots in a row after roughly 70 loop iterations,
>always on one of the last CPUs.
>
>The issue shows that there IS a potential problem with doing the MTRR
>update one CPU after the other, instead of just doing it in parallel (which
>is the "official" recommendation anyway). See the comment in
>cache_disable(). And it isn't as if the fix would be very complicated.
>
>
>Juergen

You are assuming that it won't break any fragile systems. I'm much more concerned about why this is happening at all.
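
For a sense of scale, the serialized-versus-parallel cost described in the quoted cover letter can be sketched as a back-of-the-envelope calculation. This is only an illustration: the CPU count and the per-CPU duration are assumed values, not measurements from this thread, and the function names are hypothetical. The parallel figure is an idealized lower bound that ignores any contention between CPUs.

```python
# Rough estimate of how long interrupts stay off during an MTRR update
# under stop_machine(). All numbers are illustrative assumptions.

def serialized_ms(num_cpus: int, per_cpu_ms: float) -> float:
    """Total time when CPUs perform the update one after the other.

    Interrupts are only re-enabled after the last CPU has finished,
    so the per-CPU times add up.
    """
    return num_cpus * per_cpu_ms


def parallel_ms(num_cpus: int, per_cpu_ms: float) -> float:
    """Idealized total time when all CPUs perform the update at once."""
    return per_cpu_ms if num_cpus > 0 else 0.0


if __name__ == "__main__":
    cpus = 1024        # hypothetical CPU count of a "huge machine"
    per_cpu = 5.0      # hypothetical per-CPU time in uncached mode (ms)

    print(f"serialized: {serialized_ms(cpus, per_cpu) / 1000:.2f} s")
    print(f"parallel:   {parallel_ms(cpus, per_cpu) / 1000:.3f} s")
```

With these assumed inputs the serialized case lands in the range of seconds, which matches the order of magnitude the cover letter describes, while the idealized parallel case stays at the per-CPU cost.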
