Message-ID: <04FA5DE3-CD35-4D22-8300-8ADADF8E35C9@zytor.com>
Date: Mon, 09 Feb 2026 23:28:06 -0800
From: "H. Peter Anvin" <hpa@...or.com>
To: Jürgen Groß <jgross@...e.com>,
linux-kernel@...r.kernel.org, x86@...nel.org
CC: Thomas Gleixner <tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH v2 0/4] x86/mtrr: Allow MTRR updates on multiple CPUs in parallel

On February 9, 2026 10:51:04 PM PST, "Jürgen Groß" <jgross@...e.com> wrote:
>On 09.02.26 19:37, H. Peter Anvin wrote:
>> On February 9, 2026 1:12:59 AM PST, Juergen Gross <jgross@...e.com> wrote:
>>> Ping?
>>>
>>> I'd really like to have this in 7.0, as it is fixing a real issue on
>>> some machines ...
>>>
>>>
>>> Juergen
>>>
>>> On 30.01.26 12:36, Juergen Gross wrote:
>>>> Today MTRR updates are serialized so they do not happen on multiple CPUs at
>>>> the same time, as the related code uses global variables.
>>>>
>>>> On huge machines with lots of CPUs this can result in problems, as such
>>>> updates are happening through stop_machine(), which will call the MTRR
>>>> update function with interrupts off on all CPUs at the same time. The
>>>> interrupts will be switched on only after the last CPU has finished
>>>> the MTRR update. As the update is required to run in uncached mode, it
>>>> can easily take several milliseconds on each CPU, so the whole process
>>>> can need several seconds. This in turn can cause the watchdog to
>>>> trigger and report a hard system lockup.
>>>>
>>>> This series is changing the behavior by allowing the MTRR update to
>>>> happen on all CPUs in parallel.
>>>>
>>>> Changes in V2:
>>>> - fix a function comment header in patch 2
>>>>
>>>> Juergen Gross (4):
>>>> x86/mtrr: Move cache_enable() and cache_disable() to mtrr/generic.c
>>>> x86/mtrr: Introduce MTRR work state structure
>>>> x86/mtrr: Add a prepare_set hook to mtrr_ops
>>>> x86/mtrr: Drop cache_disable_lock
>>>>
>>>> arch/x86/include/asm/cacheinfo.h | 2 -
>>>> arch/x86/include/asm/mtrr.h | 2 -
>>>> arch/x86/kernel/cpu/cacheinfo.c | 80 +----------------
>>>> arch/x86/kernel/cpu/mtrr/generic.c | 139 ++++++++++++++++++++++++-----
>>>> arch/x86/kernel/cpu/mtrr/mtrr.c | 3 +
>>>> arch/x86/kernel/cpu/mtrr/mtrr.h | 2 +
>>>> 6 files changed, 122 insertions(+), 106 deletions(-)
>>>>
>>>
>>
>> First of all, what machines are even needing MTRR updates these days?
>
>I'm not aware this machine really needed an update.
>
>> This isn't a rhetorical question. It is important to understand what the underlying problem is.
>
>It just took several seconds for all CPUs to check if there is an update
>needed. It might be an issue with firmware, topology, whatever. It happened
>in a test doing 300 cold boots in a row after roughly 70 loop iterations,
>always on one of the last CPUs.
>
>The issue shows that there IS a potential problem with doing the MTRR
>update one CPU after the other, instead of just doing it in parallel (which
>is the "official" recommendation anyway). See the comment in
>cache_disable(). And it isn't as if the fix would be very complicated.
>
>
>Juergen

You are assuming that it won't break any fragile systems. I'm much more
concerned about why this is happening at all.