Message-ID: <a79fa3cc-73ef-4546-b110-1f448480e3e6@leemhuis.info>
Date: Wed, 7 Aug 2024 10:15:23 +0200
From: Thorsten Leemhuis <regressions@...mhuis.info>
To: Thomas Lindroth <thomas.lindroth@...il.com>
Cc: stable@...r.kernel.org, tony.luck@...el.com,
Greg KH <gregkh@...uxfoundation.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"Borislav Petkov (AMD)" <bp@...en8.de>, LKML <linux-kernel@...r.kernel.org>,
Linux kernel regressions list <regressions@...ts.linux.dev>
Subject: Re: [STABLE REGRESSION] Possible missing backport of x86_match_cpu() change in v6.1.96
[CCing the x86 folks, Greg, and the regressions list]
Hi, Thorsten here, the Linux kernel's regression tracker.
On 30.07.24 18:41, Thomas Lindroth wrote:
> I upgraded from kernel 6.1.94 to 6.1.99 on one of my machines and
> noticed that the dmesg line "Incomplete global flushes, disabling PCID"
> had disappeared from the log.
Thomas, thx for the report. FWIW, mainline developers like the x86 folks
or Tony are free to focus on mainline and leave stable/longterm series
to other people -- some nevertheless help out regularly or occasionally.
So with a bit of luck this mail will make one of them care enough to
provide a 6.1 version of what you afaics called the "existing fix" in
mainline (2eda374e883ad2 ("x86/mm: Switch to new Intel CPU model
defines") [v6.10-rc1]) that seems to be missing in 6.1.y. But if not, I
suspect it might be up to you to prepare and submit a 6.1.y variant of
that fix, as you seem to care and are able to test the patch.
Ciao, Thorsten
> That message comes from commit c26b9e193172f48cd0ccc64285337106fb8aa804,
> which disables PCID support on some broken hardware in
> arch/x86/mm/init.c:
>
> #define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
>                               .family = 6,                \
>                               .model = _model,            \
>                             }
> /*
>  * INVLPG may not properly flush Global entries
>  * on these CPUs when PCIDs are enabled.
>  */
> static const struct x86_cpu_id invlpg_miss_ids[] = {
> 	INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
> 	INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
> 	INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
> 	INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
> 	INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
> 	INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
> 	{}
> };
>
> ...
>
> 	if (x86_match_cpu(invlpg_miss_ids)) {
> 		pr_info("Incomplete global flushes, disabling PCID");
> 		setup_clear_cpu_cap(X86_FEATURE_PCID);
> 		return;
> 	}
>
> arch/x86/mm/init.c, which has that code, hasn't changed between 6.1.94
> and 6.1.99. However, I found a commit in 6.1.96 that changes how
> x86_match_cpu() behaves:
>
> commit 8ab1361b2eae44077fef4adea16228d44ffb860c
> Author: Tony Luck <tony.luck@...el.com>
> Date: Mon May 20 15:45:33 2024 -0700
>
> x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL
>
> I suspect this broke the PCID disabling code in arch/x86/mm/init.c.
> The commit message says:
>
> "Add a new flags field to struct x86_cpu_id that has a bit set to
> indicate that this entry in the array is valid. Update X86_MATCH*()
> macros to set that bit. Change the end-marker check in x86_match_cpu()
> to just check the flags field for this bit."
>
> But the PCID disabling code in 6.1.99 does not use the X86_MATCH*()
> macros; instead, it defines its own INTEL_MATCH() macro, which does not
> set the X86_CPU_ID_FLAG_ENTRY_VALID flag.
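To make the suspected failure mode concrete, here is a minimal userspace model of the two end-marker rules. It is a simplified sketch, not kernel code: `struct cpu_id`, `struct cpu`, `entry_matches()`, `match_old()`, `match_new()` and `demo_ids` are hypothetical stand-ins that keep only the vendor/family/model fields (the real pre-6.1.96 loop also ORs in the stepping and feature fields), and the constant values are assumed to mirror mainline's headers. Under the old rule the INTEL_MATCH() entries are walked because `.family = 6` makes them non-zero; under the new rule the very first entry has `flags == 0`, so the loop treats it as the terminator and the table behaves as if it were empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values assumed to mirror mainline's headers. */
#define X86_VENDOR_INTEL            0
#define X86_VENDOR_ANY              0xffff
#define X86_FAMILY_ANY              0
#define X86_MODEL_ANY               0
#define X86_CPU_ID_FLAG_ENTRY_VALID (1u << 0)

/* Hypothetical subset of struct x86_cpu_id. */
struct cpu_id {
	uint16_t vendor, family, model, flags;
};

/* Hypothetical subset of struct cpuinfo_x86. */
struct cpu {
	uint16_t vendor, family, model;
};

/* 6.1-style INTEL_MATCH(): .flags is never set, so it stays 0. */
#define INTEL_MATCH(_model) \
	{ .vendor = X86_VENDOR_INTEL, .family = 6, .model = (_model) }

/* Compare one table entry against the CPU, honouring *_ANY wildcards. */
static int entry_matches(const struct cpu *c, const struct cpu_id *m)
{
	if (m->vendor != X86_VENDOR_ANY && c->vendor != m->vendor)
		return 0;
	if (m->family != X86_FAMILY_ANY && c->family != m->family)
		return 0;
	if (m->model != X86_MODEL_ANY && c->model != m->model)
		return 0;
	return 1;
}

/* Old rule (simplified): the table ends at the first all-zero entry. */
static const struct cpu_id *match_old(const struct cpu *c,
				      const struct cpu_id *t)
{
	assert(t != NULL);
	for (; t->vendor | t->family | t->model; t++)
		if (entry_matches(c, t))
			return t;
	return NULL;
}

/* New rule: the table ends at the first entry without the VALID flag. */
static const struct cpu_id *match_new(const struct cpu *c,
				      const struct cpu_id *t)
{
	assert(t != NULL);
	for (; t->flags & X86_CPU_ID_FLAG_ENTRY_VALID; t++)
		if (entry_matches(c, t))
			return t;
	return NULL;
}

/* 0x97 is assumed to be the model number behind INTEL_FAM6_ALDERLAKE. */
static const struct cpu_id demo_ids[] = {
	INTEL_MATCH(0x97),
	{ 0 }
};
```

For a CPU described as `{ X86_VENDOR_INTEL, 6, 0x97 }`, `match_old()` returns `&demo_ids[0]` while `match_new()` returns NULL, which is consistent with the pr_info() line disappearing after the 6.1.96 change.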
>
> I looked in upstream git and found an existing fix:
> commit 2eda374e883ad297bd9fe575a16c1dc850346075
> Author: Tony Luck <tony.luck@...el.com>
> Date: Wed Apr 24 11:15:18 2024 -0700
>
> x86/mm: Switch to new Intel CPU model defines
>
> New CPU #defines encode vendor and family as well as model.
>
> [ dhansen: vertically align 0's in invlpg_miss_ids[] ]
>
> Signed-off-by: Tony Luck <tony.luck@...el.com>
> Signed-off-by: Dave Hansen <dave.hansen@...ux.intel.com>
> Signed-off-by: Borislav Petkov (AMD) <bp@...en8.de>
> Link: https://lore.kernel.org/all/20240424181518.41946-1-tony.luck%40intel.com
>
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 679893ea5e68..6b43b6480354 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -261,21 +261,17 @@ static void __init probe_page_size_mask(void)
>  		}
>  	}
>
> -#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
> -                              .family = 6,                \
> -                              .model = _model,            \
> -                            }
>  /*
>   * INVLPG may not properly flush Global entries
>   * on these CPUs when PCIDs are enabled.
>   */
>  static const struct x86_cpu_id invlpg_miss_ids[] = {
> -	INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
> -	INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
> -	INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT ),
> -	INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
> -	INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
> -	INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
> +	X86_MATCH_VFM(INTEL_ALDERLAKE,      0),
> +	X86_MATCH_VFM(INTEL_ALDERLAKE_L,    0),
> +	X86_MATCH_VFM(INTEL_ATOM_GRACEMONT, 0),
> +	X86_MATCH_VFM(INTEL_RAPTORLAKE,     0),
> +	X86_MATCH_VFM(INTEL_RAPTORLAKE_P,   0),
> +	X86_MATCH_VFM(INTEL_RAPTORLAKE_S,   0),
>  	{}
>  };
>
> The fix removed the custom INTEL_MATCH() macro and switched to
> X86_MATCH_VFM(), which sets X86_CPU_ID_FLAG_ENTRY_VALID. That commit was
> never backported to 6.1, so this looks like a stable-series regression
> caused by a missing backport.
>
> If I apply the fix on top of 6.1.99, the PCID disabling code activates
> again. To make it build, I had to replace the new INTEL_* model names
> with the old INTEL_FAM6_* ones:
>
>  static const struct x86_cpu_id invlpg_miss_ids[] = {
> -	INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
> -	INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
> -	INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
> -	INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
> -	INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
> -	INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
> +	X86_MATCH_VFM(INTEL_FAM6_ALDERLAKE, 0),
> +	X86_MATCH_VFM(INTEL_FAM6_ALDERLAKE_L, 0),
> +	X86_MATCH_VFM(INTEL_FAM6_ALDERLAKE_N, 0),
> +	X86_MATCH_VFM(INTEL_FAM6_RAPTORLAKE, 0),
> +	X86_MATCH_VFM(INTEL_FAM6_RAPTORLAKE_P, 0),
> +	X86_MATCH_VFM(INTEL_FAM6_RAPTORLAKE_S, 0),
>  	{}
>  };
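One caveat about the workaround above that may be worth checking before anyone submits it. Assuming the VFM helpers present in 6.1.99 mirror mainline's arch/x86/include/asm/cpu_device_id.h, X86_MATCH_VFM() expects a packed vendor/family/model value (model in bits 0-7, family in bits 8-15, vendor in bits 16-23), whereas the INTEL_FAM6_* constants hold only a model number. The vendor field then decodes as 0, which happens to be X86_VENDOR_INTEL, and the family field decodes as 0, which x86_match_cpu() treats as X86_FAMILY_ANY. If that reading is right, the patched entries match the listed model numbers on any Intel family rather than family 6 only, which would explain why the patch both builds and works on this machine. The sketch below re-declares the macros locally under that assumption; `struct decoded` and `decode()` are hypothetical helpers, and 0x97 is assumed to be Alder Lake's model number.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies, assumed to mirror mainline's cpu_device_id.h layout. */
#define VFM_MODEL_BIT   0
#define VFM_FAMILY_BIT  8
#define VFM_VENDOR_BIT  16

#define VFM_MODEL(vfm)  (((vfm) & 0x0000ffu) >> VFM_MODEL_BIT)
#define VFM_FAMILY(vfm) (((vfm) & 0x00ff00u) >> VFM_FAMILY_BIT)
#define VFM_VENDOR(vfm) (((vfm) & 0xff0000u) >> VFM_VENDOR_BIT)

#define VFM_MAKE(_vendor, _family, _model)	\
	(((_model)  << VFM_MODEL_BIT)  |	\
	 ((_family) << VFM_FAMILY_BIT) |	\
	 ((_vendor) << VFM_VENDOR_BIT))

#define X86_VENDOR_INTEL 0
#define X86_FAMILY_ANY   0

/* Model-only value, like the 6.1 INTEL_FAM6_* defines. */
#define INTEL_FAM6_ALDERLAKE 0x97
/* Packed value, like mainline's INTEL_ALDERLAKE (family 6, model 0x97). */
#define INTEL_ALDERLAKE VFM_MAKE(X86_VENDOR_INTEL, 6, 0x97)

/* The fields X86_MATCH_VFM() would place into struct x86_cpu_id. */
struct decoded {
	unsigned int vendor, family, model;
};

/* Split a (possibly model-only) value the way the VFM_* helpers do. */
static struct decoded decode(uint32_t vfm)
{
	struct decoded d = { VFM_VENDOR(vfm), VFM_FAMILY(vfm), VFM_MODEL(vfm) };
	return d;
}
```

Under these assumptions, decode(INTEL_FAM6_ALDERLAKE) yields family 0 (X86_FAMILY_ANY) while decode(INTEL_ALDERLAKE) yields family 6. A 6.1.y backport might therefore prefer the X86_MATCH_INTEL_FAM6_MODEL() helpers, which already exist in 6.1 and pin family 6; whether they set X86_CPU_ID_FLAG_ENTRY_VALID after the 6.1.96 change should be confirmed against the tree.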
>
> I only looked at the code in arch/x86/mm/init.c, so other users of
> x86_match_cpu() in the kernel may also be broken in 6.1.99.
> This email is meant as a bug report, not a pull request. Someone else
> should confirm the problem and submit the appropriate fix.
P.S.:
#regzbot ^introduced 8ab1361b2eae44
#regzbot title x86: Possible missing backport of x86_match_cpu() change
#regzbot ignore-activity