Message-ID: <c20149f35be104c0aa8e995b0f3c7727e095323a.camel@intel.com>
Date: Wed, 25 Sep 2024 05:20:41 +0000
From: "Zhang, Rui" <rui.zhang@...el.com>
To: "ricardo.neri-calderon@...ux.intel.com"
<ricardo.neri-calderon@...ux.intel.com>, "gregkh@...uxfoundation.org"
<gregkh@...uxfoundation.org>
CC: "regressions@...mhuis.info" <regressions@...mhuis.info>, "Neri, Ricardo"
<ricardo.neri@...el.com>, "dave.hansen@...ux.intel.com"
<dave.hansen@...ux.intel.com>, "bp@...en8.de" <bp@...en8.de>, "Gupta, Pawan
Kumar" <pawan.kumar.gupta@...el.com>, "regressions@...ts.linux.dev"
<regressions@...ts.linux.dev>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "Luck, Tony" <tony.luck@...el.com>,
"thomas.lindroth@...il.com" <thomas.lindroth@...il.com>,
"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [STABLE REGRESSION] Possible missing backport of x86_match_cpu()
change in v6.1.96
On Mon, 2024-09-23 at 19:45 -0700, Ricardo Neri wrote:
> On Thu, Sep 19, 2024 at 01:19:27PM +0200,
> gregkh@...uxfoundation.org wrote:
> > On Wed, Sep 18, 2024 at 06:54:33AM +0000, Zhang, Rui wrote:
> > > On Mon, 2024-08-12 at 14:11 +0200, Greg KH wrote:
> > > > On Wed, Aug 07, 2024 at 10:15:23AM +0200, Thorsten Leemhuis
> > > > wrote:
> > > > > [CCing the x86 folks, Greg, and the regressions list]
> > > > >
> > > > > Hi, Thorsten here, the Linux kernel's regression tracker.
> > > > >
> > > > > On 30.07.24 18:41, Thomas Lindroth wrote:
> > > > > > > I upgraded from kernel 6.1.94 to 6.1.99 on one of my machines
> > > > > > > and noticed that the dmesg line "Incomplete global flushes,
> > > > > > > disabling PCID" had disappeared from the log.
> > > > >
> > > > > > Thomas, thx for the report. FWIW, mainline developers like the x86
> > > > > > folks or Tony are free to focus on mainline and leave
> > > > > > stable/longterm series to other people -- some nevertheless help
> > > > > > out regularly or occasionally. So with a bit of luck this mail will
> > > > > > make one of them care enough to provide a 6.1 version of what you
> > > > > > afaics called the "existing fix" in mainline (2eda374e883ad2
> > > > > > ("x86/mm: Switch to new Intel CPU model defines") [v6.10-rc1]) that
> > > > > > seems to be missing in 6.1.y. But if not I suspect it might be up
> > > > > > to you to prepare and submit a 6.1.y variant of that fix, as you
> > > > > > seem to care and are able to test the patch.
> > > >
> > > > > Needs to go to 6.6.y first, right? But even then, it does not apply
> > > > > to 6.1.y cleanly, so someone needs to send a backported (and tested)
> > > > > series to us at stable@...r.kernel.org and we will be glad to queue
> > > > > them up then.
> > > >
> > > > thanks,
> > > >
> > > > greg k-h
> > >
> > > There are three commits involved.
> > >
> > > commit A:
> > > > 4db64279bc2b ("x86/cpu: Switch to new Intel CPU model defines")
> > > This commit replaces
> > > X86_MATCH_INTEL_FAM6_MODEL(ANY, 1), /* SNC */
> > > with
> > > X86_MATCH_VFM(INTEL_ANY, 1), /* SNC */
> > > > This is a functional change because the family info is replaced with
> > > > 0. And this exposes an x86_match_cpu() problem: it breaks when the
> > > > vendor/family/model/stepping/feature fields are all zeros.
> > >
> > > commit B:
> > > 93022482b294 ("x86/cpu: Fix x86_match_cpu() to match just
> > > X86_VENDOR_INTEL")
> > > > It addresses the x86_match_cpu() problem by introducing a valid flag
> > > > and setting the flag in the Intel CPU model defines.
> > > > This fixes commit A, but it actually breaks the x86_cpu_id
> > > > structures that are constructed without using the Intel CPU model
> > > > defines, like the one in arch/x86/mm/init.c.
> > >
> > > commit C:
> > > 2eda374e883a ("x86/mm: Switch to new Intel CPU model defines")
> > > > arch/x86/mm/init.c: broken by commit B but fixed by using the new
> > > > Intel CPU model defines
> > >
> > > In 6.1.99,
> > > commit A is missing
> > > commit B is there
> > > commit C is missing
> > >
> > > In 6.6.50,
> > > commit A is missing
> > > commit B is there
> > > commit C is missing
> > >
> > > > Now we can fix the problem in the stable kernels by converting
> > > > arch/x86/mm/init.c to use the CPU model defines (even the old-style
> > > > ones). But before that, I'm wondering if we need to backport commit B
> > > > in the 6.1 and 6.6 stable kernels, because only commit A can expose
> > > > this problem.
> >
> > > If so, can you submit the needed backports for us to apply? That's the
> > > easiest way for us to take them, thanks.
>
> I audited all the uses of x86_match_cpu(match). All callers that construct
> the `match` argument using the family of X86_MATCH_* macros from
> arch/x86/include/asm/cpu_device_id.h function correctly because commit B
> has been backported to v6.1.99 and to v6.6.50 -- 93022482b294 ("x86/cpu:
> Fix x86_match_cpu() to match just X86_VENDOR_INTEL").
>
> Only those callers that use their own thing to compose the `match`
> argument are buggy:
> * arch/x86/mm/init.c
> * drivers/powercap/intel_rapl_msr.c (only in 6.1.99)
Thanks for auditing this. I overlooked the intel_rapl driver case.
>
> Summarizing, v6.1.99 needs these two commits from mainline
> * d05b5e0baf42 ("powercap: RAPL: fix invalid initialization for
> pl4_supported field")
> * 2eda374e883a ("x86/mm: Switch to new Intel CPU model defines")
>
> v6.6.50 only needs the second commit.
Well, commit B 93022482b294 ("x86/cpu: Fix x86_match_cpu() to match
just X86_VENDOR_INTEL") has been backported to all stable kernels, and
the two broken cases above are present there too.
So I suppose we need to backport all of them to the 5.x stable kernels
as well.
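
To make the failure mode concrete, here is a rough user-space model of
the two table-termination rules described for commits A and B above. It
is only a sketch under stated assumptions: struct cpu_id, FLAG_VALID,
the two matcher functions and both tables are made-up names for
illustration, not the kernel's struct x86_cpu_id or x86_match_cpu()
code, and 0x97 is just an illustrative model number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_INTEL 0   /* assumption: Intel encodes as 0, as the thread implies */
#define FLAG_VALID   0x1 /* hypothetical name for the "valid" flag of commit B */

/* Hypothetical, trimmed-down stand-in for struct x86_cpu_id. */
struct cpu_id {
    uint16_t vendor;
    uint16_t family; /* 0 means "any family" */
    uint16_t model;  /* 0 means "any model" */
    uint16_t flags;
};

static int entry_matches(const struct cpu_id *e, const struct cpu_id *cpu)
{
    return e->vendor == cpu->vendor &&
           (e->family == 0 || e->family == cpu->family) &&
           (e->model == 0 || e->model == cpu->model);
}

/* Rule before commit B: an entry whose ID fields are all zero ends the table. */
static const struct cpu_id *match_zero_terminated(const struct cpu_id *t,
                                                  const struct cpu_id *cpu)
{
    for (; t->vendor | t->family | t->model; t++)
        if (entry_matches(t, cpu))
            return t;
    return NULL;
}

/* Rule after commit B: an entry without the valid flag ends the table. */
static const struct cpu_id *match_flag_terminated(const struct cpu_id *t,
                                                  const struct cpu_id *cpu)
{
    for (; t->flags & FLAG_VALID; t++)
        if (entry_matches(t, cpu))
            return t;
    return NULL;
}

/* "Any Intel CPU" entry, as a macro-built table would carry it. */
static const struct cpu_id any_intel[] = {
    { VENDOR_INTEL, 0, 0, FLAG_VALID },
    { 0, 0, 0, 0 }
};

/* Hand-built entry in the style described for arch/x86/mm/init.c: no flag. */
static const struct cpu_id hand_built[] = {
    { VENDOR_INTEL, 6, 0x97, 0 },
    { 0, 0, 0, 0 }
};

static void check_model(void)
{
    const struct cpu_id cpu = { VENDOR_INTEL, 6, 0x97, 0 };

    /* Commit A's problem: the wildcard entry looks like a terminator. */
    assert(match_zero_terminated(any_intel, &cpu) == NULL);
    /* Commit B's fix: the valid flag keeps the wildcard entry alive. */
    assert(match_flag_terminated(any_intel, &cpu) == &any_intel[0]);
    /* The hand-built table worked under the old rule ... */
    assert(match_zero_terminated(hand_built, &cpu) == &hand_built[0]);
    /* ... and is silently skipped under the new one. */
    assert(match_flag_terminated(hand_built, &cpu) == NULL);
}
```

In this model, calling check_model() passes all four assertions. The
last one corresponds to the arch/x86/mm/init.c and intel_rapl_msr.c
cases: with commit B present and the table still hand-built, the lookup
finds nothing.
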
thanks,
rui
>
> I will submit these backports.
>
> Thanks and BR,
> Ricardo