Message-ID: <20210107070724.GC14697@zn.tnic>
Date: Thu, 7 Jan 2021 08:07:24 +0100
From: Borislav Petkov <bp@...en8.de>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org,
linux-edac@...r.kernel.org, tony.luck@...el.com,
tglx@...utronix.de, mingo@...hat.com, hpa@...or.com,
kernel-team@...com
Subject: Re: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs
On Wed, Jan 06, 2021 at 11:13:53AM -0800, Paul E. McKenney wrote:
> Not yet, it isn't! Well, except in -rcu. ;-)
Of course it is - saying "This commit" in this commit's commit message
is very much a tautology. :-)
> You are suggesting dropping mce_missing_cpus and just doing this?
>
> if (!cpumask_andnot(&mce_present_cpus, cpu_online_mask, &mce_present_cpus))
Yes.
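
For readers following along, here is a minimal userspace sketch of what that single in-place call does. It assumes a 64-bit word standing in for struct cpumask; mask_andnot() and all_cpus_checked_in() are hypothetical names for illustration, not the kernel code under discussion. The kernel's cpumask_andnot() stores src1 & ~src2 in the destination and returns false when the result is empty, which is what the negated test relies on.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's cpumask_andnot(): a 64-bit word
 * plays the role of struct cpumask (assumption, for illustration only).
 * Stores src1 & ~src2 in *dst and returns true if any bit remains set.
 */
static bool mask_andnot(uint64_t *dst, uint64_t src1, uint64_t src2)
{
	*dst = src1 & ~src2;
	return *dst != 0;
}

/*
 * Hypothetical helper: overwrite the "present" mask with the set of
 * online CPUs that did not check in, and return true only when that
 * set is empty, i.e. every online CPU checked in.
 */
static bool all_cpus_checked_in(uint64_t online, uint64_t *present)
{
	return !mask_andnot(present, online, *present);
}
```

Note that after the call the "present" mask no longer holds the CPUs that checked in; it holds the ones that did not, which is why a separate mce_missing_cpus mask is not needed.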
And pls don't call it "holdout CPUs" and change the order so that it is
more user-friendly (yap, you don't need __func__ either):
[ 78.946153] mce: Not all CPUs (24-47,120-143) entered the broadcast exception handler.
[ 78.946153] Kernel panic - not syncing: Timeout: MCA synchronization.
or so.
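
As a rough illustration of how a CPU list like "24-47,120-143" is produced from a mask, here is a userspace sketch. In the kernel this would normally be left to printk's "%*pbl" format with cpumask_pr_args(); the fmt_cpu_list() helper below is a hypothetical approximation over a 64-bit mask, not the actual implementation.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format the set bits of a 64-bit mask as a range list such as
 * "2-3,6-7" into buf. Userspace approximation of the kernel's "%*pbl"
 * printk format; fmt_cpu_list() is a hypothetical name.
 * Returns the number of characters written (excluding the NUL).
 */
static int fmt_cpu_list(char *buf, size_t len, uint64_t mask)
{
	size_t used = 0;
	int cpu = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	while (cpu < 64) {
		int first, last, n;

		if (!(mask & (UINT64_C(1) << cpu))) {
			cpu++;
			continue;
		}

		/* Walk to the end of this run of consecutive set bits. */
		first = cpu;
		while (cpu < 64 && (mask & (UINT64_C(1) << cpu)))
			cpu++;
		last = cpu - 1;

		if (first == last)
			n = snprintf(buf + used, len - used, "%s%d",
				     used ? "," : "", first);
		else
			n = snprintf(buf + used, len - used, "%s%d-%d",
				     used ? "," : "", first, last);

		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial entry */
			break;
		}
		used += (size_t)n;
	}
	return (int)used;
}
```

The resulting string would then be dropped into the message, e.g. printf("mce: Not all CPUs (%s) entered the broadcast exception handler.\n", buf).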
And it's fine if it appears twice, as long as it is the same info - the
MCA code is one complex mess so you can probably guess why I'd like
new stuff added to it to be as simple as possible.
> I was worried (perhaps unnecessarily) about the possibility of CPUs
> checking in during the printout operation, which would set rather than
> clear the bit. But perhaps the possible false positives that Tony points
> out make this race not worth worrying about.
>
> Thoughts?
Yah, apparently, it is not going to be a precise report as you wanted it
to be but at least it'll tell you which *sockets* you can rule out, if
not cores.
:-)
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette