Message-ID: <20210107070724.GC14697@zn.tnic>
Date:   Thu, 7 Jan 2021 08:07:24 +0100
From:   Borislav Petkov <bp@...en8.de>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     linux-kernel@...r.kernel.org, x86@...nel.org,
        linux-edac@...r.kernel.org, tony.luck@...el.com,
        tglx@...utronix.de, mingo@...hat.com, hpa@...or.com,
        kernel-team@...com
Subject: Re: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs
On Wed, Jan 06, 2021 at 11:13:53AM -0800, Paul E. McKenney wrote:
> Not yet, it isn't!  Well, except in -rcu.  ;-)
Of course it is - saying "This commit" in this commit's commit message
is very much a tautology. :-)
> You are suggesting dropping mce_missing_cpus and just doing this?
> 
> if (!cpumask_andnot(&mce_present_cpus, cpu_online_mask, &mce_present_cpus))
Yes.
And pls don't call it "holdout CPUs" and change the order so that it is
more user-friendly (yap, you don't need __func__ either):
[   78.946153] mce: Not all CPUs (24-47,120-143) entered the broadcast exception handler.
[   78.946153] Kernel panic - not syncing: Timeout: MCA synchronization.
or so.
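
To illustrate the suggestion above, here is a minimal userspace sketch of
the "online AND NOT present" computation and of the range-list formatting
seen in the example message. It is not kernel code: struct mask, NCPUS,
mask_set_range(), mask_andnot(), mask_to_list() and report_missing() are
hypothetical names made up for this illustration. mask_andnot() only
mimics the convention of the kernel's cpumask_andnot(), which stores
src1 & ~src2 in the destination and returns false when the result is
empty. Real kernel code would presumably rely on the existing cpumask
printing support instead of a hand-rolled formatter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NCPUS 192                       /* assumed CPU count, illustration only */
#define WORDS ((NCPUS + 63) / 64)

struct mask { uint64_t w[WORDS]; };     /* hypothetical stand-in for a cpumask */

/* Set bits first..last (inclusive). */
static void mask_set_range(struct mask *m, int first, int last)
{
	for (int i = first; i <= last; i++)
		m->w[i / 64] |= UINT64_C(1) << (i % 64);
}

static int mask_test(const struct mask *m, int cpu)
{
	return (int)((m->w[cpu / 64] >> (cpu % 64)) & 1);
}

/* dst = a & ~b; returns nonzero if dst is non-empty. */
static int mask_andnot(struct mask *dst, const struct mask *a,
		       const struct mask *b)
{
	uint64_t any = 0;

	for (int i = 0; i < WORDS; i++) {
		dst->w[i] = a->w[i] & ~b->w[i];
		any |= dst->w[i];
	}
	return any != 0;
}

/* Format the set bits as a range list such as "24-47,120-143". */
static void mask_to_list(const struct mask *m, char *buf, size_t len)
{
	size_t off = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (int i = 0; i < NCPUS; i++) {
		int start, n;

		if (!mask_test(m, i))
			continue;
		start = i;
		while (i + 1 < NCPUS && mask_test(m, i + 1))
			i++;            /* extend the run of consecutive CPUs */
		if (start == i)
			n = snprintf(buf + off, len - off, "%s%d",
				     off ? "," : "", start);
		else
			n = snprintf(buf + off, len - off, "%s%d-%d",
				     off ? "," : "", start, i);
		if (n < 0 || (size_t)n >= len - off)
			break;          /* output truncated, stop here */
		off += (size_t)n;
	}
}

/* Returns nonzero if some online CPU never checked in; buf gets the list. */
static int report_missing(const struct mask *online,
			  const struct mask *present,
			  char *buf, size_t len)
{
	struct mask missing;
	int any = mask_andnot(&missing, online, present);

	mask_to_list(&missing, buf, len);
	return any;
}
```

With CPUs 0-191 online and only 0-23, 48-119 and 144-191 marked present,
report_missing() returns nonzero and fills the buffer with the list that
would go between the parentheses of the message above.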
And that's fine if it appears twice, as long as it is the same info - the
MCA code is one complex mess so you can probably guess why I'd like to
have new stuff added to it be as simplistic as possible.
> I was worried (perhaps unnecessarily) about the possibility of CPUs
> checking in during the printout operation, which would set rather than
> clear the bit.  But perhaps the possible false positives that Tony points
> out make this race not worth worrying about.
> 
> Thoughts?
Yah, apparently, it is not going to be a precise report as you wanted it
to be but at least it'll tell you which *sockets* you can rule out, if
not cores.
:-)
-- 
Regards/Gruss,
    Boris.
https://people.kernel.org/tglx/notes-about-netiquette