Message-ID: <20151208091812.GA27180@pd.tnic>
Date: Tue, 8 Dec 2015 10:18:13 +0100
From: Borislav Petkov <bp@...en8.de>
To: "Raj, Ashok" <ashok.raj@...el.com>
Cc: "Luck, Tony" <tony.luck@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>
Subject: Re: [Patch V2] x86, mce: Ensure offline CPU's don't participate in
mce rendezvous process.
On Mon, Dec 07, 2015 at 08:41:43PM -0500, Raj, Ashok wrote:
> On Tue, Dec 08, 2015 at 12:25:24AM +0100, Borislav Petkov wrote:
> >
> > Did you miss my statement in my previous mail where I said that the MCE
> > is being raised only on the cores of node 0?
> >
>
> That's right, but I think if the MCE is only given to node0, then the system
> would panic every time, with or without the patch, which is why I got confused.
>
> I somehow misunderstood that with this patch the system didn't panic.
No, the system panicked both times. The "strange" observation is that
the MCE gets reported only on the cores on node 0, or at least only the
printks from mce_panic() on the node 0 cores reach the serial console.
If we really broadcast only on node 0, that would be a problem if the
corrupted data leaves the node and manages to corrupt storage when it is
written out on one of the other nodes. I'm not sure whether the kernel
panics the whole system in time, or whether there's a small window
between detection and panic in which the corruption might happen.
If so, this'd defeat the purpose of MCE broadcasting but I'm just
hypothesizing here.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/