Message-ID: <20110413173705.GJ2791@aftab>
Date: Wed, 13 Apr 2011 19:37:05 +0200
From: Borislav Petkov <bp@...64.org>
To: Prarit Bhargava <prarit@...hat.com>
Cc: Borislav Petkov <bp@...64.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Russ Anderson <rja@....com>,
"Luck, Tony" <tony.luck@...el.com>,
"dzickus@...hat.com" <dzickus@...hat.com>,
"mstowe@...hat.com" <mstowe@...hat.com>,
"dnelson@...hat.com" <dnelson@...hat.com>,
"rja@...ricas.sgi.com" <rja@...ricas.sgi.com>
Subject: Re: [PATCH -v2] x86, MCE: Drop default decoding notifier
On Wed, Apr 13, 2011 at 01:14:35PM -0400, Prarit Bhargava wrote:
>
>
> On 04/13/2011 01:01 PM, Prarit Bhargava wrote:
> >
> >> @@ -239,7 +227,9 @@ static void print_mce(struct mce *m)
> >>  	 * Print out human-readable details about the MCE error,
> >>  	 * (if the CPU has an implementation for that)
> >>  	 */
> >> -	atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
> >> +	ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
> >> +	if (ret != NOTIFY_STOP)
> >> +		pr_emerg(HW_ERR "Run the above through 'mcelog --ascii' to decode.\n");
> >>  }
> >>
> >>
> > Borislav,
> >
> >
>
> Oops. Let me *carefully* rephrase that so it is clear what I'm
> complaining about.
>
> > I still think you need the check for UC here. When an UC occurs and
> > mce_panic() is called the output will include:
> >
> > [Hardware Error]: Run the above through 'mcelog --ascii' to decode.
> >
> > potentially many, many times
>
> for _all_ unreported *correctable* errors.
>
> > . The problem still is that there is no
> > output to decode (in the default case).
> >
> >
>
> ie) (sorry for the cut-and-paste)
>
> 	/* First print corrected ones that are still unlogged */
> 	for (i = 0; i < MCE_LOG_LEN; i++) {
> 		struct mce *m = &mcelog.entry[i];
> 		if (!(m->status & MCI_STATUS_VAL))
> 			continue;
> 		if (!(m->status & MCI_STATUS_UC)) {
> 			print_mce(m);
> 			if (!apei_err)
> 				apei_err = apei_write_mce(m);
> 		}
> 	}
>
> will potentially result in many bogus messages during a time at which we
> definitely do not want bogus messages.
I don't think this is a problem. This is on the panic path, and it is
supposed to dump only the _unreported_ CE MCEs queued in the mcelog,
which can contain at most 32 MCEs.

In the worst case, we will report 32 CEs before panicking. For that case
we either use printk_once, as Tony suggested, or we ratelimit the
message. I'll update the patch.
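
To make the printk_once option concrete, here is a rough user-space sketch,
not the actual patch. Everything in it is a stand-in: struct fake_mce,
LOG_LEN, STATUS_VAL, STATUS_UC, decode_hint_once() and dump_corrected() are
hypothetical names that only mimic the shape of the loop quoted above, and
puts()/printf() replace the kernel's printing helpers. The point is just
that each record is still dumped, while the decode hint appears a single time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define LOG_LEN    32          /* assumed to mirror the 32-entry mcelog limit */
#define STATUS_VAL (1u << 0)   /* hypothetical stand-in for MCI_STATUS_VAL */
#define STATUS_UC  (1u << 1)   /* hypothetical stand-in for MCI_STATUS_UC */

struct fake_mce {
	unsigned int status;
};

/* Counts how often the hint was really emitted, for inspection. */
static int hints_printed;

/* Emit the decode hint at most once, similar in spirit to printk_once(). */
static void decode_hint_once(void)
{
	static bool printed;

	if (printed)
		return;
	printed = true;
	hints_printed++;
	puts("[Hardware Error]: Run the above through 'mcelog --ascii' to decode.");
}

/* Dump valid, corrected (non-UC) records; return how many were dumped. */
static int dump_corrected(const struct fake_mce *log, int len)
{
	int i, dumped = 0;

	assert(len >= 0 && len <= LOG_LEN);

	for (i = 0; i < len; i++) {
		if (!(log[i].status & STATUS_VAL))
			continue;
		if (!(log[i].status & STATUS_UC)) {
			printf("[Hardware Error]: record %d status 0x%x\n",
			       i, log[i].status);
			decode_hint_once();
			dumped++;
		}
	}
	return dumped;
}
```

Calling dump_corrected() on a full log of valid corrected records prints up to
32 record lines but only one hint line. A ratelimited variant would instead
allow a bounded number of hints per time window; which of the two is the
better fit for the panic path is the open question above.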
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632