Message-ID: <4DA71774.9020900@redhat.com>
Date: Thu, 14 Apr 2011 11:49:08 -0400
From: Prarit Bhargava <prarit@...hat.com>
To: Borislav Petkov <bp@...64.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Russ Anderson <rja@....com>,
"Luck, Tony" <tony.luck@...el.com>,
"dzickus@...hat.com" <dzickus@...hat.com>,
"mstowe@...hat.com" <mstowe@...hat.com>,
"dnelson@...hat.com" <dnelson@...hat.com>,
"rja@...ricas.sgi.com" <rja@...ricas.sgi.com>
Subject: Re: [PATCH -v3] x86, MCE: Drop the default decoding notifier
On 04/14/2011 11:44 AM, Borislav Petkov wrote:
> On Thu, Apr 14, 2011 at 11:23:04AM -0400, Prarit Bhargava wrote:
>
>> Oops ... I may have confused you because what I did was subtle. I
>> really should have explicitly pointed out what I did. Sorry, my bad.
>>
>> From my patch (sorry for the cut-and-paste):
>>
>> @@ -239,7 +227,10 @@ static void print_mce(struct mce *m)
>> * Print out human-readable details about the MCE error,
>> * (if the CPU has an implementation for that)
>> */
>> - atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
>> + ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
>> + if (ret != NOTIFY_STOP && (m->status & MCI_STATUS_UC))
>> + pr_emerg(HW_ERR "Run the above through 'mcelog --ascii' "
>> + "to decode.\n");
>> }
>>
>> This, of course, only outputs during UCs.
>>
>> and
>>
>> @@ -289,6 +280,8 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
>> continue;
>> if (!(m->status & MCI_STATUS_UC)) {
>> print_mce(m);
>> + printk_once(KERN_EMERG HW_ERR "MCE Corrected Error(s) "
>> + "detected.");
>> if (!apei_err)
>> apei_err = apei_write_mce(m);
>> }
>>
>> so we'll print "MCE Corrected Error(s)" _once_ if we go through this
>> path. Since there is no data to decode with mcelog, a nice little one
>> time message is probably the way to go :).
>>
> Ok, first of all, see the print_mce(m) call above? Yes, we're dumping
> full CE MCE info in this case because they were unlogged and as such,
> that info can be decoded.
>
> But this whole point is moot since those errors can be only 32 max _and_
> on the _panic_ path. And I don't think this path matters because it is
> _very_ seldom. I bet you don't hit it on any of your machines.
>
Ohhhh ... I was running on the assumption that the data was *never* output.
> And we don't want to fix that - we want to fix the case with the
> occasional CE MCEs which get detected in the polling path but none of
> their MCA regs get dumped for decoding so the decoding hint there is
> out of place. And we fixed that at least partially so that it doesn't
> flood the logs. If you're not fine with the default ratelimit of 10 msgs
> per 5 seconds we can always raise the ratelimit but tweaking an almost
> hypothetical case is just not worth it.
>
>
Okay -- I'm good then.
P.
> Thanks.
>
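For readers following the first quoted hunk: the decode hint is printed only when no decoder on the chain claimed the event and the error is uncorrected. Below is a minimal userspace sketch of that decision, not kernel code. Every `sketch_`/`SKETCH_` name is hypothetical, the chain is a plain array of callbacks rather than the kernel's atomic notifier chain, and the UC bit position (bit 61) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's notifier return codes. */
#define SKETCH_NOTIFY_DONE 0
#define SKETCH_NOTIFY_STOP 1

/* Assumed position of the "uncorrected" bit in the status word. */
#define SKETCH_STATUS_UC (1ULL << 61)

struct sketch_mce {
    uint64_t status;
};

typedef int (*sketch_decoder_fn)(const struct sketch_mce *m);

/* Call each decoder in order; stop early once one reports STOP. */
static int sketch_call_chain(const sketch_decoder_fn *chain, size_t n,
                             const struct sketch_mce *m)
{
    int ret = SKETCH_NOTIFY_DONE;

    for (size_t i = 0; i < n; i++) {
        ret = chain[i](m);
        if (ret == SKETCH_NOTIFY_STOP)
            break;
    }
    return ret;
}

/* True when the "run it through mcelog" hint should be printed:
 * nobody decoded the event and the error is uncorrected. */
static bool sketch_needs_mcelog_hint(const sketch_decoder_fn *chain, size_t n,
                                     const struct sketch_mce *m)
{
    assert(m != NULL);

    int ret = sketch_call_chain(chain, n, m);

    return ret != SKETCH_NOTIFY_STOP && (m->status & SKETCH_STATUS_UC) != 0;
}

/* Two toy decoders: one that decodes the event, one that passes. */
static int sketch_decoder_handles(const struct sketch_mce *m)
{
    (void)m;
    return SKETCH_NOTIFY_STOP;
}

static int sketch_decoder_ignores(const struct sketch_mce *m)
{
    (void)m;
    return SKETCH_NOTIFY_DONE;
}
```

With only `sketch_decoder_ignores` registered, an uncorrected event yields the hint and a corrected one does not. Adding `sketch_decoder_handles` to the chain suppresses the hint in both cases.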
>
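The "10 msgs per 5 seconds" default mentioned in the quoted text can be pictured as a fixed-window limiter. This is a rough userspace sketch under stated assumptions, not the kernel's ratelimit implementation. `struct msg_ratelimit` and `msg_ratelimit_ok()` are hypothetical names, and the caller passes the time in seconds instead of the code reading a clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a per-message-site rate limit. */
struct msg_ratelimit {
    long interval;  /* window length, in seconds */
    int  burst;     /* messages allowed per window */
    long begin;     /* start time of the current window */
    int  printed;   /* messages let through in this window */
    int  missed;    /* messages suppressed in this window */
    bool started;   /* false until the first message arrives */
};

/* Returns true if a message arriving at time `now` may be printed. */
static bool msg_ratelimit_ok(struct msg_ratelimit *rs, long now)
{
    assert(rs != NULL);

    /* Open a fresh window on first use or once the interval has elapsed. */
    if (!rs->started || now - rs->begin >= rs->interval) {
        rs->started = true;
        rs->begin = now;
        rs->printed = 0;
        rs->missed = 0;
    }

    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }

    /* Over budget for this window: count it and drop it. */
    rs->missed++;
    return false;
}
```

Initialised with `.interval = 5, .burst = 10`, a burst of 25 messages in the same second lets 10 through and suppresses 15. The next message is allowed once 5 seconds have passed since the window opened.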