Message-ID: <20110413071432.GA22773@liondog.tnic>
Date: Wed, 13 Apr 2011 09:14:33 +0200
From: Borislav Petkov <bp@...en8.de>
To: Russ Anderson <rja@....com>
Cc: Prarit Bhargava <prarit@...hat.com>,
"Luck, Tony" <tony.luck@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"dzickus@...hat.com" <dzickus@...hat.com>,
"mstowe@...hat.com" <mstowe@...hat.com>,
"dnelson@...hat.com" <dnelson@...hat.com>, rja@...ricas.sgi.com
Subject: Re: [PATCH]: mce: don't print "human readable" message for corrected errors

On Tue, Apr 12, 2011 at 10:00:34PM -0500, Russ Anderson wrote:
> > I'm thinking remove the TAINT for CEs and don't call the default
> > notifier if it is the only notifier call registered. Maybe something like
> >
> > 	if (num_notifiers(&x86_mce_decoder_chain) > 1)
> > 		atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, &m);
> >
> > or since the notifiers are priority sorted, don't call notifiers with -1
> > prio.
> >
> > Or something to that effect.
>
> What is the point of having a default notifier if it doesn't get called?

Well, the thing is, we call the same notifier chain in the event of
both correctable and uncorrectable errors. But to be honest, if we shut
up the default pr_emerg() calls with "no human readable.." in the CEs'
case, then we don't really need it in the UEs' case either, IMO.

The only half-way sensible info we print is

	"Run the message through 'mcelog --ascii' to decode.\n"

on UE because there we print MCA regs to the console where mcelog can
actually decode them, and this should be a hint to the user to do so.
In the CEs' case, no such info comes out and there's no need for those
printks to flood the logs.

So maybe we could drop the default notifier and do in print_mce():

	if (notifier_chain_empty(&x86_mce_decoder_chain))
		pr_emerg(HW_ERR "Run the message through 'mcelog --ascii' to decode.\n");

I think this could work, let me cook up something.

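
A rough userspace model of that idea, not kernel code: the chain is a plain linked list, notifier_chain_empty() is the hypothetical helper named above (it is assumed here, not an existing kernel API), and struct mce_model, chain_register(), print_mce_model() and sample_decoder() are made-up names for illustration only. The output format is likewise illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the few MCE fields the model needs (hypothetical). */
struct mce_model {
	int bank;
	unsigned long long status;
};

typedef int (*decoder_fn)(const struct mce_model *m);

/* One registered decoder; a much simplified notifier block. */
struct decoder_block {
	decoder_fn fn;
	struct decoder_block *next;
};

struct decoder_chain {
	struct decoder_block *head;
};

/* Hypothetical helper: true when no decoder has been registered. */
int notifier_chain_empty(const struct decoder_chain *chain)
{
	return chain->head == NULL;
}

/* Push a decoder onto the front of the chain (no priorities here). */
void chain_register(struct decoder_chain *chain, struct decoder_block *nb)
{
	assert(nb != NULL && nb->fn != NULL);
	nb->next = chain->head;
	chain->head = nb;
}

/*
 * Model of the proposed print_mce() behaviour: always dump the raw
 * values, then either print the mcelog hint (empty chain) or hand
 * the record to the registered decoders.
 * Returns 1 if the hint was printed, 0 otherwise.
 */
int print_mce_model(const struct decoder_chain *chain,
		    const struct mce_model *m)
{
	const struct decoder_block *nb;

	printf("Bank %d: %016llx\n", m->bank, m->status);

	if (notifier_chain_empty(chain)) {
		printf("Run the message through 'mcelog --ascii' to decode.\n");
		return 1;
	}

	for (nb = chain->head; nb; nb = nb->next)
		nb->fn(m);

	return 0;
}

/* Example decoder that only counts how often it was called. */
int decoded_count;

int sample_decoder(const struct mce_model *m)
{
	(void)m;
	decoded_count++;
	return 0;
}
```

With no decoder registered, print_mce_model() prints the hint; once sample_decoder() is registered via chain_register(), the hint goes away and the decoder is called instead. There is no default notifier in either case.
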
> Any consideration of adding thresholding (ie only log the first X number
> of corrected errors) as is done on IA64? (see arch/ia64/kernel/mca.c)

Yep, this is in the works with a RAS daemon that should collect all
error info in userspace and do all policies there. We might even drop
the spitting in the system logs almost completely and I know this'll
make a lot of people happy :).

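
For illustration only, "log the first X corrected errors" as a userspace policy could be as small as the sketch below. It is not taken from arch/ia64/kernel/mca.c or from any existing daemon; struct ce_threshold and ce_should_log() are hypothetical names.

```c
#include <limits.h>

/* Per-source state: how many CEs may be logged, how many were seen. */
struct ce_threshold {
	unsigned int limit;
	unsigned int seen;
};

/*
 * Count one corrected error and report whether it should be logged.
 * Returns 1 for the first 'limit' events and 0 afterwards; the
 * counter saturates at UINT_MAX instead of wrapping around.
 */
int ce_should_log(struct ce_threshold *t)
{
	if (t->seen < UINT_MAX)
		t->seen++;

	return t->seen <= t->limit;
}
```

A caller would keep one struct ce_threshold per error source and only emit a log line when ce_should_log() returns 1. Resetting 'seen' periodically would turn this into a crude rate limit.
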
--
Regards/Gruss,
Boris.