Date:	Thu, 14 Apr 2011 17:44:05 +0200
From:	Borislav Petkov <bp@...64.org>
To:	Prarit Bhargava <prarit@...hat.com>
Cc:	Borislav Petkov <bp@...64.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Russ Anderson <rja@....com>,
	"Luck, Tony" <tony.luck@...el.com>,
	"dzickus@...hat.com" <dzickus@...hat.com>,
	"mstowe@...hat.com" <mstowe@...hat.com>,
	"dnelson@...hat.com" <dnelson@...hat.com>,
	"rja@...ricas.sgi.com" <rja@...ricas.sgi.com>
Subject: Re: [PATCH -v3] x86, MCE: Drop the default decoding notifier

On Thu, Apr 14, 2011 at 11:23:04AM -0400, Prarit Bhargava wrote:
> Oops ... I may have confused you because what I did was subtle.  I
> really should have explicitly pointed out what I did.  Sorry, my bad.
> 
> From my patch (sorry for the cut-and-paste):
> 
> @@ -239,7 +227,10 @@ static void print_mce(struct mce *m)
>          * Print out human-readable details about the MCE error,
>          * (if the CPU has an implementation for that)
>          */
> -       atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
> +       ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
> +       if (ret != NOTIFY_STOP && (m->status & MCI_STATUS_UC))
> +               pr_emerg(HW_ERR "Run the above through 'mcelog --ascii' "
> +                        "to decode.\n");
>  }
>  
> This, of course, only outputs during UCs.
> 
> and
> 
> @@ -289,6 +280,8 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
>                         continue;
>                 if (!(m->status & MCI_STATUS_UC)) {
>                         print_mce(m);
> +                       printk_once(KERN_EMERG HW_ERR "MCE Corrected Error(s) "
> +                                   "detected.");
>                         if (!apei_err)
>                                 apei_err = apei_write_mce(m);
>                 }
> 
> so we'll print "MCE Corrected Error(s)" _once_ if we go through this
> path.  Since there is no data to decode with mcelog, a nice little one
> time message is probably the way to go :).

Ok, first of all, see the print_mce(m) call above? Yes, we're dumping
full CE MCE info in this case because they were unlogged and as such,
that info can be decoded.

But this whole point is moot since there can be at most 32 of those
errors, _and_ only on the _panic_ path. And I don't think this path
matters because it is hit _very_ seldom. I bet you don't hit it on any
of your machines.

And that is not what we want to fix. We want to fix the case of the
occasional CE MCEs which get detected in the polling path: none of
their MCA regs get dumped for decoding, so the decoding hint there is
out of place. And we fixed that, at least partially, so that it doesn't
flood the logs. If you're not fine with the default ratelimit of 10 msgs
per 5 seconds, we can always raise it, but tweaking an almost
hypothetical case is just not worth it.
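
To make the "10 msgs per 5 seconds" figure concrete, here is a minimal
user-space sketch of an interval/burst rate limiter. It is not the
kernel's ratelimit code: the struct, the field names and
ratelimit_allow() are hypothetical, and time is passed in explicitly as
milliseconds so that the behaviour is easy to follow.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of an interval/burst rate limiter.
 * All names here are illustrative, not the kernel API.
 */
struct ratelimit {
	uint64_t interval_ms;	/* window length, e.g. 5000 for 5 seconds */
	unsigned int burst;	/* messages allowed per window, e.g. 10 */
	uint64_t window_start;	/* timestamp at which the current window began */
	unsigned int printed;	/* messages let through in the current window */
	unsigned int missed;	/* messages suppressed so far (cumulative) */
	bool started;		/* false until the first call opens a window */
};

/*
 * Return true if a message arriving at now_ms may be printed.
 * A new window opens on the first call and whenever interval_ms
 * has elapsed since the current window began.
 */
bool ratelimit_allow(struct ratelimit *rl, uint64_t now_ms)
{
	if (!rl->started || now_ms - rl->window_start >= rl->interval_ms) {
		rl->window_start = now_ms;
		rl->printed = 0;
		rl->started = true;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	rl->missed++;
	return false;
}
```

With interval_ms = 5000 and burst = 10, a burst of 25 messages inside
one window lets 10 through and suppresses 15; the next message after
the window has elapsed is printed again. Raising the ratelimit, as
offered above, would correspond to a larger burst or a shorter interval
in this model.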

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632