[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20081227225102.GA17822@elte.hu>
Date: Sat, 27 Dec 2008 23:51:02 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Andi Kleen <ak@...ux.intel.com>,
Thomas Gleixner <tglx@...utronix.de>
Cc: linux-kernel@...r.kernel.org, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: x86/mce merge, integration hickup + crash, design thoughts
[ resend - i have restored the Cc: list ]
* Andi Kleen <ak@...ux.intel.com> wrote:
>> We really need to get rid of /dev/mcelog. It's a quirky binary logging
>> facility not available on 32-bit on current kernels and it has a couple
>> of limitations:
>
> It's a bit more complicated than it looks first. I'm not in principle
> opposed to an ASCII interface, but there are some complications in
> practice. I'll write up all the details tomorrow too
>
> (just some quick comments below, but there's more to it)
>
>> A far more useful design for handling MCE events would be to feed them
>> into printk logging.
>
> If there's ASCII logging it should be separate from normal printk.
Well, why? Sure, MCE exceptions themselves cannot generally printk
[because they are in essence non-maskable contexts] unless they get a
fatal MCE [in which case we have no other choice but to try to printk and
hope for the best]. But they can sprintf into a buffer which then gets
printk-d (or passed to whatever ASCII based facility).
'struct mce' is pointless complexity and a pointless restriction - and so
is /dev/mcelog.
> I started that originally because I was sick of machine checks getting
> reported as kernel bugs, and I got a lot of feedback over the years that
> people like that. Later on it turned out there are more good reasons to
> separate logging.
Hm, such as? Right now i see mcelog as a facility that gets used only in
the rarest of circumstances. 99% of the time mcelog is just used in mcelog
--ascii mode to decode something quirky that the kernel could have (and
should have) printed out in a much more human-accessible format.
>> So instead of printing such rather cryptic error messages:
>>
>> MCE 0
>> HARDWARE ERROR. This is *NOT* a software problem!
>> Please contact your hardware vendor
>> CPU 0 BANK 6 MISC 202d ADDR ffeef740
>> This is not a software problem!
>> Run through mcelog --ascii to decode and contact your hardware vendor
>>
>> and expecting people to run mcelog, we should print plain-text
>> something like:
>>
>> MCE 0
>> HARDWARE ERROR. This is *NOT* a software problem!
>> Please contact your hardware vendor
>> CPU 1 4 northbridge TSC 89a560bb249
>> ADDR 1dfa49690
>> Northbridge Chipkill ECC error
>
> It turns out that users don't really find this more enlightening (most
> users have no clue what a Northbridge is). They think it's some kind of
> kernel bug even with the HARDWARE ERROR header.
You should not assume that administrators/users reading kernel crash
messages are dumb. (an ordinary user wont see it most of the time anyway)
The usage patterns i see is that admins who get an MCE crash often fail to
write down the whole MCE message (not realizing that it is important) and
have to go back and reproduce the MCE crash once again before they can get
any meaningful information.
Printing out cryptic hexadecimal error codes, requiring people to write
them down and decode them in user-space is the technology of the 80s - i
didnt think i'd have to argue too much about this ;-)
Reducing the amount of information presented to the user in such a crash
situation is a dumb idea. (especially here where the MCE information is
rather dense and single-screen anyway - so there's no screen real estate
considerations.)
> The only people who really care about the micro architectural details in
> full are chip developers, and those typically decode using other methods
> anyways.
People do care about getting meaningful crash information from the kernel.
That's why we by default print out something like:
Call Trace:
[<c013463d>] warn_slowpath+0x6d/0x90
[<c07a03fb>] ? _spin_lock_irqsave+0x1b/0x60
[<c07a07fc>] ? _spin_unlock_irqrestore+0x3c/0x60
[<c07a07fc>] ? _spin_unlock_irqrestore+0x3c/0x60
[<c015547b>] ? trace_hardirqs_off+0xb/0x10
[<c079e77c>] ? __mutex_unlock_slowpath+0x9c/0x170
[<c015fc1c>] smp_call_function_mask+0x1ac/0x1c0
and not:
Call Trace:
[<c013463d>]
[<c07a03fb>]
[<c07a07fc>]
[<c07a07fc>]
[<c015547b>]
[<c079e77c>]
[<c015fc1c>]
This is a basic principle in the Linux kernel. We try to print out as
useful information as possible - and only cut down on it if the
information physically does not fit on the screen. (which is not a problem
here)
Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists