linux-kernel - Re: x86/mce merge, integration hickup + crash, design thoughts

Message-ID: <20081227225102.GA17822@elte.hu>
Date:	Sat, 27 Dec 2008 23:51:02 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Andi Kleen <ak@...ux.intel.com>,
	Thomas Gleixner <tglx@...utronix.de>
Cc:	linux-kernel@...r.kernel.org, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: x86/mce merge, integration hickup + crash, design thoughts


[ resend - i have restored the Cc: list ]

* Andi Kleen <ak@...ux.intel.com> wrote:

>> We really need to get rid of /dev/mcelog. It's a quirky binary logging 
>> facility not available on 32-bit on current kernels and it has a couple 
>> of limitations:
>
> It's a bit more complicated than it looks at first. I'm not in principle 
> opposed to an ASCII interface, but there are some complications in 
> practice. I'll write up all the details tomorrow too.
>
> (just some quick comments below, but there's more to it)
>
>> A far more useful design for handling MCE events would be to feed them 
>> into printk logging.
>
> If there's ASCII logging it should be separate from normal printk.

Well, why? Sure, MCE exceptions themselves cannot generally printk 
[because they are in essence non-maskable contexts] unless they get a 
fatal MCE [in which case we have no other choice but to try to printk and 
hope for the best]. But they can sprintf into a buffer which then gets 
printk-d (or passed to whatever ASCII based facility).
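
[ A rough illustration of the "sprintf into a buffer, print it later" idea
  above, as a minimal userspace sketch in C. This is not kernel code: the
  struct, the buffer size and the names mce_event, mce_format and mce_flush
  are all hypothetical, and a real implementation would at least have to
  handle per-CPU buffers and exceptions arriving while a flush is running. ]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MCE_TEXT_MAX 256	/* hypothetical buffer size */

/* Hypothetical event record; fields mirror the message quoted in the thread. */
struct mce_event {
	int cpu;
	int bank;
	unsigned long long misc;
	unsigned long long addr;
};

static char mce_text[MCE_TEXT_MAX];	/* formatted text waiting to be logged */
static int mce_text_pending;		/* nonzero while mce_text holds unlogged text */

/*
 * Stand-in for the restricted (exception) context: it only formats into
 * memory and never calls the logging facility itself.
 * Returns the formatted length, or -1 on a formatting error.
 */
static int mce_format(const struct mce_event *ev)
{
	int n;

	assert(ev != NULL);
	n = snprintf(mce_text, sizeof(mce_text),
		     "CPU %d BANK %d MISC %llx ADDR %llx",
		     ev->cpu, ev->bank, ev->misc, ev->addr);
	if (n < 0)
		return -1;
	mce_text_pending = 1;
	return n;
}

/*
 * Stand-in for the later, safe context: hands the buffered text to the
 * logging facility (here just a FILE stream).
 * Returns 1 if a message was written, 0 if nothing was pending.
 */
static int mce_flush(FILE *out)
{
	if (!mce_text_pending)
		return 0;
	fprintf(out, "%s\n", mce_text);
	mce_text_pending = 0;
	return 1;
}
```

[ The point of the split is that the formatting step touches nothing but
  memory, so the choice of output facility (printk or anything else ASCII
  based) is made only in mce_flush. ]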

'struct mce' is pointless complexity and a pointless restriction - and so 
is /dev/mcelog.

> I started that originally because I was sick of machine checks getting 
> reported as kernel bugs, and I got a lot of feedback over the years that 
> people like that.  Later on it turned out there are more good reasons to 
> separate logging.

Hm, such as? Right now i see mcelog as a facility that gets used only in 
the rarest of circumstances. 99% of the time mcelog is just used in mcelog 
--ascii mode to decode something quirky that the kernel could have (and 
should have) printed out in a much more human-accessible format.

>> So instead of printing such rather cryptic error messages:
>>
>>    MCE 0
>>    HARDWARE ERROR. This is *NOT* a software problem!
>>    Please contact your hardware vendor
>>    CPU 0 BANK 6 MISC 202d ADDR ffeef740
>>    This is not a software problem!
>>    Run through mcelog --ascii to decode and contact your hardware vendor
>>
>> and expecting people to run mcelog, we should print plain-text 
>> something like:
>>
>>    MCE 0
>>    HARDWARE ERROR. This is *NOT* a software problem!
>>    Please contact your hardware vendor
>>    CPU 1 4 northbridge TSC 89a560bb249
>>    ADDR 1dfa49690
>>      Northbridge Chipkill ECC error
>
> It turns out that users don't really find this more enlightening (most 
> users have no clue what a Northbridge is).  They think it's some kind of 
> kernel bug even with the HARDWARE ERROR header.

You should not assume that administrators/users reading kernel crash 
messages are dumb. (an ordinary user won't see it most of the time anyway)

The usage pattern i see is that admins who get an MCE crash often fail to 
write down the whole MCE message (not realizing that it is important) and 
have to go back and reproduce the MCE crash once again before they can get 
any meaningful information.

Printing out cryptic hexadecimal error codes, requiring people to write 
them down and decode them in user-space is the technology of the 80s - i 
didn't think i'd have to argue too much about this ;-)

Reducing the amount of information presented to the user in such a crash 
situation is a dumb idea. (especially here where the MCE information is 
rather dense and single-screen anyway - so there are no screen real estate 
considerations.)

> The only people who really care about the micro architectural details in 
> full are chip developers, and those typically decode using other methods 
> anyways.

People do care about getting meaningful crash information from the kernel. 
That's why we by default print out something like:

Call Trace:
 [<c013463d>] warn_slowpath+0x6d/0x90
 [<c07a03fb>] ? _spin_lock_irqsave+0x1b/0x60
 [<c07a07fc>] ? _spin_unlock_irqrestore+0x3c/0x60
 [<c07a07fc>] ? _spin_unlock_irqrestore+0x3c/0x60
 [<c015547b>] ? trace_hardirqs_off+0xb/0x10
 [<c079e77c>] ? __mutex_unlock_slowpath+0x9c/0x170
 [<c015fc1c>] smp_call_function_mask+0x1ac/0x1c0

and not:

Call Trace:
 [<c013463d>]
 [<c07a03fb>]
 [<c07a07fc>]
 [<c07a07fc>]
 [<c015547b>]
 [<c079e77c>]
 [<c015fc1c>]

This is a basic principle in the Linux kernel. We try to print out as 
much useful information as possible - and only cut down on it if the 
information physically does not fit on the screen. (which is not a problem 
here)
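
[ To make the difference between the two call traces above concrete, here
  is a minimal userspace sketch in C of turning a raw address into
  "symbol+0xoffset/0xsize". It is not the kernel's symbol lookup code: the
  table, struct sym and symbolize are hypothetical, and the two start
  addresses are simply derived from the trace above by subtracting the
  printed offsets. ]

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol table entry: [start, start + size) belongs to name. */
struct sym {
	unsigned long start;
	unsigned long size;
	const char *name;
};

/* Two entries reconstructed from the trace above (address minus offset). */
static const struct sym symtab[] = {
	{ 0xc01345d0UL, 0x90,  "warn_slowpath" },
	{ 0xc015fa70UL, 0x1c0, "smp_call_function_mask" },
};

/*
 * Format addr as a call-trace line into buf.
 * Returns 1 if a symbol was found, 0 if only the raw address was printed.
 */
static int symbolize(unsigned long addr, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
		const struct sym *s = &symtab[i];

		if (addr >= s->start && addr - s->start < s->size) {
			snprintf(buf, len, "[<%08lx>] %s+0x%lx/0x%lx",
				 addr, s->name, addr - s->start, s->size);
			return 1;
		}
	}
	/* Unknown address: fall back to the bare form from the second trace. */
	snprintf(buf, len, "[<%08lx>]", addr);
	return 0;
}
```

[ With this table, 0xc013463d comes out in the same form as the first line
  of the first trace, while an address with no entry falls back to the bare
  form shown in the second trace. ]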

	Ingo