linux-kernel - Re: [PATCH -v10 0/4] Lock-less list

Message-ID: <4D38BCFA.8050609@google.com>
Date:	Thu, 20 Jan 2011 14:53:46 -0800
From:	Mike Waychison <mikew@...gle.com>
To:	Ingo Molnar <mingo@...e.hu>
CC:	huang ying <huang.ying.caritas@...il.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Huang Ying <ying.huang@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Andi Kleen <andi@...stfloor.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Chris Mason <chris.mason@...cle.com>,
	Borislav Petkov <bp@...en8.de>,
	Robert Lippert <rlippert@...gle.com>
Subject: Re: [PATCH -v10 0/4] Lock-less list

On 01/20/11 05:06, Ingo Molnar wrote:
>
> * huang ying<huang.ying.caritas@...il.com>  wrote:
>
>> On Thu, Jan 20, 2011 at 8:14 PM, Ingo Molnar<mingo@...e.hu>  wrote:
>>>
>>> * huang ying<huang.ying.caritas@...il.com>  wrote:
>>>
>>>>> But will all that stuff be accepted? Please stop sending infrastructure bits and
>>>>> focus on your larger RAS picture, once you have consensus on that from all
>>>>> parties involved, then, and only then, does it make sense to submit everything,
>>>>> including infrastructure.
>>>>
>>>> I am not sending hardware error reporting infrastructure.  As far as I know, Linus
>>>> and Andrew suggest to use printk for hardware error reporting.  And now, I just
>>>> try to write APEI driver and reporting hardware error with printk.  Is it
>>>> acceptable?  Do you have some other idea about hardware error reporting?
>>>
>>> Erm, how could you possible have missed the perf based RAS daemon work of Boris,
>>> which we've pointed out about half a dozen times already?
>>
>> Even if there is some other hardware error reporting infrastructure
>> such as perf based, I think we still need printk too. After all, as
>> Linus pointed out, printk is the most popular error reporting
>> mechanism so far. Do you think so?
>
> Of course, that's why the upstream EDAC code uses printk too. In fact it does all
> sorts of in-kernel decoding to make the printk output more useful - the /dev/mcelog
> method of pushing all decoding to user-space is fundamentally flawed.

Geez, I don't know how to approach this proposition in a concise way :(
Processing machine checks in-kernel is just as flawed as relying on
/dev/mcelog alone IMO.  I agree with you that relying on /dev/mcelog to 
get all of our error data out is flawed, but so is relying on an 
in-kernel "abstraction" of the data exposed from the hardware.


There are many different ways a system can fail such that an MCE isn't 
received and processed by the kernel.  Sometimes the error is just too 
fatal to do anything useful.  Errors like a NB buffer CRC error, a bus 
syncflood, or a cache hierarchy ECC error that was incorrectly 
propagated up through to the L1 (which may only have parity checking) 
can cause the kernel to fall over as the CPU is either cut off from the 
rest of the world or too confused to get anything right.

Getting at this information is still very worthwhile, however, and I'm
guessing that this is what the APEI bits are meant to be doing.  You'll
be seeing patches for Google firmware drivers that provide
functionality in the same vein in the coming days (I'm still busy
whitewashing and documenting them).

It's also very ignorant to assume that the kernel knows everything about
the system and is capable of decoding errors to the satisfaction of
userland.  As Duncan Laurie pointed out
(https://lkml.org/lkml/2011/1/11/390), we care about not only the
physical address, but also which stick and which DIMM *chip* on the
stick is having problems.  In-kernel abstractions break down due to the
following:

    * The kernel couldn't possibly know how my i2c busses are set up or
how the SPD EEPROMs relate to the physical memory abstraction that the
BIOS sets up for me.  I don't know of any standard way to have the BIOS
expose this sort of information to the operating system.  This sort of 
layout changes between motherboard spins quite frequently as well, so 
good luck mapping it yourself in any generic way.

    * The kernel couldn't know how to map SPD JEDEC Manufacturer ID, 
Model part number and revision to anything useful about the chips 
themselves.

    * The kernel also couldn't know how to communicate with the AMBs in 
a meaningful way (if present).


At the end of the day, the only things I really care about are:

    * I don't care if the kernel pre-processes the data it gets from the 
hardware when there is an error.  For most users, burping something out 
to the logs in decoded form is generally useful.  It isn't for us.
    * Don't ever put the kernel in a position where it will spam the 
logs and wedge the system -- even if the hardware is wonky.
    * Don't dumb down the data such that I can't do the same calculations
with better visibility from userland.
    * Don't ever enforce a reactive policy that can't be changed from 
userland.
    * I don't care whether the data comes from netlink, /dev/mcelog, 
whiz-bang-sysfs uevent, or thingamaboo perfevents doohickie: as long as 
I get events that are both atomic+consistent and the ABI is maintained.

I've CCed Robert, who owns our userland bits, as he may have something to add.

That said, I'd love to have generic NMI-safe data-passing for improved
debuggability, regardless of this conflated bickering about RAS
infrastructure :)