lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 13 Jan 2009 06:02:24 +0100
From:	Andi Kleen <ak@...ux.intel.com>
To:	Tim Hockin <thockin@...il.com>
CC:	Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
	linux-kernel@...r.kernel.org, "H. Peter Anvin" <hpa@...or.com>,
	priyankag@...gle.com, Aaron Durbin <adurbin@...il.com>,
	Duncan Laurie <dlaurie@...gle.com>
Subject: Re: x86/mce merge, integration hickup + crash, design thoughts

Tim Hockin wrote:

> 
>>  - it squeezes all MCE errors from the whole system into a small,
>>    32-entry ringbuffer.
> 
> Yes 32 is small, but not really a big problem in practice.  The MCE daemon
> (http://mcedaemon.googlecode.com)

Interesting.

> goes a long way towards making this
> a non-issue.  Every distro should ship mced.

The latest mcelog version (not yet released) also supports a daemon
mode btw. But the limited buffer has been also fixed already
here and replaced with more flexible per CPU buffers.

>>  - it puts all the MCE logging info into an intermediary binary log
>>    record format: 'struct mce' - just for userspace to in essence
>>    printf out those entries with minimal post-processing. The fact that
>>    we squeeze all information into a fixed-size binary record makes it
>>    hard to extend and complicates the code needlessly.
> 
> This is not true, we do EXTENSIVE post-processing of MCEs.  Yes it is hard
> to extend, but that's not the same as saying it is useless.

It's not hard to extend at all (I did it several times)
Adding fields is extremly simple and fully forwards and
backwards compatible. That works because mcelog asks
the kernel about the record size and just ignores any
excess data.

The only thing that cannot be done is removing old fields, but
that is difficult with ASCII parsers too (text parsers tend
to bail out when they can't find some field they expect)

They can be obsoleted fine however (happened with cpu -> extcpu)

> 
>>  - these design aspects are also quite harmful to usability: by
>>    default all MCEs are fatal currently (pre-Nehalem anyway), so
>>    /dev/mcelog will only be used if a user goes out on a limb to
>>    configure it and sets the tolerant flag.
> 
> Not true.  The vast vast vast majority of MCEs are corrected, as per our
> experience, and I suspect we've got more experience with MCEs than just
> about any other consumer.

My employer has a lot of experience with MCEs too...  And it matches your
experience.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ