lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sat, 10 Feb 2007 15:43:14 +0100
From:	Andi Kleen <ak@...e.de>
To:	Alan <alan@...rguk.ukuu.org.uk>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: -mm merge plans for 2.6.21

On Saturday 10 February 2007 14:48, Alan wrote:
> > Well it's a technical issue -- it conflicts with the machine check
> > code that is already implemented by stealing away its events.
> > You would first need a CONFIG_MCE=n on x86-64 at least, otherwise
> > one of them will be unhappy.
> 
> That is a fair point, albeit after far too long sitting stuck in the tree.

I'm pretty sure I mentioned it earlier already.

> I'll have a look into this a bit further, probably MCE_K8 should feed off
> the output of the MCE driver.

You could probably write a simple edac driver that just feeds of the events 
mce.c generates and presents that as the EDAC drivers. Currently 
the user space device is the only consumer, but it wouldn't be hard
to full it off to other kernel users.

Possibly keeping the existing code that reads the DIMMs and 
then reads the address map of the NB and match that, then you
get the same output.

[However see below -- i think matching it to SMBIOS using the address
is a lot more useful for the user in the end]

> 
> > The other issue is that the existing code does everything EDAC
> > K8 does already perfectly fine, just in a small fraction of 
> > the code.
> 
> Which is false. It provided a totally inconsistent interface to user
> space, while the EDAC layer provides the consistency many people need and
> want.

I still don't get that argument because unlike mce.c+mcelog EDAC cannot
actually tell you where the DIMM that failed is located on your motherboard.

As far as i can make it out it's only useful for people who either
have full schematics of their motherboard and know how to read them
or did a full try'n'error of all combinations run once to 
figure out which channel is located where.

Somehow I cannot imagine either of that is very common in "enterprises" ;-)

Ok one argument was that some LinuxBIOS users don't seem to 
set up these tables and have the necessary information at hand
for their custom cluster hardware, but that's still a weird uncommon 
case and it would be far better if they just fixed LB.

> Also unless it has changed remarkably the MCE driver doesn't do 
> scrubbing which is needed in software on some chip revisions.

True, adding that might be a good idea.

[Although I guess that could be done with a small user utility
using /dev/mem though if you really wanted it?]

One thing EDAC also does that mcelog doesn't is decode the ECC 
syndromes -- but I haven't figured out a use case yet where that
is actually useful to know afterwards.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ