Date:	Fri, 5 Nov 2010 14:46:58 +0100
From:	Borislav Petkov <bp@...64.org>
To:	Mauro Carvalho Chehab <mchehab@...radead.org>
Cc:	"acme@...radead.org" <acme@...radead.org>,
	"fweisbec@...il.com" <fweisbec@...il.com>,
	"mingo@...e.hu" <mingo@...e.hu>,
	"peterz@...radead.org" <peterz@...radead.org>,
	"rostedt@...dmis.org" <rostedt@...dmis.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 00/20] RAS daemon v3

On Fri, Nov 05, 2010 at 08:02:34AM -0400, Mauro Carvalho Chehab wrote:
> I tried to apply your patches here, but they didn't apply. I suspect
> that Steven added some patches there in the meantime, as two patches
> in your series are already in his tree. IMO, it would be better if
> you could create a temporary tree or branch to allow us to better view
> it.

Sure:

git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git ras-v3

> This example looks quite ugly to me. I doubt anyone would know what
> the magic number 0x9c00410000010016 means without a datasheet and a
> very careful inspection.

Right, this was only a hands-on example of what a script would otherwise
do. I wanted to show what happens in detail.
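
To make the magic number a bit less opaque, here is a rough sketch of how
the flag bits of an MCi_STATUS value can be picked apart by hand. It only
covers the architectural bits (VAL, OVER, UC, EN, MISCV, ADDRV, PCC) and
the low 16-bit MCA error code; the model-specific fields are deliberately
left alone. The helper name decode_status() is made up for this example
and is not part of the patchset.

```python
# Architectural MCi_STATUS flag bits, highest bit first.
STATUS_FLAGS = [
    (63, "VAL"),    # register contents are valid
    (62, "OVER"),   # a previous error was overwritten (errors lost)
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting was enabled
    (59, "MISCV"),  # MCi_MISC holds valid data
    (58, "ADDRV"),  # MCi_ADDR holds valid data
    (57, "PCC"),    # processor context corrupt
]


def decode_status(status):
    """Return the set flag names and the low 16-bit MCA error code."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    return {"flags": flags, "error_code": status & 0xFFFF}


if __name__ == "__main__":
    decoded = decode_status(0x9C00410000010016)
    print(decoded["flags"], hex(decoded["error_code"]))
```

For the value quoted above, this reports VAL, EN, MISCV and ADDRV set and
an MCA error code of 0x16.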

> I suspect that writing a wrong magic number will also produce a
> completely undesired result.

That's not a problem since this is software-only injection. It actually
makes sense to be able to inject crap so that you can test the decoding
code:

[81953.494078] [Hardware Error]: MC5_STATUS: Uncorrected error, other errors lost: no, CPU context corrupt: yes, UECC Error
[81953.505714] [Hardware Error]: Corrupted FR MCE info?
[81953.505718] [Hardware Error]: Transaction: GEN (GEN), no timeout, Cache Level: L3/GEN, Participating Processor: GEN

> So, it is better to keep the MCE code
> internal to the driver.
> 
> Also, writing a magic number to a node named as "status" seems weird to me.
> 
> IMO, instead, it should be something like:
> 
> echo 1 >/sys/devices/system/edac/mce/error_inject

Well, this way you inject a random error. But you want to control the
error types which you inject, and set not just one but several of the
MCi_ bank MSRs. That way, you can also inject the address at which a
certain MCE happens and so on.
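
As a reminder of which registers belong together, here is a small sketch
of the legacy architectural numbering, in which bank i owns four
consecutive MSRs starting at 0x400 + 4 * i. The software-only injection
discussed here does not have to write the real MSRs; bank_msrs() is just
a hypothetical helper for illustration.

```python
MSR_MC0_CTL = 0x400  # first MSR of bank 0 in the legacy MCA layout


def bank_msrs(bank):
    """Return the MSR numbers of the four registers of one MCA bank."""
    base = MSR_MC0_CTL + 4 * bank
    return {
        "CTL": base,         # MCi_CTL: error reporting enables
        "STATUS": base + 1,  # MCi_STATUS: what happened
        "ADDR": base + 2,    # MCi_ADDR: where it happened
        "MISC": base + 3,    # MCi_MISC: additional information
    }


if __name__ == "__main__":
    for name, msr in bank_msrs(5).items():
        print(f"MC5_{name}: {msr:#x}")
```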

So, basically, the long term goal is to have a tool which could do all
that. Maybe something like this:

perf inject --mce --functional-unit DC --uncorrectable --pcc-corrupt --virtual-address 0xdeadbeef ...

or

perf inject --mce --functional-unit IC --random --correctable --ecc

(I have used long options so that it's clear what we do - we can make
them shorter in the actual tool.) But you get the idea. This way, you can
inject all kinds of stuff and also in a human-readable form.
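
To give a feel for what such a frontend would have to do, here is a
minimal sketch which turns a few of the long options above into a bank
number, an MCi_STATUS value and an address. None of this exists: the
option names are simply the ones proposed above, build_record() is a
made-up helper, and the functional-unit-to-bank mapping is an assumption
based on the AMD family 10h bank layout. --mce, --random and --ecc are
left out to keep it short.

```python
import argparse

# Architectural MCi_STATUS bits used by this sketch.
MCI_STATUS_VAL = 1 << 63
MCI_STATUS_UC = 1 << 61
MCI_STATUS_EN = 1 << 60
MCI_STATUS_ADDRV = 1 << 58
MCI_STATUS_PCC = 1 << 57

# Assumption: functional unit -> bank number as on AMD family 10h.
BANKS = {"DC": 0, "IC": 1, "BU": 2, "LS": 3, "NB": 4, "FR": 5}


def build_parser():
    p = argparse.ArgumentParser(prog="mce-inject-sketch")
    p.add_argument("--functional-unit", choices=sorted(BANKS), required=True)
    kind = p.add_mutually_exclusive_group(required=True)
    kind.add_argument("--uncorrectable", action="store_true")
    kind.add_argument("--correctable", action="store_true")
    p.add_argument("--pcc-corrupt", action="store_true")
    # int(s, 0) accepts both decimal and 0x-prefixed hex input.
    p.add_argument("--virtual-address", type=lambda s: int(s, 0))
    return p


def build_record(argv):
    """Translate human-readable options into the raw values to inject."""
    args = build_parser().parse_args(argv)
    status = MCI_STATUS_VAL | MCI_STATUS_EN
    if args.uncorrectable:
        status |= MCI_STATUS_UC
    if args.pcc_corrupt:
        status |= MCI_STATUS_PCC
    addr = 0
    if args.virtual_address is not None:
        status |= MCI_STATUS_ADDRV
        addr = args.virtual_address
    return {"bank": BANKS[args.functional_unit], "status": status, "addr": addr}


if __name__ == "__main__":
    rec = build_record(["--functional-unit", "DC", "--uncorrectable",
                        "--pcc-corrupt", "--virtual-address", "0xdeadbeef"])
    print(rec["bank"], hex(rec["status"]), hex(rec["addr"]))
```

With the options of the first example above, this yields bank 0, a status
of 0xb600000000000000 and an address of 0xdeadbeef.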

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632