Message-ID: <3908561D78D1C84285E8C5FCA982C28F040115@ORSMSX104.amr.corp.intel.com>
Date:	Wed, 29 Feb 2012 16:58:09 +0000
From:	"Luck, Tony" <tony.luck@...el.com>
To:	Borislav Petkov <bp@...64.org>,
	Mauro Carvalho Chehab <mchehab@...hat.com>
CC:	Ingo Molnar <mingo@...e.hu>,
	EDAC devel <linux-edac@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Borislav Petkov <borislav.petkov@....com>
Subject: RE: [PATCH 1/3] mce: Add a msg string to the MCE tracepoint

> - severity: No real need for it. If the error is severe enough, the
> kernel handles automatically, i.e. memory poisoning and recovery. In all
> the other cases it is not severe enough.

We'll never see fatal errors via the perf/tracepoint (no way the RAS daemon
will run to pull them). But we will see both corrected error chatter and
recovered uncorrectable errors. I would like to be able to tell these apart.
Corrected errors in small doses are normal and don't require any
action beyond logging so you can see whether there are enough to cross
a threshold and cause alarm. Recovered uncorrectable errors are going
to be much rarer, and I think deserve closer scrutiny - even when there
is just one of them.
If you drop the severity field, is there some other way to make this
distinction?
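
To make the distinction above concrete, here is a minimal sketch of how a userspace consumer might treat the two classes differently. The severity names ("corrected", "recovered"), the ErrorTriage class, and the default threshold are all hypothetical and are not taken from the tracepoint or from any existing RAS daemon.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical severity names; the actual tracepoint field values are not
# defined in this thread.
CORRECTED = "corrected"
RECOVERED = "recovered"


@dataclass
class ErrorTriage:
    # Assumed default; a real deployment would tune this per site policy.
    corrected_threshold: int = 10
    corrected_counts: Counter = field(default_factory=Counter)

    def handle(self, source: str, severity: str) -> str:
        """Return the action ("log" or "alert") for one reported event."""
        if severity == RECOVERED:
            # Recovered uncorrectable errors are expected to be rare,
            # so even a single one is escalated.
            return "alert"
        if severity == CORRECTED:
            # Corrected errors are only logged until the per-source
            # count reaches the threshold.
            self.corrected_counts[source] += 1
            if self.corrected_counts[source] >= self.corrected_threshold:
                return "alert"
            return "log"
        raise ValueError(f"unknown severity: {severity!r}")
```

Without a severity field (or some equivalent) in the event, the first branch has nothing to key on, which is the concern raised above.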

> - silkscreen_label: <sarcasm> yeah, I'm getting a, say, a Data
> Cache error during an L1 linefill from L2, what the f*ck does the
> silkscreen label mean for such an error?! Well, nobody knows wtf it
> means!</sarcasm>

A cache error should point to a CPU socket - I'd like to have a silkscreen
label for that (are they numbered "0, 1, 2 ..." on the motherboard
or "1, 2, 3 ..."?)  No idea where we'd get that information from. dmidecode
shows "Socket Designation: CPU 1" (and "2") for my current Sandy Bridge
system. I'd have to pull the system apart to see if those are helpful
in identifying which physical CPU is which.
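
For illustration only, a small sketch of pulling those "Socket Designation" strings out of dmidecode text. The function name socket_designations is made up, and the sketch assumes the input is limited to processor records (for example the output of "dmidecode -t processor"), since other DMI record types also carry a "Socket Designation" field. Whether the strings match what is printed on the board is a separate question that this cannot answer.

```python
import re
from typing import List

# Matches lines such as "\tSocket Designation: CPU 1" and captures "CPU 1".
_SOCKET_RE = re.compile(
    r"^[ \t]*Socket Designation:[ \t]*(.+?)[ \t]*$", re.MULTILINE
)


def socket_designations(dmidecode_text: str) -> List[str]:
    """Return the Socket Designation values in the order they appear.

    Assumes dmidecode_text contains only processor (DMI type 4) records.
    """
    return _SOCKET_RE.findall(dmidecode_text)
```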

-Tony
