Date:	Fri, 11 Oct 2013 10:04:27 +0200
From:	Borislav Petkov <bp@...en8.de>
To:	"Chen, Gong" <gong.chen@...ux.intel.com>
Cc:	tony.luck@...el.com, linux-kernel@...r.kernel.org,
	linux-acpi@...r.kernel.org
Subject: Re: Extended H/W error log driver

On Fri, Oct 11, 2013 at 02:32:38AM -0400, Chen, Gong wrote:
> [56005.785917] {3}Hardware error detected on CPU0
> [56005.785959] {3}event severity: corrected
> [56005.785975] {3}sub_event[0], severity: corrected
> [56005.785977] {3}section_type: memory error
> [56005.785981] {3}physical_address: 0x0000000851fe0000
> [56005.786027] {3}DIMM location: Memriser1 CHANNEL A DIMM 0

Very good, guys! I've been waiting for years for this to be possible.
Good job! :-)

Btw, what's "Memriser1"?

> [56005.786154] {4}Hardware error detected on CPU0
> [56005.786159] {4}event severity: corrected
> [56005.786162] {4}sub_event[0], severity: corrected

This sub_event[0] could use better decoding though.

> [56005.786166] {4}section_type: memory error
> 
> 
> trace output:
> 
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 4/4   #P:120
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
> ...
> ...
>           <idle>-0     [000] d.h. 56068.488759: extlog_mem_event: 3 corrected errors:unknown

That "unknown" thing needs a " " in front of it and comes from
cper_mem_err_type_str, AFAICT. I'm guessing the value is 0 and
uninitialized or so?

> on Memriser1 CHANNEL A DIMM 0(FRU:

Also another " " missing here.
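
A minimal userspace sketch of the two spacing fixes, for illustration only.
It is not the driver's code: the table contents, mem_err_type_str() and
format_mem_event() are hypothetical names, and the sketch assumes that the
type lookup maps both index 0 and any out-of-range value to "unknown". It
also assumes the second missing space belongs before the "(FRU:" part.

```c
#include <stddef.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical string table: index 0 is assumed to mean "unknown". */
static const char *const mem_err_type_strs[] = {
	"unknown",
	"no error",
	"single-bit ECC",
	"multi-bit ECC",
};

/* Index-checked lookup; out-of-range values also decode as "unknown". */
const char *mem_err_type_str(unsigned int etype)
{
	return etype < ARRAY_SIZE(mem_err_type_strs) ?
		mem_err_type_strs[etype] : "unknown";
}

/*
 * Build the event line with a space after "errors:" and another one
 * before "(FRU:". Returns what snprintf() returns.
 */
int format_mem_event(char *buf, size_t len, unsigned int count,
		     const char *severity, unsigned int etype,
		     const char *dimm, const char *fru_id)
{
	return snprintf(buf, len, "%u %s errors: %s on %s (FRU: %s)",
			count, severity, mem_err_type_str(etype),
			dimm, fru_id);
}
```

With an error type of 0, this would still print "unknown", so a zero or
unset type field would explain the output above even with the spacing fixed.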

> 00000000-0000-0000-0000-000000000000  physical addr: 0x0000000851fe0000 node: 0 card: 0 module: 0 rank: 0 bank: 0 row: 28927 column: 1296)
>           <idle>-0     [000] d.h. 56068.488834: extlog_mem_event: 4 corrected errors:unknown
> ...
> ...
> 
> dmesg output are shrank to only keep the most important data. The trace
> output will contain most of data. Not sure if all fields are meaningful
> to users. Some fields like FRU ID/FRU TEXT depends on BIOS manufactor.
> So welcome to add comments for what is needed or not.

Yeah, I guess we again depend on BIOS people to fill those in. I'd
expect serious server manufacturers who care about RAS to do so...

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.