Date:	Mon, 20 Jul 2009 10:24:13 -0700 (PDT)
From:	Doug Thompson <norsk5@...oo.com>
To:	mingo@...e.hu, hpa@...or.com, tglx@...utronix.de, aris@...hat.com,
	Borislav Petkov <borislav.petkov@....com>
Cc:	linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: [RFC PATCH 0/14] amd64_edac: marry mcheck to amd64 edac

--- On Mon, 7/20/09, Borislav Petkov <borislav.petkov@....com> wrote:

> From: Borislav Petkov <borislav.petkov@....com>
> Subject: [RFC PATCH 0/14] amd64_edac: marry mcheck to amd64 edac
> To: mingo@...e.hu, hpa@...or.com, tglx@...utronix.de, norsk5@...oo.com, aris@...hat.com
> Cc: linux-kernel@...r.kernel.org, x86@...nel.org
> Date: Monday, July 20, 2009, 10:12 AM
> Hi all,
> 
> this is the first version of the attempt to forward MCE information to
> the amd64 EDAC module for further decoding. When the MCE handler gets
> invoked and the EDAC module is loaded, here's how a decoded MCE looks
> like:

This looks good. I will apply and test shortly.

Question: are you planning to add ErrAddr decoding later, where we decode the error address to an actual DIMM label, as stored in the MCI structure?

If so, okay. If not, then we need that displayed so the maintenance techs know exactly which DIMM to pull. Only the amd64 EDAC module has the labels and the controller registers needed to decode the address properly.

The MCE code has a poller thread as well, for CORRECTED errors. Its cycle is about 5 minutes, I believe, while EDAC's is 1 second. That is another item we need to sort out.

thanks

doug t

> 
> Disabling lock debugging due to kernel taint
> 
> <0>HARDWARE ERROR
> CPU 3: Machine Check Exception:                4 Bank 0: b20040001c000175
> TSC 714e9b73cf 
> PROCESSOR 2:100f22 TIME 1247237579 SOCKET 0 APIC 3
> MC0_STATUS: Uncorrected error, report: yes, MiscV: invalid, CPU context corrupt: yes
>  Data Cache Error: Data/Tag Evict error.
>  Transaction: Evict, Type: Data, Cache Level: L1
> This is not a software problem!
> <0>Run through mcelog --ascii to decode and contact your hardware vendor
> Machine check: Processor context corrupt
> Kernel panic - not syncing: Fatal machine check on current CPU
> Pid: 4817, comm: cc1 Tainted: G   M       2.6.31-rc2-00218-g78848b0-dirty #42
> Call Trace:
>  <#MC>  [<ffffffff8134a17a>] panic+0xaf/0x178
>  [<ffffffff812b5d9e>] ? decode_mce+0x47e/0x540
>  [<ffffffff81019210>] ? print_mce+0x90/0x110
>  [<ffffffff810193e7>] mce_panic+0x157/0x180
>  [<ffffffff81019de7>] do_machine_check+0x757/0x930
>  [<ffffffff8134d96d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>  [<ffffffff8134e9cb>] machine_check+0x1b/0x20
>  <EOE>
> 
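For reference, the decoded fields in the log above can be reproduced by hand from the raw Bank 0 value. The following is only a minimal sketch, not the code from the patch set: the flag bit positions are the architectural MCi_STATUS ones, the memory-error layout (0000 0001 RRRR TTLL in the low 16 bits) and the name tables are assumed from the processor documentation, and `struct mc_status_fields`, `decode_mc_status()` and the `*_name()` helpers are hypothetical names made up for illustration.

```c
#include <stdint.h>

/* Architectural MCi_STATUS flag bits (top of the 64-bit register). */
#define MCI_STATUS_VAL   (1ULL << 63)  /* register contents are valid */
#define MCI_STATUS_OVER  (1ULL << 62)  /* an earlier error was overwritten */
#define MCI_STATUS_UC    (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_EN    (1ULL << 60)  /* error reporting enabled */
#define MCI_STATUS_MISCV (1ULL << 59)  /* MCi_MISC holds valid data */
#define MCI_STATUS_ADDRV (1ULL << 58)  /* MCi_ADDR holds a valid address */
#define MCI_STATUS_PCC   (1ULL << 57)  /* processor context corrupt */

/* Hypothetical container for the decoded fields; not a kernel structure. */
struct mc_status_fields {
    int valid, overflow, uncorrected, enabled, miscv, addrv, pcc;
    int is_mem_error;      /* low 16 bits match 0000 0001 RRRR TTLL */
    unsigned rrrr, tt, ll; /* only meaningful when is_mem_error is set */
};

struct mc_status_fields decode_mc_status(uint64_t status)
{
    struct mc_status_fields f;
    uint16_t ec = (uint16_t)(status & 0xFFFF); /* error code field */

    f.valid       = !!(status & MCI_STATUS_VAL);
    f.overflow    = !!(status & MCI_STATUS_OVER);
    f.uncorrected = !!(status & MCI_STATUS_UC);
    f.enabled     = !!(status & MCI_STATUS_EN);
    f.miscv       = !!(status & MCI_STATUS_MISCV);
    f.addrv       = !!(status & MCI_STATUS_ADDRV);
    f.pcc         = !!(status & MCI_STATUS_PCC);

    f.is_mem_error = (ec & 0xFF00) == 0x0100;
    f.rrrr = (ec >> 4) & 0xF; /* memory transaction type */
    f.tt   = (ec >> 2) & 0x3; /* transaction type */
    f.ll   = ec & 0x3;        /* cache level */
    return f;
}

/* Name tables: assumed encodings, check them against the processor docs. */
const char *rrrr_name(unsigned rrrr)
{
    static const char *const names[] = {
        "Generic", "Read", "Write", "Data read", "Data write",
        "Instruction fetch", "Prefetch", "Evict", "Snoop"
    };
    return rrrr < sizeof(names) / sizeof(names[0]) ? names[rrrr] : "Reserved";
}

const char *tt_name(unsigned tt)
{
    static const char *const names[] = {
        "Instruction", "Data", "Generic", "Reserved"
    };
    return names[tt & 0x3];
}

const char *ll_name(unsigned ll)
{
    static const char *const names[] = { "L0", "L1", "L2", "Generic" };
    return names[ll & 0x3];
}
```

Feeding it 0xb20040001c000175 gives UC, EN and PCC set with MiscV and AddrV clear, and a memory error code of Evict / Data / L1, which agrees with the MC0_STATUS and Transaction lines in the log.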
> Clearly, the "Run through mcelog... " line is redundant now :) since
> there's no need for userspace decoding anymore and the original EDAC
> functionality (polling workqueue) is still preserved. The code currently
> uses EDAC to decode DRAM ECC errors but this could clearly be extended
> to handle all valid addresses acquired from MCi_ADDR registers.
> 
> Comments and further suggestions are most welcome.
> 
> Thanks,
> Boris.
> 
>  arch/x86/kernel/cpu/mcheck/mce.c    |    7 +
>  drivers/edac/amd64_edac.c           |  484 +++++++++++++++++++++--------------
>  drivers/edac/amd64_edac.h           |   67 ++---
>  drivers/edac/amd64_edac_dbg.c       |    2 +-
>  drivers/edac/amd64_edac_err_types.c |  126 +++++-----
>  5 files changed, 382 insertions(+), 304 deletions(-)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
