lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Wed, 6 Jun 2012 14:53:20 +0200
From:	Borislav Petkov <bp@...64.org>
To:	Tony Luck <tony.luck@...el.com>
Cc:	Mauro Carvalho Chehab <mchehab@...hat.com>,
	Borislav Petkov <bp@...64.org>,
	Linux Edac Mailing List <linux-edac@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Aristeu Rozanski <arozansk@...hat.com>,
	Doug Thompson <norsk5@...oo.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...hat.com>
Subject: Re: [PATCH v29] RAS: Add a tracepoint for reporting memory
 controller events

On Wed, Jun 06, 2012 at 07:33:19AM -0300, Mauro Carvalho Chehab wrote:
> RAS: Add a tracepoint for reporting memory controller events
> 
> From: Mauro Carvalho Chehab <mchehab@...hat.com>

[ … ]

> The tracepoint printk will be displayed like:
> 
> mc_event: [quant] (Corrected|Uncorrected|Fatal) error:[error msg] on memory stick [label] ([location] [edac_mc detail] [driver_d$
> 
> Where:
>        	[quant] is the quantity of errors
> 	[error msg] is the driver-specific error message
> 		    (e. g. "memory read", "bus error", ...);
> 	[location] is the location in terms of memory controller and
> 		   branch/channel/slot, channel/slot or csrow/channel;
> 	[label] is the memory stick label;
> 	[edac_mc detail] describes the address location of the error
> 			 and the syndrome;
> 	[driver detail] is driver-specifig error message details,
> 			when needed/provided (e. g. "area:DMA", ...)
> 
> For example:
> 
> mc_event: 1 Corrected error:memory read on memory stick DIMM_1A (mc:0 location:0:0:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)
> 
> Of course, any userspace tools meant to handle errors should not parse
> the above data. They should, instead, use the binary fields provided by
> the tracepoint, mapping them directly into their Management Information
> Base.
> 
> NOTE: The original patch was providing an additional mechanism for
> MCA-based trace events that also contained MCA error register data.
> However, as no agreement was reached so far for the MCA-based trace
> events, for now, let's add events only for memory errors.
> A latter patch is planned to change the tracepoint, for those types
> of event.
> 
> Cc: Aristeu Rozanski <arozansk@...hat.com>
> Cc: Doug Thompson <norsk5@...oo.com>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Frederic Weisbecker <fweisbec@...il.com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@...hat.com>

Ok, this is starting to shape up, here's the output on my box here:

       mcegen.py-3009  [008] .N..   144.149649: mc_event: 1 Corrected error: amd64_edac on unknown memory (mc:0 location:3:1:-1 address:0x000007ba grain:2 syndrome:0x0000ac71)

Tony, any objections?

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists