Date: Wed, 6 Jun 2012 14:53:20 +0200
From: Borislav Petkov <bp@...64.org>
To: Tony Luck <tony.luck@...el.com>
Cc: Mauro Carvalho Chehab <mchehab@...hat.com>, Borislav Petkov <bp@...64.org>, Linux Edac Mailing List <linux-edac@...r.kernel.org>, Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, Aristeu Rozanski <arozansk@...hat.com>, Doug Thompson <norsk5@...oo.com>, Steven Rostedt <rostedt@...dmis.org>, Frederic Weisbecker <fweisbec@...il.com>, Ingo Molnar <mingo@...hat.com>
Subject: Re: [PATCH v29] RAS: Add a tracepoint for reporting memory controller events

On Wed, Jun 06, 2012 at 07:33:19AM -0300, Mauro Carvalho Chehab wrote:
> RAS: Add a tracepoint for reporting memory controller events
>
> From: Mauro Carvalho Chehab <mchehab@...hat.com>

[ … ]

> The tracepoint printk will be displayed like:
>
> mc_event: [quant] (Corrected|Uncorrected|Fatal) error:[error msg] on memory stick [label] ([location] [edac_mc detail] [driver_d$
>
> Where:
>     [quant] is the quantity of errors
>     [error msg] is the driver-specific error message
>         (e.g. "memory read", "bus error", ...);
>     [location] is the location in terms of memory controller and
>         branch/channel/slot, channel/slot or csrow/channel;
>     [label] is the memory stick label;
>     [edac_mc detail] describes the address location of the error
>         and the syndrome;
>     [driver detail] is driver-specific error message details,
>         when needed/provided (e.g. "area:DMA", ...)
>
> For example:
>
> mc_event: 1 Corrected error:memory read on memory stick DIMM_1A (mc:0 location:0:0:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)
>
> Of course, any userspace tools meant to handle errors should not parse
> the above data. They should, instead, use the binary fields provided by
> the tracepoint, mapping them directly into their Management Information
> Base.
>
> NOTE: The original patch was providing an additional mechanism for
> MCA-based trace events that also contained MCA error register data.
> However, as no agreement was reached so far for the MCA-based trace
> events, for now, let's add events only for memory errors.
> A later patch is planned to change the tracepoint, for those types
> of event.
>
> Cc: Aristeu Rozanski <arozansk@...hat.com>
> Cc: Doug Thompson <norsk5@...oo.com>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Frederic Weisbecker <fweisbec@...il.com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@...hat.com>

Ok, this is starting to shape up. Here's the output on my box:

mcegen.py-3009  [008] .N..   144.149649: mc_event: 1 Corrected error: amd64_edac on unknown memory (mc:0 location:3:1:-1 address:0x000007ba grain:2 syndrome:0x0000ac71)

Tony, any objections?

-- 
Regards/Gruss,
    Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551