linux-kernel - Re: [PATCH 4/4] EDAC: Convert AMD EDAC pieces to use RAS printk buffer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4F50DECB.8030200@redhat.com>
Date:	Fri, 02 Mar 2012 11:52:59 -0300
From:	Mauro Carvalho Chehab <mchehab@...hat.com>
To:	Borislav Petkov <bp@...64.org>
CC:	Tony Luck <tony.luck@...el.com>, Ingo Molnar <mingo@...e.hu>,
	EDAC devel <linux-edac@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Borislav Petkov <borislav.petkov@....com>
Subject: Re: [PATCH 4/4] EDAC: Convert AMD EDAC pieces to use RAS printk buffer

Em 02-03-2012 11:25, Borislav Petkov escreveu:
> From: Borislav Petkov <borislav.petkov@....com>
> 
> This is an initial version of the patch which converts MCE decoding
> facilities to use the RAS printk buffer. When there's no userspace agent
> running (i.e., /sys/devices/system/ras/agent == 0), we fall back to the
> default printk's into dmesg which is what we've been doing so far.
> 
> Signed-off-by: Borislav Petkov <borislav.petkov@....com>
> ---
>  drivers/edac/amd64_edac.c |    8 ++-
>  drivers/edac/edac_mc.c    |   42 ++++++---
>  drivers/edac/mce_amd.c    |  217 ++++++++++++++++++++++++---------------------
>  3 files changed, 149 insertions(+), 118 deletions(-)
> 
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 08413377a43b..29e153c57e33 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -1,6 +1,7 @@
> -#include "amd64_edac.h"
>  #include <asm/amd_nb.h>
> +#include <asm/ras.h>
>  
> +#include "amd64_edac.h"
>  static struct edac_pci_ctl_info *amd64_ctl_pci;
>  
>  static int report_gart_errors;
> @@ -1901,7 +1902,10 @@ static void amd64_handle_ce(struct mem_ctl_info *mci, struct mce *m)
>  	sys_addr = get_error_address(m);
>  	syndrome = extract_syndrome(m->status);
>  
> -	amd64_mc_err(mci, "CE ERROR_ADDRESS= 0x%llx\n", sys_addr);
> +	if (ras_agent)
> +		ras_printk(PR_EMERG, "ERR_ADDR: 0x%llx", sys_addr);
> +	else
> +		amd64_mc_err(mci, "CE ERROR_ADDRESS= 0x%llx\n", sys_addr);
>  
>  	pvt->ops->map_sysaddr_to_csrow(mci, sys_addr, syndrome);
>  }
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index ca6c04d350ee..3b3db477b5d0 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -30,8 +30,10 @@
>  #include <asm/uaccess.h>
>  #include <asm/page.h>
>  #include <asm/edac.h>
> +#include <asm/ras.h>
>  #include "edac_core.h"
>  #include "edac_module.h"
> +#include "mce_amd.h"
>  
>  /* lock to memory controller's control array */
>  static DEFINE_MUTEX(mem_ctls_mutex);
> @@ -701,14 +703,22 @@ void edac_mc_handle_ce(struct mem_ctl_info *mci,
>  		return;
>  	}
>  
> -	if (edac_mc_get_log_ce())
> -		/* FIXME - put in DIMM location */
> -		edac_mc_printk(mci, KERN_WARNING,
> -			"CE page 0x%lx, offset 0x%lx, grain %d, syndrome "
> -			"0x%lx, row %d, channel %d, label \"%s\": %s\n",
> -			page_frame_number, offset_in_page,
> -			mci->csrows[row].grain, syndrome, row, channel,
> -			mci->csrows[row].channels[channel].label, msg);
> +	if (edac_mc_get_log_ce()) {
> +		if (ras_agent)
> +			ras_printk(PR_CONT, ", row: %d, channel: %d\n",
> +				   row, channel);
> +		else
> +			/* FIXME - put in DIMM location */
> +			edac_mc_printk(mci, KERN_WARNING,
> +					"CE page 0x%lx, offset 0x%lx, grain %d,"
> +					" syndrome 0x%lx, row %d, channel %d,"
> +					" label \"%s\": %s\n",
> +					page_frame_number, offset_in_page,
> +					mci->csrows[row].grain, syndrome,
> +					row, channel,
> +					mci->csrows[row].channels[channel].label,
> +					msg);
> +	}


As I've commented already, This piece of the code is not ok, due
to several reasons:

	- the "ras_agent" helper functions is used only for amd64_edac. There's
no reason for use it elsewhere;

	- the code here is adding the location only for CE errors. The location
of the error is also pertinent for the other types of errors;

	- on my patches, this function disappears, being replaced by a single
function, that can be used to report all types of memory errors;

	- non MCA drivers should also generate tracepoints;

	- it is a way easier to add just one call to the tracepoint function here, than
to spread on all drivers.

As it is shown at:
	http://git.kernel.org/?p=linux/kernel/git/mchehab/linux-edac.git;a=commitdiff_plain;h=4eb2a29419c1fefd76c8dbcd308b84a4b52faf4d

Once we made an agreement with regards to the tracepoint function, 
a change similar to what it was proposed there, e. g.:

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 37d2c97..2dca0e3 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -899,7 +899,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  const int layer2,
 			  const char *msg,
 			  const char *other_detail,
-			  const void *mcelog)
+			  const void *arch_log)
 {
 	unsigned long remapped_page;
 	/* FIXME: too much for stack: move it to some pre-alocated area */
@@ -1033,8 +1047,17 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			"page 0x%lx offset 0x%lx grain %d\n",
 			page_frame_number, offset_in_page, grain);
 
+#ifdef CONFIG_X86
+	if (arch_log)
+		trace_mc_error_mce(type, mci->mc_idx, msg, label, location,
+				   detail, other_detail, arch_log);
+	else
+		trace_mc_error(type, mci->mc_idx, msg, label, location,
+			       detail, other_detail);
+#else
 	trace_mc_error(type, mci->mc_idx, msg, label, location,
 		       detail, other_detail);
+#endif
 
 	if (type == HW_EVENT_ERR_CORRECTED) {
 		if (edac_mc_get_log_ce())

is enough to generate traces for all existing drivers.

Regards,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/