linux-kernel - [PATCH RFCv2 16/16] edac: Add an error scope logic

Message-Id: <1327764771-28649-17-git-send-email-mchehab@redhat.com>
Date:	Sat, 28 Jan 2012 13:32:51 -0200
From:	Mauro Carvalho Chehab <mchehab@...hat.com>
To:	unlisted-recipients:; (no To-header on input)
Cc:	Mauro Carvalho Chehab <mchehab@...hat.com>,
	Linux Edac Mailing List <linux-edac@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: [PATCH RFCv2 16/16] edac: Add an error scope logic

This patch is still incomplete. The idea is to change the EDAC error
calls to take a scope variable, which will be used when reporting
error traces to userspace and to increment a per-location error
counter.

Signed-off-by: Mauro Carvalho Chehab <mchehab@...hat.com>
---
 include/linux/edac.h |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/include/linux/edac.h b/include/linux/edac.h
index 5876675..879116e 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -72,6 +72,33 @@ enum hw_event_mc_err_type {
 	HW_EVENT_ERR_FATAL,
 };
 
+/**
+ * enum hw_event_error_scope - scope of a memory error
+ * @HW_EVENT_SCOPE_MC:		error can be anywhere inside the MC
+ * @HW_EVENT_SCOPE_MC_BRANCH:	error can be on any DIMM inside the branch
+ * @HW_EVENT_SCOPE_MC_CHANNEL:	error can be on any DIMM inside the MC channel
+ * @HW_EVENT_SCOPE_MC_CSROW:	error can be on any DIMM inside the csrow
+ * @HW_EVENT_SCOPE_MC_CSROW_CHANNEL:	error is on a specific DIMM
+ *
+ * Depending on the error detection algorithm, the memory topology and even
+ * the MC capabilities, some errors can't be attributed to a single DIMM, but
+ * only to a group of memory sockets. The EDAC core increments the error
+ * count of the entity where the error was located, as well as the counts
+ * of all the entities above it. For example, on a system with one memory
+ * controller, two branches, two MC channels and four DIMMs, an error
+ * located at channel 0 increments the error counts of channel 0, of
+ * branch 0 and of memory controller 0. The DIMM error counts are not
+ * incremented because, in this example, the driver can't tell on which
+ * DIMM the error actually occurred.
+ */
+enum hw_event_error_scope {
+	HW_EVENT_SCOPE_MC,
+	HW_EVENT_SCOPE_MC_BRANCH,
+	HW_EVENT_SCOPE_MC_CHANNEL,
+	HW_EVENT_SCOPE_MC_CSROW,
+	HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
+};
+
 /* memory types */
 enum mem_type {
 	MEM_EMPTY = 0,		/* Empty csrow */
-- 
1.7.8
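
The counting rule described in the kerneldoc comment can be sketched as a small userspace C model. This is not EDAC core code: struct err_counts, count_error() and channel_to_branch() are hypothetical names, and the topology is an assumption built around the comment's example (one MC, two branches with channel N on branch N, two csrows, each DIMM identified by a csrow/channel pair, giving four DIMMs). It further assumes a csrow spans both channels, so a csrow-scope error propagates only up to the MC.

```c
#include <assert.h>

/* The scopes added by this patch. */
enum hw_event_error_scope {
	HW_EVENT_SCOPE_MC,
	HW_EVENT_SCOPE_MC_BRANCH,
	HW_EVENT_SCOPE_MC_CHANNEL,
	HW_EVENT_SCOPE_MC_CSROW,
	HW_EVENT_SCOPE_MC_CSROW_CHANNEL,
};

/* Hypothetical topology: 1 MC, 2 branches, 2 channels, 2 csrows.
 * A DIMM is a (csrow, channel) pair, so there are 4 DIMMs. */
#define NR_BRANCHES	2
#define NR_CHANNELS	2
#define NR_CSROWS	2

/* Hypothetical per-location error counters. */
struct err_counts {
	unsigned int mc;
	unsigned int branch[NR_BRANCHES];
	unsigned int channel[NR_CHANNELS];
	unsigned int csrow[NR_CSROWS];
	unsigned int dimm[NR_CSROWS][NR_CHANNELS];
};

/* Hypothetical helper: channel N sits on branch N in this sketch. */
static int channel_to_branch(int channel)
{
	return channel;
}

/*
 * Hypothetical counterpart of an EDAC error call that takes a scope:
 * it increments the counter of the located entity and of every entity
 * above it. Callers pass 0 for location fields the scope does not cover.
 */
static void count_error(struct err_counts *c, enum hw_event_error_scope scope,
			int branch, int channel, int csrow)
{
	assert(branch >= 0 && branch < NR_BRANCHES);
	assert(channel >= 0 && channel < NR_CHANNELS);
	assert(csrow >= 0 && csrow < NR_CSROWS);

	c->mc++;	/* every error is inside the MC */

	switch (scope) {
	case HW_EVENT_SCOPE_MC:
		break;
	case HW_EVENT_SCOPE_MC_BRANCH:
		c->branch[branch]++;
		break;
	case HW_EVENT_SCOPE_MC_CHANNEL:
		c->channel[channel]++;
		c->branch[channel_to_branch(channel)]++;
		break;
	case HW_EVENT_SCOPE_MC_CSROW:
		/* a csrow spans both channels: no channel or branch is known */
		c->csrow[csrow]++;
		break;
	case HW_EVENT_SCOPE_MC_CSROW_CHANNEL:
		/* a single DIMM: bump it and everything that contains it */
		c->dimm[csrow][channel]++;
		c->csrow[csrow]++;
		c->channel[channel]++;
		c->branch[channel_to_branch(channel)]++;
		break;
	}
}
```

On a zeroed struct err_counts, count_error(&c, HW_EVENT_SCOPE_MC_CHANNEL, 0, 0, 0) leaves c.mc, c.branch[0] and c.channel[0] at 1 and every DIMM counter at 0, which matches the channel 0 example in the comment. An explicit switch is used rather than comparing enum values, because csrow and channel are not strictly nested in this model.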
