Message-ID: <20250207143028.1865-4-shiju.jose@huawei.com>
Date: Fri, 7 Feb 2025 14:30:24 +0000
From: <shiju.jose@...wei.com>
To: <linux-edac@...r.kernel.org>, <linux-cxl@...r.kernel.org>,
	<mchehab@...nel.org>, <dave.jiang@...el.com>, <dan.j.williams@...el.com>,
	<bp@...en8.de>, <jonathan.cameron@...wei.com>, <alison.schofield@...el.com>,
	<vishal.l.verma@...el.com>, <ira.weiny@...el.com>, <dave@...olabs.net>
CC: <linux-kernel@...r.kernel.org>, <linuxarm@...wei.com>,
	<tanxiaofei@...wei.com>, <prime.zeng@...ilicon.com>, <shiju.jose@...wei.com>
Subject: [PATCH 3/4] rasdaemon: cxl: Add storing memory repair needed info in the DRAM event record

From: Shiju Jose <shiju.jose@...wei.com>

Rasdaemon supports live memory repair for reported CXL DRAM errors that
have the 'maintenance needed' flag set. However, the kernel CXL driver
rejects a live memory repair request in the following situations:
1. Memory is online and the repair is disruptive.
2. Memory is online and the event record does not match.
In addition, rasdaemon does not request live memory repair if its auto
repair option is switched off.

In the above unrepaired cases, repair-needed information for CXL DRAM
events must be stored in the CXL DRAM event record of the SQLite database.
This allows a boot-up script to read repair status and repair attributes
in the next boot. If the memory has not been repaired, the script will
issue the memory repair operation requested by the memory device in the
previous boot. The kernel CXL driver sends a repair command to the device
if the memory to be repaired is offline.

This change adds a repair_needed field to the CXL DRAM event record of
the SQLite database and sets it whenever cxl_dram_sparing() or cxl_ppr()
returns a negative value.

Signed-off-by: Shiju Jose <shiju.jose@...wei.com>
---
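Note for reviewers: the flag logic added to ras_cxl_dram_event_handler()
can be sketched in isolation as below. This is only an illustration under
stated assumptions, not the rasdaemon code. The sketch_* names, the
repair_fn callback type and the SKETCH_DPA_NOT_REPAIRABLE value are
hypothetical stand-ins for cxl_dram_sparing(), cxl_ppr() and
CXL_DPA_NOT_REPAIRABLE, and a negative return is assumed to mean that the
repair was rejected or not requested.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for CXL_DPA_NOT_REPAIRABLE; the value is assumed. */
#define SKETCH_DPA_NOT_REPAIRABLE 0x02

/* Hypothetical callback type: returns < 0 if the repair did not happen. */
typedef int (*repair_fn)(void);

/* Stubs modelling a repair that succeeded and one that was rejected. */
static int sketch_repair_ok(void) { return 0; }
static int sketch_repair_rejected(void) { return -1; }

/*
 * Mirrors the control flow of the patched handler: sparing is always
 * attempted, PPR is attempted only when the DPA is not flagged as
 * not repairable, and any negative return marks the record as still
 * needing repair.
 */
static bool sketch_repair_needed(uint8_t dpa_flags, repair_fn sparing,
				 repair_fn ppr)
{
	bool repair_needed = false;

	if (sparing() < 0)
		repair_needed = true;

	if (!(dpa_flags & SKETCH_DPA_NOT_REPAIRABLE)) {
		if (ppr() < 0)
			repair_needed = true;
	}

	return repair_needed;
}
```

With these stubs, sketch_repair_needed(0, sketch_repair_ok,
sketch_repair_rejected) yields true, while the same call with
SKETCH_DPA_NOT_REPAIRABLE set yields false because PPR is never
attempted. The resulting value is what ras_store_cxl_dram_event() binds
to the new repair_needed column.
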
 ras-cxl-handler.c | 9 ++++++---
 ras-record.c      | 2 ++
 ras-record.h      | 1 +
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/ras-cxl-handler.c b/ras-cxl-handler.c
index ae49740..25ef5c9 100644
--- a/ras-cxl-handler.c
+++ b/ras-cxl-handler.c
@@ -1609,10 +1609,13 @@ int ras_cxl_dram_event_handler(struct trace_seq *s,
 	if (trace_seq_printf(s, "CVME Count:%u ", ev.cvme_count) <= 0)
 		return -1;
 
-	cxl_dram_sparing(&ev);
+	if (cxl_dram_sparing(&ev) < 0)
+		ev.repair_needed = true;
 
-	if (!(ev.dpa_flags & CXL_DPA_NOT_REPAIRABLE))
-		cxl_ppr(&ev.hdr, ev.dpa, ev.nibble_mask);
+	if (!(ev.dpa_flags & CXL_DPA_NOT_REPAIRABLE)) {
+		if (cxl_ppr(&ev.hdr, ev.dpa, ev.nibble_mask) < 0)
+			ev.repair_needed = true;
+	}
 
 	/* Insert data into the SGBD */
 #ifdef HAVE_SQLITE3
diff --git a/ras-record.c b/ras-record.c
index eed7aca..ed745f9 100644
--- a/ras-record.c
+++ b/ras-record.c
@@ -993,6 +993,7 @@ static const struct db_fields cxl_dram_event_fields[] = {
 	{ .name = "sub_channel",	.type = "INTEGER" },
 	{ .name = "cme_threshold_ev_flags",	.type = "INTEGER" },
 	{ .name = "cvme_count",		.type = "INTEGER" },
+	{ .name = "repair_needed",	.type = "INTEGER" },
 };
 
 static const struct db_table_descriptor cxl_dram_event_tab = {
@@ -1043,6 +1044,7 @@ int ras_store_cxl_dram_event(struct ras_events *ras, struct ras_cxl_dram_event *
 	sqlite3_bind_int(priv->stmt_cxl_dram_event, idx++,
 			 ev->cme_threshold_ev_flags);
 	sqlite3_bind_int(priv->stmt_cxl_dram_event, idx++, ev->cvme_count);
+	sqlite3_bind_int(priv->stmt_cxl_dram_event, idx++, ev->repair_needed);
 
 	rc = sqlite3_step(priv->stmt_cxl_dram_event);
 	if (rc != SQLITE_OK && rc != SQLITE_DONE)
diff --git a/ras-record.h b/ras-record.h
index 5eab62c..35edae1 100644
--- a/ras-record.h
+++ b/ras-record.h
@@ -238,6 +238,7 @@ struct ras_cxl_dram_event {
 	uint8_t res_id[CXL_PLDM_RES_ID_LEN];
 	uint8_t cme_threshold_ev_flags;
 	uint32_t cvme_count;
+	bool repair_needed;
 };
 
 struct ras_cxl_memory_module_event {
-- 
2.43.0

