linux-kernel - [PATCH] RAS/CEC: Reduce offline page threshold for Intel systems

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20220701191239.619940-1-tony.luck@intel.com>
Date:   Fri,  1 Jul 2022 12:12:39 -0700
From:   Tony Luck <tony.luck@...el.com>
To:     yazen.ghannam@....com
Cc:     tony.luck@...el.com, bp@...en8.de, linux-kernel@...r.kernel.org,
        patches@...ts.linux.dev, x86@...nel.org
Subject: [PATCH] RAS/CEC: Reduce offline page threshold for Intel systems

A large scale study of memory errors on Intel systems in data centers
showed that aggressively taking pages with corrected errors offline is
the best strategy of using corrected errors as a predictor of future
uncorrected errors.

It is unknown whether this would help other vendors. There are some
indicators that it would not.

Set the threshold to "2" on Intel systems.

Do-not-apply-without-agreement-from-AMD
Signed-off-by: Tony Luck <tony.luck@...el.com>
---
 drivers/ras/cec.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 42f2fc0bc8a9..b1fc193b2036 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -556,6 +556,14 @@ static int __init cec_init(void)
 	if (ce_arr.disabled)
 		return -ENODEV;
 
+	/*
+	 * Intel systems may avoid uncorreectable errors
+	 * if pages with corrected errors are aggresively
+	 * taken offline.
+	 */
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		action_threshold = 2;
+
 	ce_arr.array = (void *)get_zeroed_page(GFP_KERNEL);
 	if (!ce_arr.array) {
 		pr_err("Error allocating CE array page!\n");
-- 
2.35.3