Message-ID: <20240729140604.2814597-1-leitao@debian.org>
Date: Mon, 29 Jul 2024 07:06:01 -0700
From: Breno Leitao <leitao@...ian.org>
To: Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	x86@...nel.org,
	"H. Peter Anvin" <hpa@...or.com>
Cc: leit@...a.com,
	"Peter Zijlstra (Intel)" <peterz@...radead.org>,
	Wei Liu <wei.liu@...nel.org>,
	Marc Zyngier <maz@...nel.org>,
	Adrian Huang <ahuang12@...ovo.com>,
	linux-kernel@...r.kernel.org (open list:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Subject: [PATCH] x86/apic: Add retry mechanism to add_pin_to_irq_node()

I've been running some experiments with the failslab fault injector
enabled to detect a different problem, and the machine always crashes
with the following stack:

	can not alloc irq_pin_list (-1,0,20)
	Kernel panic - not syncing: IO-APIC: failed to add irq-pin. Can not proceed

	Call Trace:
	 panic
	   _printk
	   panic_smp_self_stop
	   rcu_is_watching
	   intel_irq_remapping_free

This happens because add_pin_to_irq_node() panics if adding a pin to an
IRQ fails with -ENOMEM (which was injected by the failslab fault
injector).  I've been running with this patch in my test cases in order
to be able to pick up real bugs, and I thought it might be a good idea
to have it upstream as well, so that other people trying to find real
bugs don't stumble upon this one. This may also make sense in the real
world, where retrying a few times might be better than just panicking.

Introduce a retry mechanism that attempts to add the pin up to 3 times
before giving up and panicking. This should improve the robustness of
the IO-APIC code in the face of transient errors.

Since __add_pin_to_irq_node() only returns 0 or -ENOMEM, the retry
applies only to the -ENOMEM case.

Signed-off-by: Breno Leitao <leitao@...ian.org>
---
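For reviewers who want to poke at the retry logic outside the kernel,
here is a minimal userspace sketch of the same bounded-retry pattern.
It is an illustration only, not the kernel code: fake_add_pin(),
add_pin_with_retry(), run_with_failures() and the two counters are
hypothetical names standing in for __add_pin_to_irq_node() and its
caller, and the sketch returns the last error where the patch calls
panic().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical test state: how many upcoming calls should fail, and how
 * many calls have been made so far. */
static int failures_left;
static int attempts_made;

/* Hypothetical stand-in for __add_pin_to_irq_node(): returns -ENOMEM
 * while failures_left > 0, then 0. */
static int fake_add_pin(void)
{
	attempts_made++;
	if (failures_left > 0) {
		failures_left--;
		return -ENOMEM;
	}
	return 0;
}

/* Bounded retry: try up to max_tries times and stop at the first
 * success. Returns 0 on success, or the last error once every attempt
 * has failed (the patch panics at that point instead). */
static int add_pin_with_retry(int max_tries)
{
	int ret = -ENOMEM;
	int i;

	assert(max_tries > 0);
	for (i = 0; i < max_tries; i++) {
		ret = fake_add_pin();
		if (!ret)
			return 0;
	}
	return ret;
}

/* Reset the fake allocator so that the next `failures` calls fail, then
 * run the retry loop. */
static int run_with_failures(int failures, int max_tries)
{
	failures_left = failures;
	attempts_made = 0;
	return add_pin_with_retry(max_tries);
}
```

With max_tries set to 3, as in the patch, two injected failures are
absorbed and the third attempt succeeds, while three consecutive
failures still reach the give-up path.
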
 arch/x86/kernel/apic/io_apic.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 477b740b2f26..2846a90366f2 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -390,8 +390,14 @@ static void __remove_pin_from_irq(struct mp_chip_data *data, int apic, int pin)
 static void add_pin_to_irq_node(struct mp_chip_data *data,
 				int node, int apic, int pin)
 {
-	if (__add_pin_to_irq_node(data, node, apic, pin))
-		panic("IO-APIC: failed to add irq-pin. Can not proceed\n");
+	int ret, i;
+
+	for (i = 0; i < 3; i++) {
+		ret = __add_pin_to_irq_node(data, node, apic, pin);
+		if (!ret)
+			return;
+	}
+	panic("IO-APIC: failed to add irq-pin. Can not proceed\n");
 }
 
 /*
-- 
2.43.0

