Message-ID: <20240729140604.2814597-1-leitao@debian.org>
Date: Mon, 29 Jul 2024 07:06:01 -0700
From: Breno Leitao <leitao@...ian.org>
To: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>
Cc: leit@...a.com,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Wei Liu <wei.liu@...nel.org>,
Marc Zyngier <maz@...nel.org>,
Adrian Huang <ahuang12@...ovo.com>,
linux-kernel@...r.kernel.org (open list:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Subject: [PATCH] x86/apic: Add retry mechanism to add_pin_to_irq_node()

I've been running some experiments with the failslab fault injector
enabled in order to detect a different problem, and the machine always
crashes with the following stack:
can not alloc irq_pin_list (-1,0,20)
Kernel panic - not syncing: IO-APIC: failed to add irq-pin. Can not proceed
Call Trace:
panic
_printk
panic_smp_self_stop
rcu_is_watching
intel_irq_remapping_free

This happens because add_pin_to_irq_node() panics if adding a pin to an
IRQ fails with -ENOMEM (which, in this case, was injected by the
failslab fault injector). I've been running with this patch in my test
cases in order to be able to pick up real bugs, and I thought it might
be a good idea to have it upstream as well, so that other people trying
to find real bugs don't stumble upon this one. It may also make sense in
the real world(?), where retrying a few times might be better than just
panicking.

Introduce a retry mechanism that attempts to add the pin up to 3 times
before giving up and panicking. This should improve the robustness of
the IO-APIC code in the face of transient errors.

Since __add_pin_to_irq_node() only returns 0 or -ENOMEM, the retry only
ever covers the -ENOMEM case.
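
For illustration only, the bounded-retry behaviour described above can be
sketched in user space. This is not the kernel code: struct fake_alloc,
fake_add_pin() and add_pin_with_retry() are hypothetical names made up
for this sketch, and the injected failure counter merely stands in for
what failslab does to the real allocation. Where the patch panics, the
sketch returns the last error so the behaviour can be checked.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the allocator under fault injection. */
struct fake_alloc {
	int failures_left;	/* injected -ENOMEM results still pending */
	int calls;		/* total attempts observed so far */
};

/*
 * Hypothetical stand-in for __add_pin_to_irq_node(): like the real
 * function, it only ever returns 0 or -ENOMEM.
 */
static int fake_add_pin(struct fake_alloc *fa)
{
	fa->calls++;
	if (fa->failures_left > 0) {
		fa->failures_left--;
		return -ENOMEM;
	}
	return 0;
}

/*
 * Try up to max_attempts times. Returns 0 on the first success, or the
 * last error once every attempt has failed (the patch panics there).
 */
static int add_pin_with_retry(struct fake_alloc *fa, int max_attempts)
{
	int ret = -ENOMEM;
	int i;

	assert(max_attempts > 0);

	for (i = 0; i < max_attempts; i++) {
		ret = fake_add_pin(fa);
		if (!ret)
			return 0;
	}
	return ret;
}
```

With a limit of 3, two injected failures are absorbed and the third
attempt succeeds; three or more injected failures in a row still end in
the error path, which in the patch is the existing panic().
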
Signed-off-by: Breno Leitao <leitao@...ian.org>
---
arch/x86/kernel/apic/io_apic.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 477b740b2f26..2846a90366f2 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -390,8 +390,14 @@ static void __remove_pin_from_irq(struct mp_chip_data *data, int apic, int pin)
 static void add_pin_to_irq_node(struct mp_chip_data *data,
 				int node, int apic, int pin)
 {
-	if (__add_pin_to_irq_node(data, node, apic, pin))
-		panic("IO-APIC: failed to add irq-pin. Can not proceed\n");
+	int ret, i;
+
+	for (i = 0; i < 3; i++) {
+		ret = __add_pin_to_irq_node(data, node, apic, pin);
+		if (!ret)
+			return;
+	}
+	panic("IO-APIC: failed to add irq-pin. Can not proceed\n");
 }
 
 /*
--
2.43.0