[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20160127180939.453739059@linuxfoundation.org>
Date: Wed, 27 Jan 2016 10:13:13 -0800
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: linux-kernel@...r.kernel.org
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
stable@...r.kernel.org, Alistair Popple <alistair@...ple.id.au>,
Andrew Donnellan <andrew.donnellan@....ibm.com>,
Michael Ellerman <mpe@...erman.id.au>
Subject: [PATCH 4.3 126/157] powerpc/opal-irqchip: Fix deadlock introduced by "Fix double endian conversion"
4.3-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alistair Popple <alistair@...ple.id.au>
commit 036592fbbe753d236402a0ae68148e7c143a0f0e upstream.
Commit 25642e1459ac ("powerpc/opal-irqchip: Fix double endian
conversion") fixed an endian bug by calling opal_handle_events() in
opal_event_unmask().
However this introduced a deadlock if we find an event is active
during unmasking and call opal_handle_events() again. The bad call
sequence is:
opal_interrupt()
-> opal_handle_events()
-> generic_handle_irq()
-> handle_level_irq()
-> raw_spin_lock(&desc->lock)
handle_irq_event(desc)
unmask_irq(desc)
-> opal_event_unmask()
-> opal_handle_events()
-> generic_handle_irq()
-> handle_level_irq()
-> raw_spin_lock(&desc->lock) (BOOM)
When generating multiple opal events in quick succession this would lead
to the following stall warnings:
EEH: Fenced PHB#0 detected, location: U78C9.001.WZS09XA-P1-C32
INFO: rcu_sched detected stalls on CPUs/tasks:
12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=2065
15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=2065
(detected by 13, t=2102 jiffies, g=1325, c=1324, q=602)
NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [irqbalance:2696]
INFO: rcu_sched detected stalls on CPUs/tasks:
12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=8371
15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=8371
(detected by 20, t=8407 jiffies, g=1325, c=1324, q=1290)
This patch corrects the problem by queuing the work if an event is
active during unmasking, which is similar to the pre-endian fix
behaviour.
Fixes: 25642e1459ac ("powerpc/opal-irqchip: Fix double endian conversion")
Signed-off-by: Alistair Popple <alistair@...ple.id.au>
Reported-by: Andrew Donnellan <andrew.donnellan@....ibm.com>
Signed-off-by: Michael Ellerman <mpe@...erman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
---
arch/powerpc/platforms/powernv/opal-irqchip.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
--- a/arch/powerpc/platforms/powernv/opal-irqchip.c
+++ b/arch/powerpc/platforms/powernv/opal-irqchip.c
@@ -83,7 +83,19 @@ static void opal_event_unmask(struct irq
set_bit(d->hwirq, &opal_event_irqchip.mask);
opal_poll_events(&events);
- opal_handle_events(be64_to_cpu(events));
+ last_outstanding_events = be64_to_cpu(events);
+
+ /*
+ * We can't just handle the events now with opal_handle_events().
+ * If we did we would deadlock when opal_event_unmask() is called from
+ * handle_level_irq() with the irq descriptor lock held, because
+ * calling opal_handle_events() would call generic_handle_irq() and
+ * then handle_level_irq() which would try to take the descriptor lock
+ * again. Instead queue the events for later.
+ */
+ if (last_outstanding_events & opal_event_irqchip.mask)
+ /* Need to retrigger the interrupt */
+ irq_work_queue(&opal_event_irq_work);
}
static int opal_event_set_type(struct irq_data *d, unsigned int flow_type)
Powered by blists - more mailing lists