[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20210914192515.9273-3-linus.luessing@c0d3.blue>
Date: Tue, 14 Sep 2021 21:25:14 +0200
From: Linus Lüssing <linus.luessing@...3.blue>
To: Kalle Valo <kvalo@...eaurora.org>, Felix Fietkau <nbd@....name>,
Sujith Manoharan <c_manoha@....qualcomm.com>,
ath9k-devel@....qualcomm.com
Cc: linux-wireless@...r.kernel.org,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
"John W . Linville" <linville@...driver.com>,
Felix Fietkau <nbd@...nwrt.org>,
Simon Wunderlich <sw@...onwunderlich.de>,
Sven Eckelmann <sven@...fation.org>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org,
Linus Lüssing <ll@...onwunderlich.de>,
Linus Lüssing <linus.luessing@...3.blue>
Subject: [PATCH 2/3] ath9k: Fix potential interrupt storm on queue reset
From: Linus Lüssing <ll@...onwunderlich.de>
In tests with two Lima boards from 8devices (QCA4531 based) on OpenWrt
19.07 we could force a silent restart of a device with no serial
output when we were sending a high amount of UDP traffic (iperf3 at 80
MBit/s in both directions from external hosts, saturating the wifi and
causing a load of about 4.5 to 6) and were then triggering an
ath9k_queue_reset().
Further debugging showed that the restart was caused by the ath79
watchdog. With disabled watchdog we could observe that the device was
constantly going into ath_isr() interrupt handler and was returning
early after the ATH_OP_HW_RESET flag test, without clearing any
interrupts. Even though ath9k_queue_reset() calls
ath9k_hw_kill_interrupts().
With JTAG we could observe the following race condition:
1) ath9k_queue_reset()
...
-> ath9k_hw_kill_interrupts()
-> set_bit(ATH_OP_HW_RESET, &common->op_flags);
...
<- returns
2) ath9k_tasklet()
...
-> ath9k_hw_resume_interrupts()
...
<- returns
3) loops around:
...
handle_int()
-> ath_isr()
...
-> if (test_bit(ATH_OP_HW_RESET,
&common->op_flags))
return IRQ_HANDLED;
x) ath_reset_internal():
=> never reached <=
And in ath_isr() we would typically see the following interrupts /
interrupt causes:
* status: 0x00111030 or 0x00110030
* async_cause: 2 (AR_INTR_MAC_IPQ)
* sync_cause: 0
So the ath9k_tasklet() reenables the ath9k interrupts
through ath9k_hw_resume_interrupts() which ath9k_queue_reset() had just
disabled. And ath_isr() then keeps firing because it returns IRQ_HANDLED
without actually clearing the interrupt.
To fix this IRQ storm also clear/disable the interrupts again when we
are in reset state.
Cc: Sven Eckelmann <sven@...fation.org>
Cc: Simon Wunderlich <sw@...onwunderlich.de>
Cc: Linus Lüssing <linus.luessing@...3.blue>
Fixes: 872b5d814f99 ("ath9k: do not access hardware on IRQs during reset")
Signed-off-by: Linus Lüssing <ll@...onwunderlich.de>
---
drivers/net/wireless/ath/ath9k/main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 139831539da3..98090e40e1cf 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -533,8 +533,10 @@ irqreturn_t ath_isr(int irq, void *dev)
ath9k_debug_sync_cause(sc, sync_cause);
status &= ah->imask; /* discard unasked-for bits */
- if (test_bit(ATH_OP_HW_RESET, &common->op_flags))
+ if (test_bit(ATH_OP_HW_RESET, &common->op_flags)) {
+ ath9k_hw_kill_interrupts(sc->sc_ah);
return IRQ_HANDLED;
+ }
/*
* If there are no status bits set, then this interrupt was not
--
2.31.0
Powered by blists - more mailing lists