Message-ID: <20210526082423.47837-1-mst@redhat.com>
Date: Wed, 26 May 2021 04:24:31 -0400
From: "Michael S. Tsirkin" <mst@...hat.com>
To: linux-kernel@...r.kernel.org
Cc: Jakub Kicinski <kuba@...nel.org>, Wei Wang <weiwan@...gle.com>,
David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
Willem de Bruijn <willemb@...gle.com>,
virtualization@...ts.linux-foundation.org
Subject: [PATCH v3 0/4] virtio net: spurious interrupt related fixes
With the implementation of napi-tx in the virtio driver, we clean tx
descriptors from the rx napi handler in order to reduce the number of tx
completion interrupts. But this introduces a race: a tx completion
interrupt can be raised, yet its handler finds there is no work to do
because the work was already done in the previous rx interrupt handler.

A similar issue exists with polling from start_xmit. It is, however,
less common there because of the delayed cb optimization of the split ring,
but it will likely affect the packed ring once that is more widely used.
In particular, this was reported to lead to the following warning msg:
[ 3588.010778] irq 38: nobody cared (try booting with the "irqpoll" option)
[ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.3.0-19-generic #20~18.04.2-Ubuntu
[ 3588.017940] Call Trace:
[ 3588.017942] <IRQ>
[ 3588.017951] dump_stack+0x63/0x85
[ 3588.017953] __report_bad_irq+0x35/0xc0
[ 3588.017955] note_interrupt+0x24b/0x2a0
[ 3588.017956] handle_irq_event_percpu+0x54/0x80
[ 3588.017957] handle_irq_event+0x3b/0x60
[ 3588.017958] handle_edge_irq+0x83/0x1a0
[ 3588.017961] handle_irq+0x20/0x30
[ 3588.017964] do_IRQ+0x50/0xe0
[ 3588.017966] common_interrupt+0xf/0xf
[ 3588.017966] </IRQ>
[ 3588.017989] handlers:
[ 3588.020374] [<000000001b9f1da8>] vring_interrupt
[ 3588.025099] Disabling IRQ #38
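
The race behind this warning can be illustrated with a small user-space
model. This is only a sketch, not kernel code: struct toy_txq and every
toy_* function below are hypothetical names invented for the illustration.
The model assumes that a handler which finds nothing to reclaim reports the
interrupt as unhandled (the analogue of returning IRQ_NONE), which is what
eventually makes the IRQ core complain once such interrupts pile up.

```c
#include <stdbool.h>

/* Toy model of the race; all names are hypothetical, not kernel APIs. */
enum toy_irqreturn { TOY_IRQ_NONE = 0, TOY_IRQ_HANDLED = 1 };

struct toy_txq {
	int used;         /* completed tx buffers not yet reclaimed */
	bool irq_pending; /* tx completion interrupt raised, not yet serviced */
};

/* Device finishes transmitting one buffer and signals the driver. */
void toy_device_complete_tx(struct toy_txq *q)
{
	q->used++;
	q->irq_pending = true;
}

/*
 * rx napi handler opportunistically reclaims tx buffers.
 * It does not clear the already-raised tx interrupt.
 * Returns the number of buffers reclaimed.
 */
int toy_rx_poll_clean_tx(struct toy_txq *q)
{
	int n = q->used;

	q->used = 0;
	return n;
}

/* tx interrupt handler: reports "unhandled" when nothing is left to do. */
enum toy_irqreturn toy_tx_interrupt(struct toy_txq *q)
{
	q->irq_pending = false;
	if (q->used == 0)
		return TOY_IRQ_NONE;
	q->used = 0;
	return TOY_IRQ_HANDLED;
}
```

Calling toy_device_complete_tx(), then toy_rx_poll_clean_tx(), then
toy_tx_interrupt() on the same queue yields TOY_IRQ_NONE: the interrupt
was legitimate when raised, but its work was consumed before the handler ran.
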
This patchset attempts to fix this by cleaning up a bunch of races
related to the handling of sq callbacks (aka tx interrupts).
Somewhat tested, but I couldn't reproduce the originally reported
issues, so I am sending this out for help with testing.
Wei, does this address the spurious interrupt issue you are
observing? Could you confirm please?
Thanks!
changes from v2:
Fixed a race condition in start_xmit: enable_cb_delayed was
done as an optimization (to push out event index for
split ring) so we did not have to care about it
returning false (recheck). Now that we actually disable the cb
we have to test the return value and do the actual recheck.
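
The disable / clean / enable-and-recheck pattern described in this change
note can be sketched in user space as follows. This is a rough model under
stated assumptions, not the code from the patches: struct toy_vq and the
toy_* helpers are hypothetical, and the arrive_during_clean field exists
only to simulate completions that land while callbacks are switched off.
The one property borrowed from the real API is that the enable call returns
false when work is already pending, in which case the caller must recheck.

```c
#include <stdbool.h>

/* Toy virtqueue; all names are hypothetical, not kernel APIs. */
struct toy_vq {
	int used;                /* completed buffers waiting to be freed */
	bool cb_enabled;         /* would the device raise a callback? */
	int arrive_during_clean; /* simulated completions racing with cleanup */
};

void toy_disable_cb(struct toy_vq *vq)
{
	vq->cb_enabled = false;
}

/*
 * Re-enable callbacks. Returns false if buffers are already pending:
 * no callback will fire for those, so the caller has to recheck.
 */
bool toy_enable_cb(struct toy_vq *vq)
{
	vq->cb_enabled = true;
	return vq->used == 0;
}

/* Free everything currently visible, then let the simulated race happen. */
int toy_free_used(struct toy_vq *vq)
{
	int n = vq->used;

	vq->used = vq->arrive_during_clean;
	vq->arrive_during_clean = 0;
	return n;
}

/* Return value of enable ignored: racing completions are left behind. */
int toy_clean_no_recheck(struct toy_vq *vq)
{
	int freed;

	toy_disable_cb(vq);
	freed = toy_free_used(vq);
	(void)toy_enable_cb(vq);
	return freed;
}

/* Return value tested: loop until enable reports nothing pending. */
int toy_clean_with_recheck(struct toy_vq *vq)
{
	int freed = 0;

	do {
		toy_disable_cb(vq);
		freed += toy_free_used(vq);
	} while (!toy_enable_cb(vq));
	return freed;
}
```

With two buffers pending and three more arriving during cleanup, the
no-recheck variant frees two and strands three with no callback coming,
while the recheck variant loops once more and frees all five.
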
Michael S. Tsirkin (4):
  virtio_net: move tx vq operation under tx queue lock
  virtio_net: move txq wakeups under tx q lock
  virtio: fix up virtio_disable_cb
  virtio_net: disable cb aggressively

 drivers/net/virtio_net.c     | 49 ++++++++++++++++++++++++++++--------
 drivers/virtio/virtio_ring.c | 26 ++++++++++++++++++-
 2 files changed, 64 insertions(+), 11 deletions(-)
--
MST