Message-ID: <CA+FuTSe-6MSpB4hwwvwPgDqHkxYJoxMZMDbOusNqiq0Gwa1eiQ@mail.gmail.com>
Date:   Tue, 2 Feb 2021 18:53:08 -0500
From:   Willem de Bruijn <willemb@...gle.com>
To:     Wei Wang <weiwan@...gle.com>
Cc:     "Michael S. Tsirkin" <mst@...hat.com>,
        David Miller <davem@...emloft.net>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Jakub Kicinski <kuba@...nel.org>,
        virtualization@...ts.linux-foundation.org
Subject: Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@...gle.com> wrote:
>
> On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> >
> > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > With the implementation of napi-tx in virtio driver, we clean tx
> > > descriptors from rx napi handler, for the purpose of reducing tx
> > > complete interrupts. But this could introduce a race where tx complete
> > > interrupt has been raised, but the handler found there is no work to do
> > > because we have done the work in the previous rx interrupt handler.
> > > This could lead to the following warning msg:
> > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > "irqpoll" option)
> > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > [ 3588.017940] Call Trace:
> > > [ 3588.017942]  <IRQ>
> > > [ 3588.017951]  dump_stack+0x63/0x85
> > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > [ 3588.017961]  handle_irq+0x20/0x30
> > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > [ 3588.017966]  </IRQ>
> > > [ 3588.017989] handlers:
> > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > > [ 3588.025099] Disabling IRQ #38
> > >
> > > This patch adds a new param to struct vring_virtqueue, and we set it for
> > > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > > case.
> > >
> > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > > Reported-by: Rick Jones <jonesrick@...gle.com>
> > > Signed-off-by: Wei Wang <weiwan@...gle.com>
> > > Signed-off-by: Willem de Bruijn <willemb@...gle.com>
> >
> >
> > This description does not make sense to me.
> >
> > irq X: nobody cared
> > only triggers after an interrupt is unhandled repeatedly.
> >
> > So something causes a storm of useless tx interrupts here.
> >
> > Let's find out what it was please. What you are doing is
> > just preventing linux from complaining.
>
> The traffic that causes this warning is a netperf tcp_stream with at
> least 128 flows between 2 hosts. And the warning gets triggered on the
> receiving host, which has a lot of rx interrupts firing on all queues,
> and a few tx interrupts.
> And I think the scenario is: when the tx interrupt gets fired, it gets
> coalesced with the rx interrupt. Basically, the rx and tx interrupts
> get triggered very close to each other, and gets handled in one round
> of do_IRQ(). And the rx irq handler gets called first, which calls
> virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> to try to do the work on the corresponding tx queue as well. That's
> why when tx interrupt handler gets called, it sees no work to do.
> And the reason for the rx handler to handle the tx work is here:
> https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html

Indeed. It's not necessarily a storm. The warning occurs after one
hundred such events since boot, which is a small number compared to
real interrupt load.

Occasionally seeing an interrupt with no work is expected after
7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
long as this rate of events is very low compared to useful interrupts,
and total interrupt count is greatly reduced vs not having work
stealing, it is a net win.
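
To make the sequence concrete, here is a minimal toy model of the race,
not the driver code. All names (toy_txq, toy_rx_poll, toy_tx_irq,
toy_replay_race) are hypothetical; toy_rx_poll only stands in for the
role that virtnet_poll() and virtnet_poll_cleantx() play in the
description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical types, not virtio-net data structures. */
struct toy_txq {
    unsigned int pending;   /* completed tx descriptors not yet cleaned */
    unsigned int handled;   /* tx interrupts that found work */
    unsigned int unhandled; /* tx interrupts that found nothing to do */
};

/* The device completes n tx descriptors. */
static void toy_tx_complete(struct toy_txq *q, unsigned int n)
{
    q->pending += n;
}

/* rx poll: opportunistically clean the paired tx queue (work stealing).
 * Returns the number of tx descriptors it cleaned. */
static unsigned int toy_rx_poll(struct toy_txq *q)
{
    unsigned int cleaned = q->pending;

    q->pending = 0;
    return cleaned;
}

/* tx interrupt handler: returns true only if it found work to do. */
static bool toy_tx_irq(struct toy_txq *q)
{
    if (q->pending == 0) {
        q->unhandled++;   /* this is the event the irq core counts */
        return false;
    }
    q->pending = 0;
    q->handled++;
    return true;
}

/* Replay the scenario: rx and tx interrupts arrive together, the rx
 * handler runs first and steals the tx work, then the tx handler runs. */
static unsigned int toy_replay_race(struct toy_txq *q, unsigned int ndesc)
{
    toy_tx_complete(q, ndesc);
    toy_rx_poll(q);
    toy_tx_irq(q);
    assert(q->pending == 0);  /* the work was done either way */
    return q->unhandled;
}
```

Each call to toy_replay_race() on a queue adds one unhandled tx
interrupt, while a tx interrupt that arrives without a preceding rx poll
is counted as handled. The model says nothing about how often the two
cases occur under real load.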
