lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACGkMEvPdiKS7+S5Btk+uMwtwRnPfTd6Brwz2acgBfNAnTXMFA@mail.gmail.com>
Date: Mon, 5 Aug 2024 11:26:56 +0800
From: Jason Wang <jasowang@...hat.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Heng Qi <hengqi@...ux.alibaba.com>, netdev@...r.kernel.org, 
	Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, Eugenio Pérez <eperezma@...hat.com>, 
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net-next] virtio_net: Prevent misidentified spurious
 interrupts from killing the irq

On Fri, Aug 2, 2024 at 9:11 PM Michael S. Tsirkin <mst@...hat.com> wrote:
>
> On Fri, Aug 02, 2024 at 11:41:57AM +0800, Jason Wang wrote:
> > On Thu, Aug 1, 2024 at 9:56 PM Heng Qi <hengqi@...ux.alibaba.com> wrote:
> > >
> > > Michael has effectively reduced the number of spurious interrupts in
> > > commit a7766ef18b33 ("virtio_net: disable cb aggressively") by disabling
> > > irq callbacks before cleaning old buffers.
> > >
> > > But it is still possible that the irq is killed by mistake:
> > >
> > >   When a delayed tx interrupt arrives, old buffers has been cleaned in
> > >   other paths (start_xmit and virtnet_poll_cleantx), then the interrupt is
> > >   mistakenly identified as a spurious interrupt in vring_interrupt.
> > >
> > >   We should refrain from labeling it as a spurious interrupt; otherwise,
> > >   note_interrupt may inadvertently kill the legitimate irq.
> >
> > I think the evil came from where we do free_old_xmit() in
> > start_xmit(). I know it is for performance, but we may need to make
> > the code work correctly instead of adding endless hacks. Personally, I
> > think the virtio-net TX path is over-complicated. We probably pay too
> > much (e.g there's netif_tx_lock in TX NAPI path) to try to "optimize"
> > the performance.
> >
> > How about just don't do free_old_xmit and do that solely in the TX NAPI?
>
> Not getting interrupts is always better than getting interrupts.

Not sure. For example letting 1 cpu to do the transmission without the
dealing of xmit skbs should give us better performance.

> This is not new code, there are no plans to erase it all and start
> anew "to make it work correctly" - it's widely deployed,
> you will cause performance regressions and they are hard
> to debug.

I actually meant the TX NAPI mode, we tried to hold the TX lock in the
TX NAPI, which turns out to slow down both the transmission and the
NAPI itself.

Thanks


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ