Open Source and information security mailing list archives
 
Date:   Tue, 13 Dec 2022 10:15:52 -0500
From:   "Michael S. Tsirkin" <mst@...hat.com>
To:     Jason Wang <jasowang@...hat.com>
Cc:     Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, edumazet@...gle.com,
        kuba@...nel.org, pabeni@...hat.com, davem@...emloft.net
Subject: Re: [PATCH net] virtio-net: correctly enable callback during
 start_xmit

On Tue, Dec 13, 2022 at 02:57:54PM +0800, Jason Wang wrote:
> On Tue, Dec 13, 2022 at 2:38 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> >
> > On Tue, Dec 13, 2022 at 11:43:36AM +0800, Jason Wang wrote:
> > > On Tue, Dec 13, 2022 at 11:38 AM Xuan Zhuo <xuanzhuo@...ux.alibaba.com> wrote:
> > > >
> > > > On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@...hat.com> wrote:
> > > > > On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > > > > > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > > > > > virtqueue callback via the following statement:
> > > > > >
> > > > > >         do {
> > > > > >            ......
> > > > > >     } while (use_napi && kick &&
> > > > > >                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > > >
> > > > > > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > > > > > kick is false. Fix this by removing the kick check from the
> > > > > > condition to make sure the callback is enabled correctly.
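The short-circuit described above can be reproduced in a small user-space model. This is a minimal sketch, not driver code: fake_enable_cb_delayed() and run_tx_loop() are hypothetical stand-ins, the kernel's unlikely() hint is dropped, and the fake always reports that the callback was armed, so the loop body runs exactly once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for virtqueue_enable_cb_delayed(): counts calls and
 * always returns true ("callback armed, nothing pending"). */
static int enable_cb_calls;

static bool fake_enable_cb_delayed(void)
{
	enable_cb_calls++;
	return true;
}

/* Models only the exit condition of the start_xmit() loop.
 * check_kick == true:  pre-patch  `use_napi && kick && !enable_cb_delayed()`
 * check_kick == false: patched    `use_napi && !enable_cb_delayed()`
 * Returns how many times the enable call was made. */
static int run_tx_loop(bool use_napi, bool kick, bool check_kick)
{
	enable_cb_calls = 0;
	do {
		/* free_old_xmit_skbs(sq, false) would run here */
	} while (use_napi && (!check_kick || kick) &&
		 !fake_enable_cb_delayed());
	/* The fake arms on the first try, so at most one call is expected. */
	assert(enable_cb_calls <= 1);
	return enable_cb_calls;
}
```

With check_kick set and kick false, the && chain stops before the enable call, which is the missed re-arm the patch removes.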
> > > > > >
> > > > > > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > > > > > Signed-off-by: Jason Wang <jasowang@...hat.com>
> > > > > > ---
> > > > > > The patch is needed for -stable.
> > > > >
> > > > > Stable rules don't allow for theoretical fixes. Was a problem observed?
> > >
> > > Yes, running a pktgen sample script can lead to a tx timeout.
> >
> > Since April 2021 and we only noticed now? Are you sure it's the
> > right Fixes tag?
> 
> Well, reverting a7766ef18b33 makes pktgen work again.
> 
> The reason we didn't notice is probably that:
> 
> 1) We don't support BQL, so there is no bulk dequeuing (skb list) in normal traffic
> 2) When burst is enabled for pktgen, it can do bulk xmit via an skb list on its own
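Why only bulk transmission exposes the bug can be shown with a one-function model. It is a sketch assuming, as stated further down the thread, that kick is simply !netdev_xmit_more(), and that xmit_more is set for every skb of a bulk list except the last; kick_false_count() is a hypothetical helper, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a batch of batch_len skbs handed to the driver back to
 * back. Every skb except the last sees xmit_more == true, so kick == false.
 * Returns how many skbs in the batch were sent with kick == false. */
static size_t kick_false_count(size_t batch_len)
{
	size_t count = 0;

	for (size_t i = 0; i < batch_len; i++) {
		bool xmit_more = (i + 1 < batch_len); /* more skbs follow */
		bool kick = !xmit_more;

		if (!kick)
			count++;
	}
	/* Only the last skb of a non-empty batch kicks. */
	assert(batch_len == 0 || count == batch_len - 1);
	return count;
}
```

A single-skb dequeue (batch_len == 1) never yields kick == false, while a burst of 8 skbs yields 7 such calls.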
> 
> >
> > > > >
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 4 ++--
> > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index 86e52454b5b5..44d7daf0267b 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > > >
> > > > > >             free_old_xmit_skbs(sq, false);
> > > > > >
> > > > > > -   } while (use_napi && kick &&
> > > > > > -          unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > > > +   } while (use_napi &&
> > > > > > +            unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > > >
> > > > >
> > > > > A bit more explanation, please. kick simply means !netdev_xmit_more():
> > > > > if it's false, we know there will be another packet, and transmitting
> > > > > that packet will invoke virtqueue_enable_cb_delayed(). No?
> > > >
> > > > A next packet may be expected, but it may never actually arrive.
> > > > For example, the vq is full and the driver stops the queue.
> > >
> > > Exactly, when the queue is about to be full we disable tx and wait for
> > > the next tx interrupt to re-enable tx.
> > >
> > > Thanks
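The failure mode described in the last two replies can be sketched as a tiny state machine. Everything here is a simplified, hypothetical model (struct txq, xmit_one() and can_recover() are illustrative names, and the real driver stops the queue below a 2+MAX_SKB_FRAGS threshold rather than at zero): the callback starts disabled, one packet arrives with kick false and fills the ring, and the queue is stopped with no further packet to re-arm the callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one tx queue; not driver code. */
struct txq {
	bool cb_enabled; /* tx completion callback (interrupt) armed */
	bool stopped;    /* queue stopped, waiting for the callback to restart it */
	int free_slots;  /* free descriptors left in the ring */
};

/* Queue one packet. check_kick selects the pre-patch behaviour, where the
 * callback is re-armed only when kick is true. */
static void xmit_one(struct txq *q, bool kick, bool check_kick)
{
	assert(!q->stopped && q->free_slots > 0);
	q->free_slots--;
	if (q->free_slots == 0)
		q->stopped = true; /* no more packets until the queue is restarted */
	if (!check_kick || kick)
		q->cb_enabled = true;
}

/* A stopped queue can only be restarted from the tx callback, so it recovers
 * only if the callback is armed; otherwise it stays stopped until the
 * watchdog reports a tx timeout. */
static bool can_recover(const struct txq *q)
{
	return !q->stopped || q->cb_enabled;
}
```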
> >
> > OK, it's a good idea to document that.
> 
> Will do.
> 
> > And we should enable callbacks at that point, not here on data path.
> 
> I'm not sure I understand here. Are you suggesting removing the
> !use_napi check here?
> 
>                 if (!use_napi &&
>                     unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
>                         /* More just got used, free them then recheck. */
>                         free_old_xmit_skbs(sq, false);
>                         if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
>                                 netif_start_subqueue(dev, qnum);
>                                 virtqueue_disable_cb(sq->vq);
>                         }
>                 }


At the least, I suggest calling virtqueue_enable_cb_delayed() around
this area of the code. I have not really thought this whole path
through, or how all the corner cases interact.
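One way to read this suggestion is to arm the callback only when the ring fills and the queue is stopped, rather than after every packet. The sketch below only counts how often each policy would arm the callback; it is a hypothetical model (arm_count() and RING_SIZE are illustrative, completions are ignored, and the real stop threshold is 2+MAX_SKB_FRAGS free slots), not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 4 /* hypothetical tiny ring; real rings are much larger */

/* Counts how often the callback would be armed while sending n packets into
 * an empty ring of RING_SIZE slots, with no completions in between.
 * arm_on_stop == false: arm after every packet (the NAPI path as patched).
 * arm_on_stop == true:  arm only when the ring fills and the queue stops. */
static int arm_count(int n, bool arm_on_stop)
{
	int free_slots = RING_SIZE;
	int arms = 0;

	for (int i = 0; i < n && free_slots > 0; i++) {
		free_slots--;
		bool stop = (free_slots == 0);

		if (!arm_on_stop || stop)
			arms++;
	}
	assert(arms <= RING_SIZE);
	return arms;
}
```

In this model, arming after every packet costs one enable call per transmit, whereas arming on stop costs one call per ring fill.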



> Btw, it doesn't make much difference, since kick is always true without
> pktgen, and the alternative may need even more comments or make the code
> even harder to read. We need a patch for -stable at least, so I prefer
> to let this patch go in first and do the optimization on top.
> 
> Thanks

There's a chance of a perf regression here too. Let's write the full
patch first of all. If you want to make it a 2-patch series, that is fine,
but since this has been here since 2021 I don't see why we should rush a
fix. Worry about backporting later.

> >
> >
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > >     /* timestamp packet in software */
> > > > > >     skb_tx_timestamp(skb);
> > > > > > --
> > > > > > 2.25.1
> > > > >
> > > >
> >
