Message-ID: <CANn89iKH1eqEhrmOA74TiGehwt58+pNj7hYosYhZ4JO1tw6HMQ@mail.gmail.com>
Date: Wed, 31 Oct 2018 06:53:58 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Tariq Toukan <tariqt@...lanox.com>
Cc: David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Willem de Bruijn <willemb@...gle.com>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH net 1/3] net: bql: add netdev_tx_sent_queue_more() helper
On Wed, Oct 31, 2018 at 4:30 AM Tariq Toukan <tariqt@...lanox.com> wrote:
>
> On 30/10/2018 1:25 AM, Eric Dumazet wrote:
> > When qdisc_run() tries to use the BQL budget to bulk-dequeue a batch
> > of packets, GSO can later transform this list into another list
> > of skbs, and each skb is sent to the device ndo_start_xmit(),
> > one at a time, with skb->xmit_more set for all but the last skb.
> >
> > The problem is that, very often, the BQL limit is hit in the middle
> > of the packet train, forcing dev_hard_start_xmit() to stop the
> > bulk send and requeue the end of the list.
> >
> > BQL's role is to avoid head-of-line blocking, making sure
> > a qdisc can deliver high-priority packets before low-priority ones.
> >
> > But there is no way for fresh packets in the qdisc to bypass
> > the requeued ones.
> >
> > Aborting the bulk send increases TX softirqs, and the cache
> > lines still hot after skb_segment() are wasted.
> >
> > Note that for TSO packets, we never split a packet in the middle
> > just because the BQL limit was hit.
> >
> > Drivers should be able to update the BQL counters without
> > flipping or checking the BQL status, as long as the current skb
> > has xmit_more set.
> >
> > Upper layers are ultimately responsible for not starting another
> > packet train once the BQL limit is hit.
> >
> > A code template in a driver might look like the following:
> >
> > if (skb->xmit_more) {
> >         netdev_tx_sent_queue_more(tx_queue, nr_bytes);
> >         send_doorbell = netif_tx_queue_stopped(tx_queue);
> > } else {
> >         netdev_tx_sent_queue(tx_queue, nr_bytes);
> >         send_doorbell = true;
> > }
> >
>
> Hi Eric,
> Nice optimization.
>
> I thought of another way of implementing it: just extend the
> existing netdev_tx_sent_queue() function with a new xmit_more
> parameter that the driver passes.
> Something like this:
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index d837dad24b4c..feb9cbcb5759 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3129,12 +3129,12 @@ static inline void netdev_txq_bql_complete_prefetchw(struct netdev_queue *dev_qu
>  }
>
>  static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue,
> -                                        unsigned int bytes)
> +                                        unsigned int bytes, bool more)
>  {
>  #ifdef CONFIG_BQL
>         dql_queued(&dev_queue->dql, bytes);
>
> -       if (likely(dql_avail(&dev_queue->dql) >= 0))
> +       if (more || likely(dql_avail(&dev_queue->dql) >= 0))
>                 return;
>
>         set_bit(__QUEUE_STATE_STACK_XOFF, &dev_queue->state);
>
>
> This unifies and simplifies both the stack and driver code, as the
> newly suggested netdev_tx_sent_queue_more() becomes a special case
> of the existing function.
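>
> A driver call site would then collapse to one unconditional call
> (illustrative, reusing the names from your template):
>
>         netdev_tx_sent_queue(tx_queue, nr_bytes, skb->xmit_more);
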
Yes, I thought of this, but it is too risky/invasive, and I prefer
adding an opt-in mechanism so that patch authors can test their patches.
Then, when/if all drivers are updated, we can remove the legacy stuff
or rename everything.

mlx4 was the only NIC I could reasonably test myself.
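
For reference, netdev_tx_sent_queue_more() (patch 1/3) is essentially
the BQL byte accounting alone, with no limit check; a minimal sketch:

static inline void netdev_tx_sent_queue_more(struct netdev_queue *dev_queue,
                                             unsigned int bytes)
{
#ifdef CONFIG_BQL
        /* Account the queued bytes, but do not evaluate the BQL limit
         * or touch the XOFF bit: the caller has xmit_more set, so more
         * packets from this train will follow.
         */
        dql_queued(&dev_queue->dql, bytes);
#endif
}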

Another suggestion from Willem is to add the following helper,
returning a boolean (doorbell needed):
+static inline bool __netdev_tx_sent_queue(struct netdev_queue *dev_queue,
+                                          unsigned int bytes, bool xmit_more)
+{
+        if (xmit_more) {
+                netdev_tx_sent_queue_more(dev_queue, bytes);
+                return netif_tx_queue_stopped(dev_queue);
+        } else {
+                netdev_tx_sent_queue(dev_queue, bytes);
+                return true;
+        }
+}
+
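
A driver's xmit path would then reduce to something like this (a
sketch; ring_doorbell() and tx_ring are placeholder, device-specific
names):

        send_doorbell = __netdev_tx_sent_queue(tx_queue, nr_bytes,
                                               skb->xmit_more);
        if (send_doorbell)
                ring_doorbell(tx_ring); /* device-specific doorbell write */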