Message-ID: <CANn89iKH1eqEhrmOA74TiGehwt58+pNj7hYosYhZ4JO1tw6HMQ@mail.gmail.com>
Date: Wed, 31 Oct 2018 06:53:58 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Tariq Toukan <tariqt@...lanox.com>
Cc: David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Willem de Bruijn <willemb@...gle.com>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH net 1/3] net: bql: add netdev_tx_sent_queue_more() helper
On Wed, Oct 31, 2018 at 4:30 AM Tariq Toukan <tariqt@...lanox.com> wrote:
>
> On 30/10/2018 1:25 AM, Eric Dumazet wrote:
> > When qdisc_run() tries to use the BQL budget to bulk-dequeue a batch
> > of packets, GSO can later transform this list into another list
> > of skbs, and each skb is sent to the device ndo_start_xmit(),
> > one at a time, with skb->xmit_more set for all but the last skb.
> >
> > The problem is that, very often, the BQL limit is hit in the middle
> > of the packet train, forcing dev_hard_start_xmit() to stop the
> > bulk send and requeue the end of the list.
> >
> > BQL's role is to avoid head-of-line blocking, making sure
> > a qdisc can deliver high-priority packets before low-priority ones.
> >
> > But there is no way for fresh packets in the qdisc to bypass
> > the requeued ones.
> >
> > Aborting the bulk send increases TX softirqs, and the cache
> > lines still hot after skb_segment() are wasted.
> >
> > Note that for TSO packets, we never split a packet in the middle
> > just because the BQL limit was hit.
> >
> > Drivers should be able to update the BQL counters without
> > flipping or checking the BQL status, as long as the current skb
> > has xmit_more set.
> >
> > Upper layers are ultimately responsible for not starting another
> > packet train once the BQL limit is hit.
> >
> > A code template in a driver might look like the following:
> >
> > if (skb->xmit_more) {
> >         netdev_tx_sent_queue_more(tx_queue, nr_bytes);
> >         send_doorbell = netif_tx_queue_stopped(tx_queue);
> > } else {
> >         netdev_tx_sent_queue(tx_queue, nr_bytes);
> >         send_doorbell = true;
> > }
> >
>
> Hi Eric,
> Nice optimization.
>
> I thought of another way of implementing it: just extend the
> existing netdev_tx_sent_queue() function with a new xmit_more
> parameter that the driver passes.
> Something like this:
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index d837dad24b4c..feb9cbcb5759 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3129,12 +3129,12 @@ static inline void netdev_txq_bql_complete_prefetchw(struct netdev_queue *dev_qu
>  }
>
>  static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue,
> -                                        unsigned int bytes)
> +                                        unsigned int bytes, bool more)
>  {
>  #ifdef CONFIG_BQL
>         dql_queued(&dev_queue->dql, bytes);
>
> -       if (likely(dql_avail(&dev_queue->dql) >= 0))
> +       if (more || likely(dql_avail(&dev_queue->dql) >= 0))
>                 return;
>
>         set_bit(__QUEUE_STATE_STACK_XOFF, &dev_queue->state);
>
>
> This unifies and simplifies both the stack and driver code, as the
> newly suggested netdev_tx_sent_queue_more() becomes a special case
> of the existing function.
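>
> A driver call site would then collapse to one unconditional call
> (illustrative, reusing the names from your template):
>
>         netdev_tx_sent_queue(tx_queue, nr_bytes, skb->xmit_more);
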
Yes, I thought of this, but it is too risky/invasive, and I prefer
adding an opt-in mechanism so that patch authors can test their patches.
Then, when/if all drivers are updated, we can remove the legacy stuff
or rename everything.

mlx4 was the only NIC I could reasonably test myself.
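
For reference, netdev_tx_sent_queue_more() (patch 1/3) is essentially
the BQL byte accounting alone, with no limit check; a minimal sketch:

static inline void netdev_tx_sent_queue_more(struct netdev_queue *dev_queue,
                                             unsigned int bytes)
{
#ifdef CONFIG_BQL
        /* Account the queued bytes, but do not evaluate the BQL limit
         * or touch the XOFF bit: the caller has xmit_more set, so more
         * packets from this train will follow.
         */
        dql_queued(&dev_queue->dql, bytes);
#endif
}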

Another suggestion from Willem is to add the following helper,
returning a boolean (doorbell needed):
+static inline bool __netdev_tx_sent_queue(struct netdev_queue *dev_queue,
+                                          unsigned int bytes, bool xmit_more)
+{
+        if (xmit_more) {
+                netdev_tx_sent_queue_more(dev_queue, bytes);
+                return netif_tx_queue_stopped(dev_queue);
+        } else {
+                netdev_tx_sent_queue(dev_queue, bytes);
+                return true;
+        }
+}
+
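
A driver's xmit path would then reduce to something like this (a
sketch; ring_doorbell() and tx_ring are placeholder, device-specific
names):

        send_doorbell = __netdev_tx_sent_queue(tx_queue, nr_bytes,
                                               skb->xmit_more);
        if (send_doorbell)
                ring_doorbell(tx_ring); /* device-specific doorbell write */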