Date:   Wed, 31 Oct 2018 11:30:27 +0000
From:   Tariq Toukan <tariqt@...lanox.com>
To:     Eric Dumazet <edumazet@...gle.com>,
        "David S . Miller" <davem@...emloft.net>
CC:     netdev <netdev@...r.kernel.org>,
        Willem de Bruijn <willemb@...gle.com>,
        Tariq Toukan <tariqt@...lanox.com>,
        Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH net 1/3] net: bql: add netdev_tx_sent_queue_more() helper



On 30/10/2018 1:25 AM, Eric Dumazet wrote:
> When qdisc_run() tries to use the BQL budget to bulk-dequeue a batch
> of packets, GSO can later transform this list into another list
> of skbs, and each skb is sent to the device's ndo_start_xmit(),
> one at a time, with skb->xmit_more set for all but the last skb.
> 
> The problem is that, very often, the BQL limit is hit in the middle
> of the packet train, forcing dev_hard_start_xmit() to stop the
> bulk send and requeue the end of the list.
> 
> BQL's role is to avoid head-of-line blocking, making sure
> a qdisc can deliver high-priority packets before low-priority ones.
> 
> But there is no way requeued packets can be bypassed by fresh
> packets in the qdisc.
> 
> Aborting the bulk send increases TX softirqs, and hot cache
> lines (after skb_segment()) are wasted.
> 
> Note that for TSO packets, we never split a packet in the middle
> just because the BQL limit has been hit.
> 
> Drivers should be able to update the BQL counters without
> checking or flipping the BQL status, if the current skb
> has xmit_more set.
> 
> Upper layers are ultimately responsible for stopping the next
> packet train when the BQL limit is hit.
> 
> A code template in a driver might look like the following:
> 
> 	if (skb->xmit_more) {
> 		netdev_tx_sent_queue_more(tx_queue, nr_bytes);
> 		send_doorbell = netif_tx_queue_stopped(tx_queue);
> 	} else {
> 		netdev_tx_sent_queue(tx_queue, nr_bytes);
> 		send_doorbell = true;
> 	}
> 

Hi Eric,
Nice optimization.

I thought of another way of implementing it: just extend the existing
netdev_tx_sent_queue() function with a new xmit_more parameter that the
driver passes.
Something like this:

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d837dad24b4c..feb9cbcb5759 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3129,12 +3129,12 @@ static inline void netdev_txq_bql_complete_prefetchw(struct netdev_queue *dev_qu
  }

  static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue,
-                                       unsigned int bytes)
+                                       unsigned int bytes, bool more)
  {
  #ifdef CONFIG_BQL
         dql_queued(&dev_queue->dql, bytes);

-       if (likely(dql_avail(&dev_queue->dql) >= 0))
+       if (more || likely(dql_avail(&dev_queue->dql) >= 0))
                 return;

         set_bit(__QUEUE_STATE_STACK_XOFF, &dev_queue->state);


This unifies and simplifies both the stack and the driver code, as the
newly suggested netdev_tx_sent_queue_more() becomes just a special case
of the existing function.
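
For example (just a sketch, reusing the names from the driver template
in your changelog), the driver side would collapse to:

	netdev_tx_sent_queue(tx_queue, nr_bytes, skb->xmit_more);
	send_doorbell = !skb->xmit_more || netif_tx_queue_stopped(tx_queue);
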
This would, however, require a one-time update of the 31 existing
usages of the function:
$ git grep netdev_tx_sent_queue drivers/net/ethernet/ | wc -l
31
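
The conversion itself would be mechanical; for a hypothetical call site
with no xmit_more batching, behavior is preserved by passing false:

-	netdev_tx_sent_queue(txq, skb->len);
+	netdev_tx_sent_queue(txq, skb->len, false);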

What do you think?

> Note that netdev_tx_sent_queue_more() use is not mandatory,
> since the following patch will change dev_hard_start_xmit()
> to not care about BQL status.

OK, so changing the driver code as suggested here becomes safe only 
starting from the next patch, and since you do it only in patch 3, it's fine.

> 
> But it is higly recommended so that xmit_more full benefits

typo: highly, just in case you re-spin.

> can be reached (less doorbells sent, and less atomic operations as well)
> 
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> ---
>   include/linux/netdevice.h | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index dc1d9ed33b3192e9406b17c3107b3235b28ff1b9..beb37232688f7e4a71c932e472454e94df18b865 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3166,6 +3166,18 @@ static inline void netdev_txq_bql_complete_prefetchw(struct netdev_queue *dev_qu
>   #endif
>   }
>   
> +/* Variant of netdev_tx_sent_queue() for packets with xmit_more.
> + * We do want to change __QUEUE_STATE_STACK_XOFF only for the last
> + * skb of a batch.
> + */
> +static inline void netdev_tx_sent_queue_more(struct netdev_queue *dev_queue,
> +					     unsigned int bytes)
> +{
> +#ifdef CONFIG_BQL
> +	dql_queued(&dev_queue->dql, bytes);
> +#endif
> +}
> +
>   static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue,
>   					unsigned int bytes)
>   {
> 
