Message-ID: <c6142e34-5437-2625-069c-218a91440f80@mellanox.com>
Date: Wed, 31 Oct 2018 11:30:27 +0000
From: Tariq Toukan <tariqt@...lanox.com>
To: Eric Dumazet <edumazet@...gle.com>,
"David S . Miller" <davem@...emloft.net>
CC: netdev <netdev@...r.kernel.org>,
Willem de Bruijn <willemb@...gle.com>,
Tariq Toukan <tariqt@...lanox.com>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH net 1/3] net: bql: add netdev_tx_sent_queue_more() helper
On 30/10/2018 1:25 AM, Eric Dumazet wrote:
> When qdisc_run() tries to use the BQL budget to bulk-dequeue a batch
> of packets, GSO can later transform this list into another list
> of skbs, and each skb is sent to the device ndo_start_xmit(),
> one at a time, with skb->xmit_more set to one for all
> but the last skb.
>
> The problem is that, very often, the BQL limit is hit in the middle of
> the packet train, forcing dev_hard_start_xmit() to stop the
> bulk send and requeue the end of the list.
>
> BQL's role is to avoid head-of-line blocking, making sure
> a qdisc can deliver high-priority packets before low-priority ones.
>
> But there is no way requeued packets can be bypassed by fresh
> packets in the qdisc.
>
> Aborting the bulk send increases TX softirqs, and hot cache
> lines (after skb_segment()) are wasted.
>
> Note that for TSO packets, we never split a packet in the middle
> because of the BQL limit being hit.
>
> Drivers should be able to update BQL counters without
> flipping/caring about BQL status, if the current skb
> has xmit_more set.
>
> Upper layers are ultimately responsible for not sending another
> packet train when the BQL limit is hit.
>
> Code template in a driver might look like the following:
>
>         if (skb->xmit_more) {
>                 netdev_tx_sent_queue_more(tx_queue, nr_bytes);
>                 send_doorbell = netif_tx_queue_stopped(tx_queue);
>         } else {
>                 netdev_tx_sent_queue(tx_queue, nr_bytes);
>                 send_doorbell = true;
>         }
>
Hi Eric,
Nice optimization.
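Just to spell out how I read the intended usage, a fleshed-out driver
xmit path could look roughly like below. This is only a sketch:
foo_start_xmit(), foo_tx_post() and foo_ring_doorbell() are made-up
names, and nr_bytes is assumed to be the byte count the driver already
computes per skb:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical driver helpers, assumed to exist elsewhere: */
void foo_tx_post(struct net_device *dev, struct sk_buff *skb);
void foo_ring_doorbell(struct net_device *dev);

static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct netdev_queue *txq;
        unsigned int nr_bytes = skb->len;
        bool send_doorbell;

        txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

        foo_tx_post(dev, skb);  /* post descriptors on the HW ring */

        if (skb->xmit_more) {
                /* More skbs of this train follow: account bytes only,
                 * without touching __QUEUE_STATE_STACK_XOFF.  Still ring
                 * the doorbell if the stack stopped the queue, so the
                 * posted descriptors are not stranded.
                 */
                netdev_tx_sent_queue_more(txq, nr_bytes);
                send_doorbell = netif_tx_queue_stopped(txq);
        } else {
                /* Last skb of the batch: regular BQL accounting, which
                 * may stop the queue when the limit is crossed.
                 */
                netdev_tx_sent_queue(txq, nr_bytes);
                send_doorbell = true;
        }

        if (send_doorbell)
                foo_ring_doorbell(dev);

        return NETDEV_TX_OK;
}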
I thought of another way of implementing it: just extend the existing
netdev_tx_sent_queue() function with a new xmit_more parameter that the
driver passes.
Something like this:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d837dad24b4c..feb9cbcb5759 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3129,12 +3129,12 @@ static inline void netdev_txq_bql_complete_prefetchw(struct netdev_queue *dev_qu
 }
 
 static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue,
-                                        unsigned int bytes)
+                                        unsigned int bytes, bool more)
 {
 #ifdef CONFIG_BQL
         dql_queued(&dev_queue->dql, bytes);
 
-        if (likely(dql_avail(&dev_queue->dql) >= 0))
+        if (more || likely(dql_avail(&dev_queue->dql) >= 0))
                 return;
 
         set_bit(__QUEUE_STATE_STACK_XOFF, &dev_queue->state);
This unifies and simplifies both the stack and the driver code, as the
newly suggested netdev_tx_sent_queue_more() becomes a special case of
the existing function.
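With that, the driver template from your changelog collapses to
something like this (same hypothetical tx_queue/nr_bytes names as in
your example):

        netdev_tx_sent_queue(tx_queue, nr_bytes, skb->xmit_more);
        send_doorbell = !skb->xmit_more || netif_tx_queue_stopped(tx_queue);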
This would, however, require a one-time maintenance of 31 existing
usages of the function:
$ git grep netdev_tx_sent_queue drivers/net/ethernet/ | wc -l
31
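The conversion itself is mechanical; an existing call site would get an
update along these lines (illustrative hunk, not from a real driver),
passing false to preserve today's behavior until the driver is taught
about xmit_more:

-       netdev_tx_sent_queue(txq, skb->len);
+       netdev_tx_sent_queue(txq, skb->len, false);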
What do you think?
> Note that use of netdev_tx_sent_queue_more() is not mandatory,
> since the following patch will change dev_hard_start_xmit()
> to not care about BQL status.
OK, so changing the driver code as suggested here becomes safe only
starting from the next patch, and you do it only in patch 3, so it's fine.
>
> But it is higly recommended so that xmit_more full benefits
typo: highly, just in case you re-spin.
> can be reached (fewer doorbells sent, and fewer atomic operations as well)
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> ---
> include/linux/netdevice.h | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index dc1d9ed33b3192e9406b17c3107b3235b28ff1b9..beb37232688f7e4a71c932e472454e94df18b865 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3166,6 +3166,18 @@ static inline void netdev_txq_bql_complete_prefetchw(struct netdev_queue *dev_qu
>  #endif
>  }
>  
> +/* Variant of netdev_tx_sent_queue() for packets with xmit_more.
> + * We do want to change __QUEUE_STATE_STACK_XOFF only for the last
> + * skb of a batch.
> + */
> +static inline void netdev_tx_sent_queue_more(struct netdev_queue *dev_queue,
> +                                             unsigned int bytes)
> +{
> +#ifdef CONFIG_BQL
> +        dql_queued(&dev_queue->dql, bytes);
> +#endif
> +}
> +
>  static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue,
>                                          unsigned int bytes)
>  {
>