Message-ID: <1409837383.26422.113.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Thu, 04 Sep 2014 06:29:43 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Tom Herbert <therbert@...gle.com>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
Florian Westphal <fw@...len.de>,
Daniel Borkmann <dborkman@...hat.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Alexander Duyck <alexander.duyck@...il.com>,
John Fastabend <john.r.fastabend@...el.com>
Subject: Re: [RFC net-next PATCH V2 2/3] qdisc: bulk dequeue support for
qdiscs with TCQ_F_ONETXQUEUE
On Thu, 2014-09-04 at 14:55 +0200, Jesper Dangaard Brouer wrote:
> Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
> sending/processing an entire skb list.
>
> This patch implements qdisc bulk dequeue, by allowing multiple packets
> to be dequeued in dequeue_skb().
>
> The optimization principle for this is two fold, (1) to amortize
> locking cost and (2) avoid expensive tailptr update for notifying HW.
> (1) Several packets are dequeued while holding the qdisc root_lock,
> amortizing locking cost over several packets. The dequeued SKB list is
> processed under the TXQ lock in dev_hard_start_xmit(), thus also
> amortizing the cost of the TXQ lock.
> (2) Furthermore, dev_hard_start_xmit() will utilize the skb->xmit_more
> API to delay HW tailptr update, which also reduces the cost per
> packet.
>
> One restriction of the new API is that every SKB must belong to the
> same TXQ. This patch takes the easy way out, by restricting bulk
> dequeue to qdiscs with the TCQ_F_ONETXQUEUE flag, which specifies that
> the qdisc only has a single TXQ attached.
>
> Some detail about the flow; dev_hard_start_xmit() will process the skb
> list, and transmit packets individually towards the driver (see
> xmit_one()). In case the driver stops midway in the list, the
> remaining skb list is returned by dev_hard_start_xmit(). In
> sch_direct_xmit() this returned list is requeued by dev_requeue_skb().
>
> The patch also tries to limit the number of bytes dequeued, based on
> the driver's BQL limit. It also stops dequeuing when it sees a GSO
> packet (either a real GSO skb or an already-segmented GSO skb list).
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
>
> ---
> V2:
> - Restructure functions, split out functionality
> - Use the BQL byte limit to avoid overshooting driver limits, which
> would cause overly large skb lists to sit on the requeue gso_skb.
>
> net/sched/sch_generic.c | 67 +++++++++++++++++++++++++++++++++++++++++++++--
> 1 files changed, 64 insertions(+), 3 deletions(-)
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 19696eb..a0c8070 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -56,6 +56,67 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
> return 0;
> }
>
> +static inline bool qdisc_may_bulk(const struct Qdisc *qdisc,
> + const struct sk_buff *skb)
Why is skb passed here?
> +{
> + return (qdisc->flags & TCQ_F_ONETXQUEUE);
return qdisc->flags & TCQ_F_ONETXQUEUE;
> +}
> +
> +static inline struct sk_buff *qdisc_dequeue_validate(struct Qdisc *qdisc)
> +{
> + struct sk_buff *skb = qdisc->dequeue(qdisc);
> +
> + if (skb != NULL)
> + skb = validate_xmit_skb(skb, qdisc_dev(qdisc));
> +
> + return skb;
> +}
> +
> +static inline struct sk_buff *qdisc_bulk_dequeue_skb(struct Qdisc *q,
> + struct sk_buff *head)
> +{
> + struct sk_buff *new, *skb = head;
> + struct netdev_queue *txq = q->dev_queue;
> + int bytelimit = netdev_tx_avail_queue(txq);
> + int limit = 5;
> +
> + if (bytelimit <= 0)
> + return head;
> +
> + do {
> + if (skb->next || skb_is_gso(skb)) {
A GSO packet should not 'stop the processing' if the device is TSO
capable: it should look like a normal packet.
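Something like this instead (totally untested sketch, helper name
invented) would only stop bulking when the device actually needs
software segmentation for this skb:

/* Untested sketch: a GSO skb only stops bulking if the device
 * cannot send it as-is. A TSO capable NIC sees it as a normal skb.
 */
static bool qdisc_skb_stops_bulk(struct sk_buff *skb)
{
	if (skb->next)	/* already a segmented skb list */
		return true;

	return netif_needs_gso(skb, netif_skb_features(skb));
}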
> + /* Stop processing if the skb is already a skb
> + * list (e.g a segmented GSO packet) or a real
> + * GSO packet */
> + break;
> + }
> + new = qdisc_dequeue_validate(q);
That's broken.
After validate_xmit_skb(), @new might be the head of a list (GSO split).
> + if (new) {
> + skb->next = new;
> + skb = new;
and @new is only the first element of the list.
> + bytelimit -= skb->len;
skb->len is only the length of the first segment.
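If you really insist on validating at this point, you would at least
have to splice the whole returned chain and account every segment,
something like this (untested):

/* Untested sketch: validate_xmit_skb() may return a chain (GSO
 * split), so splice the whole chain and account each segment.
 */
struct sk_buff *seg;

new = qdisc_dequeue_validate(q);
if (new) {
	skb->next = new;
	for (seg = new; seg; seg = seg->next) {
		bytelimit -= seg->len;	/* per-segment length */
		skb = seg;		/* skb ends up at the tail */
	}
}

But as explained below, checking the BQL budget before the validation
avoids all of this.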
> + cnt++;
> + /* One problem here is it is difficult to
> + * requeue the "new" dequeued skb, e.g. in
> + * case of GSO, thus a "normal" packet can
> + * have a GSO packet on its ->next ptr.
> + *
> + * Requeue is difficult because if requeuing
> + * on q->gso_skb, then a second requeue can
> + * happen from sch_direct_xmit e.g. if driver
> + * returns NETDEV_TX_BUSY, which would
> + * overwrite this requeue.
> + */
It should not overwrite, but insert back. That's what needs to be done.
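Something like this in dev_requeue_skb() (untested, and the exact
ordering of the new packets vs the old leftovers still needs thought):

/* Untested sketch: chain the requeued skb(s) in front of whatever
 * already sits in q->gso_skb instead of overwriting it.
 */
struct sk_buff *tail = skb;

while (tail->next)
	tail = tail->next;
tail->next = q->gso_skb;	/* old leftovers go behind */
q->gso_skb = skb;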
> + }
> + } while (new && --limit && (bytelimit > 0));
> + skb = head;
> +
> + return skb;
> +}
> +
The idea is really the following:
Once the qdisc bulk dequeue is done, and the skbs are validated, we
have a list of skbs to send to the device.
We iterate the list and try to send each individual skb.
As soon as the device returns NETDEV_TX_BUSY (or even better, when the
queue is stopped), we requeue the remaining list.
BQL never tries to find the exact split point:
If the available budget is 25000 bytes, and the next TSO packet is
45000 bytes, we send it anyway.
So the BQL check should be done before the eventual
validate_xmit_skb(): this way you don't care about GSO or TSO.
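Roughly (untested sketch, function name invented, reusing the
netdev_tx_avail_queue() helper from your patch):

/* Untested sketch of the intended flow: the BQL budget is checked
 * before validate_xmit_skb(), using qdisc_pkt_len() so a GSO skb
 * needs no special casing, and the last packet may exceed the budget.
 */
static struct sk_buff *qdisc_bulk_dequeue(struct Qdisc *q)
{
	struct netdev_queue *txq = q->dev_queue;
	int budget = netdev_tx_avail_queue(txq);
	struct sk_buff *head = NULL, **tail = &head;
	struct sk_buff *skb;

	while (budget > 0 && (skb = q->dequeue(q)) != NULL) {
		budget -= qdisc_pkt_len(skb);	/* full GSO size */

		skb = validate_xmit_skb(skb, qdisc_dev(q));
		if (!skb)
			continue;	/* dropped during validation */

		*tail = skb;		/* append skb (or a chain) */
		while (*tail)
			tail = &(*tail)->next;
	}
	return head;
}

Then sch_direct_xmit() sends this list, and dev_requeue_skb() simply
gets back whatever the driver could not take.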