Message-ID: <20140924195831.6fb91051@redhat.com>
Date: Wed, 24 Sep 2014 19:58:31 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: netdev@...r.kernel.org, therbert@...gle.com,
"David S. Miller" <davem@...emloft.net>,
Alexander Duyck <alexander.h.duyck@...el.com>, toke@...e.dk,
Florian Westphal <fw@...len.de>, jhs@...atatu.com,
Dave Taht <dave.taht@...il.com>,
John Fastabend <john.r.fastabend@...el.com>,
Daniel Borkmann <dborkman@...hat.com>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
brouer@...hat.com
Subject: Re: [net-next PATCH 1/1 V4] qdisc: bulk dequeue support for qdiscs
with TCQ_F_ONETXQUEUE
On Wed, 24 Sep 2014 10:23:15 -0700
Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Wed, 2014-09-24 at 18:12 +0200, Jesper Dangaard Brouer wrote:
> > Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
> > sending/processing an entire skb list.
> >
> > This patch implements qdisc bulk dequeue, by allowing multiple packets
> > to be dequeued in dequeue_skb().
> >
> > The optimization principle for this is twofold: (1) amortize
> > locking cost and (2) avoid the expensive tailptr update for notifying HW.
> > (1) Several packets are dequeued while holding the qdisc root_lock,
> > amortizing the locking cost over several packets. The dequeued SKB list
> > is processed under the TXQ lock in dev_hard_start_xmit(), thus also
> > amortizing the cost of the TXQ lock.
> > (2) Furthermore, dev_hard_start_xmit() will utilize the skb->xmit_more
> > API to delay the HW tailptr update, which also reduces the cost per
> > packet.
> >
> > One restriction of the new API is that every SKB must belong to the
> > same TXQ. This patch takes the easy way out, by restricting bulk
> > dequeue to qdiscs with the TCQ_F_ONETXQUEUE flag, which specifies that
> > the qdisc has only a single TXQ attached.
> >
> > Some detail about the flow: dev_hard_start_xmit() will process the skb
> > list, and transmit packets individually towards the driver (see
> > xmit_one()). If the driver stops midway through the list, the
> > remaining skb list is returned by dev_hard_start_xmit(). In
> > sch_direct_xmit() this returned list is requeued by dev_requeue_skb().
> >
> > To avoid overshooting the HW limits, which results in requeuing, the
> > patch limits the amount of bytes dequeued, based on the driver's BQL
> > limits. In effect, bulking will only happen for BQL enabled drivers.
> > Besides the byte limit from BQL, also limit bulking to a maximum of 8
> > packets to avoid any issues with available descriptors in HW.
> >
> > For now, as a conservative approach, don't bulk dequeue GSO and
> > segmented GSO packets, because testing showed requeuing occurring
> > with segmented GSO packets.
> >
> > Joint work with Hannes, Daniel and Florian.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
> > Signed-off-by: Hannes Frederic Sowa <hannes@...essinduktion.org>
> > Signed-off-by: Daniel Borkmann <dborkman@...hat.com>
> > Signed-off-by: Florian Westphal <fw@...len.de>
> >
> > ---
> > V4:
> > - Patch rewritten in the Red Hat Neuchatel office, joint work with
> > Hannes, Daniel and Florian.
> > - Conservative approach of only enabling bulking on BQL enabled drivers
> > - No user tunable parameter, but limit bulking to 8 packets.
> > - For now, avoid bulking GSO packets.
> >
[...]
> > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> > index 19696eb..6fba089 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -56,6 +56,42 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
> > return 0;
> > }
> >
> > +static struct sk_buff *try_bulk_dequeue_skb(struct Qdisc *q,
> > + struct sk_buff *head_skb,
> > + int bytelimit)
> > +{
> > + struct sk_buff *skb, *tail_skb = head_skb;
> > + int budget = 8; /* Arbitrary, but conservatively chosen limit */
> > +
> > + while (bytelimit > 0 && --budget > 0) {
> > + /* For now, don't bulk dequeue GSO (or GSO segmented) pkts */
>
> Hmm... this is a serious limitation.
We plan to remove this restriction at a later point; for now it is just to be conservative.
> > + if (tail_skb->next || skb_is_gso(tail_skb))
> > + break;
> > +
> > + skb = q->dequeue(q);
> > + if (!skb)
> > + break;
> > +
> > + bytelimit -= skb->len; /* covers GSO len */
>
> Not really, use qdisc_pkt_len(skb) instead, to have a better byte count.
Understood, will update patch tomorrow.
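For readers following along: the distinction matters because skb->len counts a GSO skb's payload bytes once, while qdisc_pkt_len() can also reflect the header bytes replicated when the skb is later segmented. A rough userspace model of the accounting (the helper name and overhead formula below are illustrative, not the kernel's exact code):

```c
/* Illustrative model: approximate on-wire bytes for a GSO skb once
 * segmented. Each of the gso_segs segments carries its own headers,
 * so the wire byte count exceeds skb->len by the replicated header
 * bytes. (Hypothetical helper, not kernel code.) */
static unsigned int model_wire_bytes(unsigned int skb_len,
				     unsigned int hdr_len,
				     unsigned int gso_segs)
{
	if (gso_segs <= 1)
		return skb_len; /* non-GSO: skb->len is already accurate */
	return skb_len + (gso_segs - 1) * hdr_len;
}
```

E.g. for a 65000 byte TSO skb split into 44 segments with 54 bytes of headers each, counting skb->len alone undercounts the wire bytes by over 2KB, enough to overshoot a BQL-derived budget.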
> > + skb = validate_xmit_skb(skb, qdisc_dev(q));
> > + if (!skb)
> > + break;
> > +
> > + /* "skb" can be a skb list after validate call above
> > + * (GSO segmented), but it is okay to append it to
> > + * current tail_skb->next, because next round will exit
> > + * in-case "tail_skb->next" is a skb list.
> > + */
>
> It would be trivial to change validate_xmit_skb() to return the head and
> tail of the chain. Walking the chain would hit hot cache lines, but it
> is better not having to walk it.
Yes, we could do this when we later add support for GSO segmented packets.
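To illustrate Eric's point in plain userspace C (the struct and helper names below are mine, not the actual validate_xmit_skb() interface): if the callee hands back both ends of the chain it produced, the caller can splice in O(1) instead of walking the list to find the tail.

```c
#include <stddef.h>

/* Minimal singly linked node standing in for struct sk_buff. */
struct node {
	struct node *next;
};

/* Append sub-chain [head..tail] after list_tail in O(1);
 * returns the new overall tail. No walk needed, because the
 * producer already told us where the tail is. */
static struct node *splice_chain(struct node *list_tail,
				 struct node *head,
				 struct node *tail)
{
	list_tail->next = head;
	return tail;
}

/* Count list length from head, for checking. */
static int list_len(struct node *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}

/* Build a 1-node list, splice a 2-node chain onto it, and
 * return the resulting length (3) if the tail is also correct. */
static int demo_splice_len(void)
{
	struct node a = { NULL }, b = { NULL }, c = { NULL };
	struct node *tail = &a;

	b.next = &c; /* sub-chain b -> c */
	tail = splice_chain(tail, &b, &c);
	return tail == &c ? list_len(&a) : -1;
}
```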
> > + tail_skb->next = skb;
> > + tail_skb = skb;
>
> So that here we do : tail_skb = tail;
>
> > + }
> > +
> > + return head_skb;
> > +}
> > +
> > +/* Note that dequeue_skb can possibly return a SKB list (via skb->next).
> > + * A requeued skb (via q->gso_skb) can also be a SKB list.
> > + */
> > static inline struct sk_buff *dequeue_skb(struct Qdisc *q)
> > {
> > struct sk_buff *skb = q->gso_skb;
> > @@ -70,10 +106,17 @@ static inline struct sk_buff *dequeue_skb(struct Qdisc *q)
> > } else
> > skb = NULL;
> > } else {
> > - if (!(q->flags & TCQ_F_ONETXQUEUE) || !netif_xmit_frozen_or_stopped(txq)) {
> > + if (!(q->flags & TCQ_F_ONETXQUEUE) ||
> > + !netif_xmit_frozen_or_stopped(txq)) {
> > + int bytelimit = qdisc_avail_bulklimit(txq);
> > +
> > skb = q->dequeue(q);
> > - if (skb)
> > + if (skb) {
> > + bytelimit -= skb->len;
>
> qdisc_pkt_len(skb)
Okay, will update patch tomorrow.
> > skb = validate_xmit_skb(skb, qdisc_dev(q));
> > + }
> > + if (skb && qdisc_may_bulk(q))
> > + skb = try_bulk_dequeue_skb(q, skb, bytelimit);
> > }
> > }
> >
>
> It looks good, but we really need to take care of TSO packets.
As noted above, TSO packets are planned as a future feature. Let's
first see if this works well, without introducing any HoL blocking
issues or other regressions.
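To make the dequeue budget concrete, here is a userspace sketch of the accounting in the patch (function and parameter names are mine, not the kernel's): the first packet is dequeued unconditionally in dequeue_skb(), then the bulk loop continues only while both the BQL-derived byte limit and the hard 8-packet budget remain.

```c
/* Illustrative model of the bulk-dequeue accounting: given a queue
 * of packet lengths, return how many packets one bulk dequeue would
 * take under a byte limit plus the hard 8-packet cap. Userspace
 * sketch, not the kernel implementation. */
static int model_bulk_count(const int *pkt_lens, int n_queued,
			    int bytelimit)
{
	int budget = 8; /* mirrors the patch's conservative cap */
	int taken = 0;

	if (n_queued == 0)
		return 0;
	/* First packet is dequeued unconditionally in dequeue_skb(). */
	bytelimit -= pkt_lens[taken++];
	/* Bulk loop: stops on byte budget or packet budget. */
	while (taken < n_queued && bytelimit > 0 && --budget > 0)
		bytelimit -= pkt_lens[taken++];
	return taken;
}

/* Demo: 20 queued packets of 1500 bytes each. */
static int demo_count(int bytelimit)
{
	int lens[20];
	int i;

	for (i = 0; i < 20; i++)
		lens[i] = 1500;
	return model_bulk_count(lens, 20, bytelimit);
}
```

Note the loop can overshoot the byte limit by at most one packet (the check is bytelimit > 0 before the next dequeue), so a 10000 byte budget takes 7 x 1500 byte packets, and an effectively unlimited budget is still capped at 8 packets total.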
> pktgen is nice, but does not represent the majority of the traffic we send
> from high performance hosts where we want this bulk dequeue thing ;)
This patch is actually targeted towards more normal use-cases.
Pktgen cannot even use this work, as it bypasses the qdisc layer...
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer