Message-ID: <CA+mtBx8guwOH7uoxA8Jt-r8s8FYWj=oOp28AodrVW4FhDeagZA@mail.gmail.com>
Date: Fri, 19 Sep 2014 17:22:14 -0700
From: Tom Herbert <therbert@...gle.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: Linux Netdev List <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <eric.dumazet@...il.com>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
Florian Westphal <fw@...len.de>,
Daniel Borkmann <dborkman@...hat.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Alexander Duyck <alexander.duyck@...il.com>,
John Fastabend <john.r.fastabend@...el.com>,
Dave Taht <dave.taht@...il.com>,
Toke Høiland-Jørgensen <toke@...e.dk>
Subject: Re: [RFC net-next PATCH V3 2/2] qdisc: bulk dequeue support for
qdiscs with TCQ_F_ONETXQUEUE
On Fri, Sep 19, 2014 at 1:49 PM, Jesper Dangaard Brouer
<brouer@...hat.com> wrote:
> Based on DaveM's recent API work on dev_hard_start_xmit(), which allows
> sending/processing an entire skb list.
>
> This patch implements qdisc bulk dequeue, by allowing multiple packets
> to be dequeued in dequeue_skb().
>
> The optimization principle for this is twofold: (1) amortize locking
> cost, and (2) avoid the expensive tailptr update for notifying HW.
> (1) Several packets are dequeued while holding the qdisc root_lock,
> amortizing locking cost over several packets. The dequeued SKB list is
> processed under the TXQ lock in dev_hard_start_xmit(), thus also
> amortizing the cost of the TXQ lock.
> (2) Furthermore, dev_hard_start_xmit() will utilize the skb->xmit_more
> API to delay the HW tailptr update, which also reduces the cost per
> packet.
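
(As context for (2): a driver consumes skb->xmit_more along these
lines. This is only a sketch and the example_* helpers are hypothetical;
skb->xmit_more itself is the real field:)

    /* inside a driver's ndo_start_xmit() */
    example_post_descriptor(ring, skb);   /* hypothetical: queue the descriptor */

    /* Only do the expensive MMIO tailptr/doorbell write when the
     * stack signals that no more packets are coming, or the ring
     * is full.
     */
    if (!skb->xmit_more || example_ring_full(ring))
        example_write_tailptr(ring);      /* hypothetical doorbell helper */
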
>
> One restriction of the new API is that every SKB must belong to the
> same TXQ. This patch takes the easy way out, by restricting bulk
> dequeue to qdiscs with the TCQ_F_ONETXQUEUE flag, which specifies that
> the qdisc has only a single TXQ attached.
>
> Some detail about the flow: dev_hard_start_xmit() will process the skb
> list, transmitting packets individually towards the driver (see
> xmit_one()). If the driver stops midway through the list, the
> remaining skb list is returned by dev_hard_start_xmit(). In
> sch_direct_xmit() this returned list is requeued by dev_requeue_skb().
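
(For context, the relevant loop in net-next's dev_hard_start_xmit() is
roughly the following, paraphrased from memory rather than quoted:)

    while (skb) {
        struct sk_buff *next = skb->next;

        skb->next = NULL;
        rc = xmit_one(skb, dev, txq, next != NULL);
        if (unlikely(!dev_xmit_complete(rc))) {
            skb->next = next;   /* hand the unsent remainder back */
            break;
        }
        skb = next;
    }
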
>
> To avoid overshooting the HW limits, which would result in requeuing,
> the patch limits the number of bytes dequeued, based on the driver's
> BQL limit.
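
(netdev_tx_avail_queue() comes from patch 1/2; I'd expect it to be
essentially a wrapper around the existing dql_avail() helper, something
like the sketch below, guessed rather than copied from that patch:)

    static inline int netdev_tx_avail_queue(struct netdev_queue *txq)
    {
    #ifdef CONFIG_BQL
        /* remaining byte budget before BQL considers the queue full */
        return dql_avail(&txq->dql);
    #else
        return 0;
    #endif
    }
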
>
> It also treats GSO and segmented GSO packets as a separate kind of SKB
> bulking, and thus stops dequeuing when it sees a GSO packet (both real
> GSO packets and segmented GSO skb lists).
>
> This chooses a very conservative initial default value of 1 extra
> packet dequeued, if bulking is allowed.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
>
> ---
> V2:
> - Restructure functions, split out functionality
> - Use BQL bytelimit to avoid overshooting driver limits
>
> V3:
> - Correct use of BQL
> - Some minor adjustments based on feedback.
> - Default setting only bulk dequeue 1 extra packet (2 packets).
>
> include/net/sch_generic.h | 2 +
> net/core/sysctl_net_core.c | 9 ++++++
> net/sched/sch_generic.c | 70 ++++++++++++++++++++++++++++++++++++++++++--
> 3 files changed, 78 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 1e89b9a..da9324f 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -14,6 +14,8 @@ struct qdisc_walker;
> struct tcf_walker;
> struct module;
>
> +extern int qdisc_bulk_dequeue_limit;
> +
> struct qdisc_rate_table {
> struct tc_ratespec rate;
> u32 data[256];
> diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> index cf9cd13..5505841 100644
> --- a/net/core/sysctl_net_core.c
> +++ b/net/core/sysctl_net_core.c
> @@ -361,6 +361,15 @@ static struct ctl_table net_core_table[] = {
> .mode = 0644,
> .proc_handler = proc_dointvec
> },
> + {
> + .procname = "qdisc_bulk_dequeue_limit",
> + .data = &qdisc_bulk_dequeue_limit,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .extra1 = &zero,
> + .extra2 = &ushort_max,
> + .proc_handler = proc_dointvec_minmax
> + },
> { }
> };
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 346ef85..eee2280 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -34,6 +34,9 @@
> const struct Qdisc_ops *default_qdisc_ops = &pfifo_fast_ops;
> EXPORT_SYMBOL(default_qdisc_ops);
>
> +/* FIXME: Still undecided where to put this parameter... */
> +int qdisc_bulk_dequeue_limit __read_mostly = 1;
> +
> /* Main transmission queue. */
>
> /* Modifications to data participating in scheduling must be protected with
> @@ -56,6 +59,67 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
> return 0;
> }
>
> +static inline bool qdisc_may_bulk(const struct Qdisc *qdisc)
> +{
> + return qdisc->flags & TCQ_F_ONETXQUEUE;
> +}
> +
> +static inline struct sk_buff *qdisc_dequeue_validate(struct Qdisc *qdisc)
> +{
> + struct sk_buff *skb = qdisc->dequeue(qdisc);
> +
> + if (skb != NULL)
> + skb = validate_xmit_skb(skb, qdisc_dev(qdisc));
> +
> + return skb;
> +}
> +
> +static inline struct sk_buff *qdisc_bulk_dequeue_skb(struct Qdisc *q,
> + struct sk_buff *head)
> +{
> + struct sk_buff *new, *skb = head;
> + struct netdev_queue *txq = q->dev_queue;
> + int bytelimit = netdev_tx_avail_queue(txq);
> + int limit = qdisc_bulk_dequeue_limit;
> +
> + bytelimit -= skb->len; /* incorrect len if skb->next, but exits below */
> +
> + /* if (bytelimit < psched_mtu(qdisc_dev(q))) (proposed by fw) */
> + if (bytelimit < 0)
> + return head;
> +
> + do {
> + if (skb->next || skb_is_gso(skb)) {
> + /* Stop processing if the skb is already an skb
> + * list (e.g. a GSO-segmented packet, from below
> + * or from the caller), or a real GSO packet.
> + */
> + break;
> + }
> + new = q->dequeue(q);
> + if (new) {
> + bytelimit -= new->len; /* covers GSO len */
> + new = validate_xmit_skb(new, qdisc_dev(q));
> + if (!new)
> + break;
> + /* "new" can be a skb list after validate call
> + * above (GSO segmented), but it is okay to
> + * append it to current skb->next, because
> + * next round will exit in-case "new" were a
> + * skb list.
> + */
> + skb->next = new;
> + skb = new;
> + }
> + } while (new && --limit && (bytelimit >= 0));
> + skb = head;
> +
> + return skb;
> +}
> +
> +/* Note that dequeue_skb can possibly return a SKB list (via skb->next).
> + * A requeued skb (via q->gso_skb) can also be a SKB list.
> + */
> static inline struct sk_buff *dequeue_skb(struct Qdisc *q)
> {
> struct sk_buff *skb = q->gso_skb;
> @@ -71,9 +135,9 @@ static inline struct sk_buff *dequeue_skb(struct Qdisc *q)
> skb = NULL;
> } else {
> if (!(q->flags & TCQ_F_ONETXQUEUE) || !netif_xmit_frozen_or_stopped(txq)) {
> - skb = q->dequeue(q);
> - if (skb)
> - skb = validate_xmit_skb(skb, qdisc_dev(q));
> + skb = qdisc_dequeue_validate(q);
> + if (skb && qdisc_may_bulk(q) && qdisc_bulk_dequeue_limit)
> + skb = qdisc_bulk_dequeue_skb(q, skb);
> }
This is hard to read. I don't understand why qdisc_bulk_dequeue_skb
needs to be a separate function. Single dequeue and bulk dequeue can
happen in one common loop. Maybe something like:
    struct sk_buff *ret = NULL, **next = &ret;

    bytelimit = qdisc_may_bulk(q) ? netdev_tx_avail_queue(txq) : 0;
    do {
        skb = q->dequeue(q);
        if (!skb)
            break;
        *next = skb;
        next = &skb->next;
        bytelimit -= skb->len;
        limit--;
    } while (bytelimit > 0 && limit > 0);
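
Fleshed out with the pieces elided above (initializing limit from the
sysctl, and keeping the validate_xmit_skb() step from your version),
the common loop might look something like this, still just a sketch:

    struct sk_buff *ret = NULL, **next = &ret;
    int limit = qdisc_bulk_dequeue_limit + 1;  /* "extra" packets plus the first */
    int bytelimit = qdisc_may_bulk(q) ? netdev_tx_avail_queue(txq) : 0;

    do {
        struct sk_buff *skb = q->dequeue(q);

        if (!skb)
            break;
        bytelimit -= skb->len;  /* covers GSO len, as in your patch */
        skb = validate_xmit_skb(skb, qdisc_dev(q));
        if (!skb)
            break;
        *next = skb;
        /* validate_xmit_skb() may return a GSO-segmented list;
         * advance to its tail before chaining further packets.
         */
        while (*next)
            next = &(*next)->next;
    } while (--limit > 0 && bytelimit > 0);

    return ret;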
> }
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html