lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 04 Sep 2014 06:29:43 -0700
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Jesper Dangaard Brouer <brouer@...hat.com>
Cc:	netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
	Tom Herbert <therbert@...gle.com>,
	Hannes Frederic Sowa <hannes@...essinduktion.org>,
	Florian Westphal <fw@...len.de>,
	Daniel Borkmann <dborkman@...hat.com>,
	Jamal Hadi Salim <jhs@...atatu.com>,
	Alexander Duyck <alexander.duyck@...il.com>,
	John Fastabend <john.r.fastabend@...el.com>
Subject: Re: [RFC net-next PATCH V2 2/3] qdisc: bulk dequeue support for
 qdiscs with TCQ_F_ONETXQUEUE

On Thu, 2014-09-04 at 14:55 +0200, Jesper Dangaard Brouer wrote:
> Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
> sending/processing an entire skb list.
> 
> This patch implements qdisc bulk dequeue, by allowing multiple packets
> to be dequeued in dequeue_skb().
> 
> The optimization principle for this is two fold, (1) to amortize
> locking cost and (2) avoid expensive tailptr update for notifying HW.
>  (1) Several packets are dequeued while holding the qdisc root_lock,
> amortizing locking cost over several packet.  The dequeued SKB list is
> processed under the TXQ lock in dev_hard_start_xmit(), thus also
> amortizing the cost of the TXQ lock.
>  (2) Further more, dev_hard_start_xmit() will utilize the skb->xmit_more
> API to delay HW tailptr update, which also reduces the cost per
> packet.
> 
> One restriction of the new API is that every SKB must belong to the
> same TXQ.  This patch takes the easy way out, by restricting bulk
> dequeue to qdisc's with the TCQ_F_ONETXQUEUE flag, that specifies the
> qdisc only have attached a single TXQ.
> 
> Some detail about the flow; dev_hard_start_xmit() will process the skb
> list, and transmit packets individually towards the driver (see
> xmit_one()).  In case the driver stops midway in the list, the
> remaining skb list is returned by dev_hard_start_xmit().  In
> sch_direct_xmit() this returned list is requeued by dev_requeue_skb().
> 
> The patch also tries to limit the amount of bytes dequeued, based on
> the drivers BQL limits.  It also tries to avoid and stop dequeuing
> when seeing a GSO packet (both real GSO and segmented GSO skb lists).
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
> 
> ---
> V2:
>  - Restruct functions, split out functionality
>  - Use BQL bytelimit to avoid overshooting driver limits, causing
>    too large skb lists to be sitting on the requeue gso_skb.
> 
>  net/sched/sch_generic.c |   67 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 files changed, 64 insertions(+), 3 deletions(-)
> 
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 19696eb..a0c8070 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -56,6 +56,67 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>  	return 0;
>  }
>  
> +static inline bool qdisc_may_bulk(const struct Qdisc *qdisc,
> +				  const struct sk_buff *skb)

Why skb is passed here ?

> +{
> +	return (qdisc->flags & TCQ_F_ONETXQUEUE);
	
	return qdisc->flags & TCQ_F_ONETXQUEUE;

> +}
> +
> +static inline struct sk_buff *qdisc_dequeue_validate(struct Qdisc *qdisc)
> +{
> +	struct sk_buff *skb = qdisc->dequeue(qdisc);
> +
> +	if (skb != NULL)
> +		skb = validate_xmit_skb(skb, qdisc_dev(qdisc));

> +
> +	return skb;
> +}
> +
> +static inline struct sk_buff *qdisc_bulk_dequeue_skb(struct Qdisc *q,
> +						     struct sk_buff *head)
> +{
> +	struct sk_buff *new, *skb = head;
> +	struct netdev_queue *txq = q->dev_queue;
> +	int bytelimit = netdev_tx_avail_queue(txq);
> +	int limit = 5;
> +
> +	if (bytelimit <= 0)
> +		return head;
> +
> +	do {
> +		if (skb->next || skb_is_gso(skb)) {

A GSO packet should not 'stop the processing' if the device is TSO
capable : It should look as a normal packet.

> +			/* Stop processing if the skb is already a skb
> +			 * list (e.g a segmented GSO packet) or a real
> +			 * GSO packet */
> +			break;
> +		}
> +		new = qdisc_dequeue_validate(q);

Thats broken.

After validate_xmit_skb()  @new might be the head of a list. (GSO split)



> +		if (new) {
> +			skb->next = new;


> +			skb = new;

and new is only the first element on the list.

> +			bytelimit -= skb->len;

skb->len is the length of first segment.

> +			cnt++;
> +			/* One problem here is it is difficult to
> +			 * requeue the "new" dequeued skb, e.g. in
> +			 * case of GSO, thus a "normal" packet can
> +			 * have a GSO packet on its ->next ptr.
> +			 *
> +			 * Requeue is difficult because if requeuing
> +			 * on q->gso_skb, then a second requeue can
> +			 * happen from sch_direct_xmit e.g. if driver
> +			 * returns NETDEV_TX_BUSY, which would
> +			 * overwrite this requeue.
> +			 */

It should not overwrite, but insert back. Thats what need to be done.

> +		}
> +	} while (new && --limit && (bytelimit > 0));
> +	skb = head;
> +
> +	return skb;
> +}
> +


Idea is really the following :

Once qdisc dequeue bulk is done, and skb validated, we have a list of
skb to send to the device.

We iterate the list and try to send the individual skb.

As soon as device returns NETDEV_TX_BUSY (or even better, when the queue
is stopped), we requeue the remaining list.

BQL never tried to find the exact split point :

If available budget is 25000 bytes, and next TSO packet is 45000 bytes,
we send it.

So the bql validation should be done before the eventual
validate_xmit_skb() : This way you dont care of GSO or TSO.





--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ