netdev - Re: [PATCH net v1] net: taprio offload: enforce qdisc to netdev queue mapping

Message-ID: <20210514083226.6d3912c4@kicinski-fedora-PC1C0HJN>
Date:   Fri, 14 May 2021 08:32:26 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Yannick Vignon <yannick.vignon@....nxp.com>
Cc:     Jamal Hadi Salim <jhs@...atatu.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        "David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
        Joakim Zhang <qiangqing.zhang@....com>,
        sebastien.laveze@....nxp.com,
        Yannick Vignon <yannick.vignon@....com>,
        Kurt Kanzenbach <kurt@...utronix.de>,
        Vinicius Costa Gomes <vinicius.gomes@...el.com>,
        Ivan Khoronzhuk <ivan.khoronzhuk@...aro.org>,
        Vladimir Oltean <olteanv@...il.com>,
        Vedang Patel <vedang.patel@...el.com>,
        Michael Walle <michael@...le.cc>
Subject: Re: [PATCH net v1] net: taprio offload: enforce qdisc to netdev
 queue mapping

On Tue, 11 May 2021 19:18:29 +0200 Yannick Vignon wrote:
> From: Yannick Vignon <yannick.vignon@....com>
> 
> Even though the taprio qdisc is designed for multiqueue devices, all the
> queues still point to the same top-level taprio qdisc. This works and is
> probably required for software taprio, but at least with offload taprio,
> it has an undesirable side effect: because the whole qdisc is run when a
> packet has to be sent, it allows packets in a best-effort class to be
> processed in the context of a task sending higher priority traffic. If
> there are packets left in the qdisc after that first run, the NET_TX
> softirq is raised and gets executed immediately in the same process
> context. As with any other softirq, it runs up to 10 times and for up to
> 2ms, during which the calling process is waiting for the sendmsg call (or
> similar) to return. In my use case, that calling process is a real-time
> task scheduled to send a packet every 2ms, so the long sendmsg calls are
> leading to missed timeslots.
> 
> By attaching each netdev queue to its own qdisc, as it is done with
> the "classic" mq qdisc, each traffic class can be processed independently
> without touching the other classes. A high-priority process can then send
> packets without getting stuck in the sendmsg call anymore.
> 
> Signed-off-by: Yannick Vignon <yannick.vignon@....com>
> ---
> 
> This patch fixes an issue I observed while verifying the behavior of the
> taprio qdisc in a real-time networking situation.
> I am wondering if implementing separate taprio qdiscs for the software
> and accelerated cases wouldn't be a better solution, but that would
> require changes to the iproute2 package as well, and would break
> backwards compatibility.

You haven't CCed anyone who worked on this Qdisc in the last 2 years :/
CCing them now. Comments, anyone?

This looks like a very drastic change. Are you expecting the qdisc will
always be bypassed?

After a 1 minute look it seems like taprio is using device queues in
strict priority fashion. Maybe a different model is needed, but a qdisc
with:

enqueue()
{
	WARN_ONCE(1)
}

really doesn't look right to me.
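
To make the strict-priority point concrete, here is a rough user-space
sketch (a toy model, not kernel code) of the two behaviours under
discussion. It assumes each child qdisc can be reduced to a backlog
counter. The names toy_queue, shared_root_dequeue(), shared_root_run()
and per_queue_run(), the NUM_QUEUES value and the budget parameter are
all hypothetical and exist only for illustration. The shared-root scan
mirrors the index-order loop that the patch removes from
taprio_dequeue_offload().

```c
#include <assert.h>
#include <stddef.h>

#define NUM_QUEUES 4	/* hypothetical number of TX queues */

/* Toy stand-in for one child qdisc: only a backlog counter. */
struct toy_queue {
	unsigned int backlog;
};

/*
 * Shared-root model: scan the queues in index order and take one packet
 * from the first non-empty one. Returns the index of the queue that was
 * served, or -1 if every queue is empty.
 */
static int shared_root_dequeue(struct toy_queue *q, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (q[i].backlog) {
			q[i].backlog--;
			return (int)i;
		}
	}
	return -1;
}

/*
 * A sender that owns queue 'own' runs the shared root for at most
 * 'budget' packets (hypothetical limit). Returns how many of the packets
 * it transmitted came from queues other than its own.
 */
static unsigned int shared_root_run(struct toy_queue *q, size_t n,
				    size_t own, unsigned int budget)
{
	unsigned int foreign = 0;
	unsigned int i;

	assert(own < n);
	for (i = 0; i < budget; i++) {
		int served = shared_root_dequeue(q, n);

		if (served < 0)
			break;
		if ((size_t)served != own)
			foreign++;
	}
	return foreign;
}

/*
 * Per-queue model: the sender only drains its own queue and never
 * touches the others. Returns the number of packets sent.
 */
static unsigned int per_queue_run(struct toy_queue *q, size_t n,
				  size_t own, unsigned int budget)
{
	unsigned int sent = 0;

	assert(own < n);
	while (sent < budget && q[own].backlog) {
		q[own].backlog--;
		sent++;
	}
	return sent;
}
```

With backlogs of {1, 5, 0, 3} and the sender on queue 0, the shared-root
model has the sender push out 8 packets that belong to other queues,
while the per-queue model sends only its own single packet and leaves
the rest alone.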

Quoting the rest of the patch below for the benefit of those on CC.

> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 5c91df52b8c2..0bfb03052429 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -438,6 +438,11 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  	struct Qdisc *child;
>  	int queue;
>  
> +	if (unlikely(FULL_OFFLOAD_IS_ENABLED(q->flags))) {
> +		WARN_ONCE(1, "Trying to enqueue skb into the root of a taprio qdisc configured with full offload\n");
> +		return qdisc_drop(skb, sch, to_free);
> +	}
> +
>  	queue = skb_get_queue_mapping(skb);
>  
>  	child = q->qdiscs[queue];
> @@ -529,23 +534,7 @@ static struct sk_buff *taprio_peek_soft(struct Qdisc *sch)
>  
>  static struct sk_buff *taprio_peek_offload(struct Qdisc *sch)
>  {
> -	struct taprio_sched *q = qdisc_priv(sch);
> -	struct net_device *dev = qdisc_dev(sch);
> -	struct sk_buff *skb;
> -	int i;
> -
> -	for (i = 0; i < dev->num_tx_queues; i++) {
> -		struct Qdisc *child = q->qdiscs[i];
> -
> -		if (unlikely(!child))
> -			continue;
> -
> -		skb = child->ops->peek(child);
> -		if (!skb)
> -			continue;
> -
> -		return skb;
> -	}
> +	WARN_ONCE(1, "Trying to peek into the root of a taprio qdisc configured with full offload\n");
>  
>  	return NULL;
>  }
> @@ -654,27 +643,7 @@ static struct sk_buff *taprio_dequeue_soft(struct Qdisc *sch)
>  
>  static struct sk_buff *taprio_dequeue_offload(struct Qdisc *sch)
>  {
> -	struct taprio_sched *q = qdisc_priv(sch);
> -	struct net_device *dev = qdisc_dev(sch);
> -	struct sk_buff *skb;
> -	int i;
> -
> -	for (i = 0; i < dev->num_tx_queues; i++) {
> -		struct Qdisc *child = q->qdiscs[i];
> -
> -		if (unlikely(!child))
> -			continue;
> -
> -		skb = child->ops->dequeue(child);
> -		if (unlikely(!skb))
> -			continue;
> -
> -		qdisc_bstats_update(sch, skb);
> -		qdisc_qstats_backlog_dec(sch, skb);
> -		sch->q.qlen--;
> -
> -		return skb;
> -	}
> +	WARN_ONCE(1, "Trying to dequeue from the root of a taprio qdisc configured with full offload\n");
>  
>  	return NULL;
>  }
> @@ -1759,6 +1728,37 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
>  	return taprio_change(sch, opt, extack);
>  }
>  
> +static void taprio_attach(struct Qdisc *sch)
> +{
> +	struct taprio_sched *q = qdisc_priv(sch);
> +	struct net_device *dev = qdisc_dev(sch);
> +	unsigned int ntx;
> +
> +	/* Attach underlying qdisc */
> +	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
> +		struct Qdisc *qdisc = q->qdiscs[ntx];
> +		struct Qdisc *old;
> +
> +		if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
> +			qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
> +			old = dev_graft_qdisc(qdisc->dev_queue, qdisc);
> +			if (ntx < dev->real_num_tx_queues)
> +				qdisc_hash_add(qdisc, false);
> +		} else {
> +			old = dev_graft_qdisc(qdisc->dev_queue, sch);
> +			qdisc_refcount_inc(sch);
> +		}
> +		if (old)
> +			qdisc_put(old);
> +	}
> +
> +	/* access to the child qdiscs is not needed in offload mode */
> +	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
> +		kfree(q->qdiscs);
> +		q->qdiscs = NULL;
> +	}
> +}
> +
>  static struct netdev_queue *taprio_queue_get(struct Qdisc *sch,
>  					     unsigned long cl)
>  {
> @@ -1785,8 +1785,12 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
>  	if (dev->flags & IFF_UP)
>  		dev_deactivate(dev);
>  
> -	*old = q->qdiscs[cl - 1];
> -	q->qdiscs[cl - 1] = new;
> +	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
> +		*old = dev_graft_qdisc(dev_queue, new);
> +	} else {
> +		*old = q->qdiscs[cl - 1];
> +		q->qdiscs[cl - 1] = new;
> +	}
>  
>  	if (new)
>  		new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
> @@ -2020,6 +2024,7 @@ static struct Qdisc_ops taprio_qdisc_ops __read_mostly = {
>  	.change		= taprio_change,
>  	.destroy	= taprio_destroy,
>  	.reset		= taprio_reset,
> +	.attach		= taprio_attach,
>  	.peek		= taprio_peek,
>  	.dequeue	= taprio_dequeue,
>  	.enqueue	= taprio_enqueue,