[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87wnucktw1.fsf@vcostago-mobl2.amr.corp.intel.com>
Date: Fri, 12 Mar 2021 12:18:54 -0800
From: Vinicius Costa Gomes <vinicius.gomes@...el.com>
To: Kurt Kanzenbach <kurt@...utronix.de>
Cc: Vladimir Oltean <olteanv@...il.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
Kurt Kanzenbach <kurt@...utronix.de>
Subject: Re: [PATCH RFC net-next] taprio: Handle short intervals and large
packets
Kurt Kanzenbach <kurt@...utronix.de> writes:
> When using short intervals e.g. below one millisecond, large packets won't be
> transmitted at all. The software implementations checks whether the packet can
> be fit into the remaining interval. Therefore, it takes the packet length and
> the transmission speed into account. That is correct.
>
> However, for large packets it may be that the transmission time will be larger
> than the interval resulting in no packet transmission. The same situation works
> fine with hardware offloading applied.
>
> The problem has been observerd with the following schedule and iperf3:
>
> |tc qdisc replace dev lan1 parent root handle 100 taprio \
> | num_tc 8 \
> | map 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 \
> | queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> | base-time $base \
> | sched-entry S 0x40 500000 \
> | sched-entry S 0xbf 500000 \
> | clockid CLOCK_TAI \
> | flags 0x00
>
> [...]
>
> |root@tsn:~# iperf3 -c 192.168.2.105
> |Connecting to host 192.168.2.105, port 5201
> |[ 5] local 192.168.2.121 port 52610 connected to 192.168.2.105 port 5201
> |[ ID] Interval Transfer Bitrate Retr Cwnd
> |[ 5] 0.00-1.00 sec 45.2 KBytes 370 Kbits/sec 0 1.41 KBytes
> |[ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
>
> After debugging, it seems that the packet length stored in the SKB is about
> 7000-8000 bytes. Using a 100 Mbit/s link the transmission time is about 600us
> which larger than the interval of 500us.
>
> Therefore, segment the SKB into smaller chunks if the packet is too big. This
> yields similar results than the hardware offload:
>
> |root@tsn:~# iperf3 -c 192.168.2.105
> |Connecting to host 192.168.2.105, port 5201
> |- - - - - - - - - - - - - - - - - - - - - - - - -
> |[ ID] Interval Transfer Bitrate Retr
> |[ 5] 0.00-10.00 sec 48.9 MBytes 41.0 Mbits/sec 0 sender
> |[ 5] 0.00-10.02 sec 48.7 MBytes 40.7 Mbits/sec receiver
>
> Signed-off-by: Kurt Kanzenbach <kurt@...utronix.de>
> ---
> net/sched/sch_taprio.c | 39 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 39 insertions(+)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 8287894541e3..8434e87f79f7 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -411,6 +411,42 @@ static long get_packet_txtime(struct sk_buff *skb, struct Qdisc *sch)
> return txtime;
> }
>
> +/* Similar to net/sched/sch_tbf.c::tbf_segment */
> +static int taprio_segment(struct sk_buff *skb, struct Qdisc *sch,
> + struct Qdisc *child, struct sk_buff **to_free)
> +{
> + netdev_features_t features = netif_skb_features(skb);
> + unsigned int len = 0, prev_len = qdisc_pkt_len(skb);
> + struct sk_buff *segs, *nskb;
> + int ret, nb;
> +
> + segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
> +
> + if (IS_ERR_OR_NULL(segs))
> + return qdisc_drop(skb, sch, to_free);
> +
> + nb = 0;
> + skb_list_walk_safe(segs, segs, nskb) {
> + skb_mark_not_on_list(segs);
> + qdisc_skb_cb(segs)->pkt_len = segs->len;
> + len += segs->len;
> + ret = qdisc_enqueue(segs, child, to_free);
> + if (ret != NET_XMIT_SUCCESS) {
> + if (net_xmit_drop_count(ret))
> + qdisc_qstats_drop(sch);
> + } else {
> + nb++;
> + }
> + }
> +
> + sch->q.qlen += nb;
> + if (nb > 1)
> + qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
> + consume_skb(skb);
> +
> + return nb > 0 ? NET_XMIT_SUCCESS : NET_XMIT_DROP;
> +}
> +
> static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> struct sk_buff **to_free)
> {
> @@ -433,6 +469,9 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> return qdisc_drop(skb, sch, to_free);
> }
>
> + if (skb_is_gso(skb))
> + return taprio_segment(skb, sch, child, to_free);
> +
My first worry was whether the segments had the same tstamp as their
parent, and it seems that they do, so everything should just work with
etf or the txtime-assisted mode.
I just want to play with this patch a bit and see how it works in
practice. But it looks good.
Cheers,
--
Vinicius
Powered by blists - more mailing lists