Message-ID: <20220510022350.GA4619@bytedance>
Date: Mon, 9 May 2022 19:23:50 -0700
From: Peilin Ye <yepeilin.cs@...il.com>
To: Dave Taht <dave.taht@...il.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <eric.dumazet@...il.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
David Ahern <dsahern@...nel.org>,
Jamal Hadi Salim <jhs@...atatu.com>,
Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>,
Peilin Ye <peilin.ye@...edance.com>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Cong Wang <cong.wang@...edance.com>
Subject: Re: [PATCH RFC v1 net-next 1/4] net: Introduce Qdisc backpressure infrastructure

Hi Dave,

On Mon, May 09, 2022 at 12:53:28AM -0700, Dave Taht wrote:
> I am very pleased to see this work.

Thanks!

> However, my "vision" such as it was, and as misguided as it might be,
> was to implement a facility similar to tcp_notsent_lowat for udp
> packets, tracking the progress of the udp packet through the kernel,
> and supplying backpressure and providing better information about
> where when and why the packet was dropped in the stack back to the
> application.
By "a facility similar to tcp_notsent_lowat", do you mean a smaller
sk_sndbuf, or "UDP Small Queues"?
I don't fully understand the implications of using a smaller sk_sndbuf
yet, but I think it can work together with this RFC.
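
(For concreteness, a minimal userspace sketch of the per-socket variant
of that, i.e. a UDP sender voluntarily shrinking its own send buffer.
The value below is only illustrative, chosen so that the effective size
lands near the 21299 used in the tests below; note that the kernel
roughly doubles whatever is passed to SO_SNDBUF before storing it in
sk_sndbuf:)

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int main(void)
{
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        int val = 10650;        /* kernel stores ~(val * 2) in sk_sndbuf */
        socklen_t len = sizeof(val);

        if (fd < 0) {
                perror("socket");
                return 1;
        }
        if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, sizeof(val)) < 0) {
                perror("setsockopt(SO_SNDBUF)");
                return 1;
        }
        /* Read back the effective (doubled) send buffer size. */
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, &len);
        printf("effective sk_sndbuf: %d bytes\n", val);

        close(fd);
        return 0;
}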

sk_sndbuf is a per-socket attribute, while this RFC approaches the
problem from the Qdisc's perspective. Using a smaller sk_sndbuf alone
does not prevent the "when UDP sends faster, TBF simply drops faster"
issue (described in [I] of the cover letter) from happening. There is
always a point where there are so many sockets that TBF's queue can no
longer hold roughly "sk_sndbuf times the number of sockets" bytes of
skbs. Beyond that point, TBF suddenly starts dropping a lot.
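
(A back-of-the-envelope sketch of that overflow point; the TBF queue
limit below is a made-up illustrative value, not the one from the cover
letter's setup:)

#include <stdio.h>

int main(void)
{
        unsigned long sk_sndbuf = 21299;        /* bytes per socket */
        unsigned long tbf_limit = 200000;       /* hypothetical TBF limit, bytes */

        for (int n = 1; n <= 20; n++) {
                unsigned long backlog = sk_sndbuf * n;  /* rough worst case */

                printf("%2d sockets: ~%lu bytes in flight%s\n", n, backlog,
                       backlog > tbf_limit ?
                       "  <- exceeds TBF limit, drops start" : "");
        }
        return 0;
}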

For example, I used the default 212992 sk_sndbuf
(/proc/sys/net/core/wmem_default) in the test setup ([V] in the cover
letter). Let's make it one tenth as large, 21299. It works well for
the 2-client setup; zero packets dropped. However, if we test it with
15 iperf2 clients:

[ 3] 0.0-30.0 sec 46.4 MBytes 13.0 Mbits/sec 1.251 ms 89991/123091 (73%)
[ 3] 0.0-30.0 sec 46.6 MBytes 13.0 Mbits/sec 2.033 ms 91204/124464 (73%)
[ 3] 0.0-30.0 sec 46.5 MBytes 13.0 Mbits/sec 0.504 ms 89879/123054 (73%)
<...>                                             ^^^^^^^^^^^^ ^^^^^

73% drop rate again. Now apply this RFC:

[ 3] 0.0-30.0 sec 46.3 MBytes 12.9 Mbits/sec 1.206 ms 807/33839 (2.4%)
[ 3] 0.0-30.0 sec 45.5 MBytes 12.7 Mbits/sec 1.919 ms 839/33283 (2.5%)
[ 3] 0.0-30.0 sec 45.8 MBytes 12.8 Mbits/sec 2.521 ms 837/33508 (2.5%)
<...>                                             ^^^^^^^^^ ^^^^^^

Down to 3% again.

Next, same 21299 sk_sndbuf, 20 iperf2 clients, without RFC:

[ 3] 0.0-30.0 sec 34.5 MBytes 9.66 Mbits/sec 1.054 ms 258703/283342 (91%)
[ 3] 0.0-30.0 sec 34.5 MBytes 9.66 Mbits/sec 1.033 ms 257324/281964 (91%)
[ 3] 0.0-30.0 sec 34.5 MBytes 9.66 Mbits/sec 1.116 ms 257858/282500 (91%)
<...>                                             ^^^^^^^^^^^^^ ^^^^^

91% drop rate. Finally, apply RFC:

[ 3] 0.0-30.0 sec 34.4 MBytes 9.61 Mbits/sec 0.974 ms 7982/32503 (25%)
[ 3] 0.0-30.0 sec 34.1 MBytes 9.54 Mbits/sec 1.381 ms 7394/31732 (23%)
[ 3] 0.0-30.0 sec 34.3 MBytes 9.58 Mbits/sec 2.431 ms 8149/32583 (25%)
<...>                                             ^^^^^^^^^^ ^^^^^

The thundering herd problem ([III] in the cover letter) surfaces, but
this is still an improvement.
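
(For completeness, a minimal sketch of how the smaller sk_sndbuf in the
runs above can be set, assuming it is done via the wmem_default sysctl
mentioned earlier rather than per socket; requires root:)

#include <stdio.h>

int main(void)
{
        /* New sockets created after this get sk_sndbuf = 21299 by default. */
        FILE *f = fopen("/proc/sys/net/core/wmem_default", "w");

        if (!f) {
                perror("fopen");
                return 1;
        }
        fprintf(f, "%d\n", 21299);
        fclose(f);
        return 0;
}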

In conclusion, even if we end up using a smaller sk_sndbuf or "UDP
Small Queues", I don't think it replaces this RFC, or vice versa; the
two are complementary.

> I've been really impressed by the DROP_REASON work and had had no clue
> prior to seeing all that instrumentation, where else packets might be
> dropped in the kernel.
>
> I'd be interested to see what happens with sch_cake.

Sure, I will cover sch_cake in v2.

Thanks,
Peilin Ye