Message-ID: <20220510022350.GA4619@bytedance>
Date:   Mon, 9 May 2022 19:23:50 -0700
From:   Peilin Ye <yepeilin.cs@...il.com>
To:     Dave Taht <dave.taht@...il.com>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        David Ahern <dsahern@...nel.org>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        Peilin Ye <peilin.ye@...edance.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Cong Wang <cong.wang@...edance.com>
Subject: Re: [PATCH RFC v1 net-next 1/4] net: Introduce Qdisc backpressure
 infrastructure

Hi Dave,

On Mon, May 09, 2022 at 12:53:28AM -0700, Dave Taht wrote:
> I am very pleased to see this work.

Thanks!

> However,  my "vision" such as it was, and as misguided as it might be,
> was to implement a facility similar to tcp_notsent_lowat for udp
> packets, tracking the progress of the udp packet through the kernel,
> and supplying backpressure and providing better information about
> where when and why the packet was dropped in the stack back to the
> application.

By "a facility similar to tcp_notsent_lowat", do you mean a smaller
sk_sndbuf, or "UDP Small Queues"?

I don't fully understand the implications of using a smaller sk_sndbuf
yet, but I think it can work together with this RFC.

sk_sndbuf is a per-socket attribute, while this RFC attacks the problem
from the Qdisc's perspective.  Using a smaller sk_sndbuf alone does not
prevent the "when UDP sends faster, TBF simply drops faster" issue
(described in [I] of the cover letter).  There is always a point where
there are too many sockets, and TBF's queue can no longer hold roughly
"sk_sndbuf times the number of sockets" bytes of skbs.  Beyond that
point, TBF suddenly starts dropping a lot.
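As a rough illustration with the numbers used below: 15 senders, each
allowed to keep up to 21299 bytes of skbs in flight, can together
occupy up to

    15 * 21299 bytes ~= 312 KiB

so as soon as the TBF queue limit (whatever the setup in [V]
configures) is smaller than that, every extra sender only adds drops.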

For example, I used the default sk_sndbuf of 212992 bytes
(/proc/sys/net/core/wmem_default) in the test setup ([V] in the cover
letter).  Let's make it one tenth as large: 21299 bytes.  That works
well for the 2-client setup; zero packets dropped.  However, with 15
iperf2 clients:

[  3]  0.0-30.0 sec  46.4 MBytes  13.0 Mbits/sec   1.251 ms 89991/123091 (73%)
[  3]  0.0-30.0 sec  46.6 MBytes  13.0 Mbits/sec   2.033 ms 91204/124464 (73%)
[  3]  0.0-30.0 sec  46.5 MBytes  13.0 Mbits/sec   0.504 ms 89879/123054 (73%)
<...>                                                       ^^^^^^^^^^^^ ^^^^^
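(For reference, each line is one 30-second UDP stream as reported by
the iperf2 server; the two annotated columns are lost/total datagrams
and the loss rate.  Each client runs something along the lines of
"iperf -c <server> -u -b <rate> -t 30", with <server> and <rate> as in
the setup in [V].)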

73% drop rate again.  Now apply this RFC:

[  3]  0.0-30.0 sec  46.3 MBytes  12.9 Mbits/sec   1.206 ms  807/33839 (2.4%)
[  3]  0.0-30.0 sec  45.5 MBytes  12.7 Mbits/sec   1.919 ms  839/33283 (2.5%)
[  3]  0.0-30.0 sec  45.8 MBytes  12.8 Mbits/sec   2.521 ms  837/33508 (2.5%)
<...>                                                        ^^^^^^^^^ ^^^^^^

Down to about 2.5% again.

Next, with the same 21299 sk_sndbuf but 20 iperf2 clients, without the RFC:

[  3]  0.0-30.0 sec  34.5 MBytes  9.66 Mbits/sec   1.054 ms 258703/283342 (91%)
[  3]  0.0-30.0 sec  34.5 MBytes  9.66 Mbits/sec   1.033 ms 257324/281964 (91%)
[  3]  0.0-30.0 sec  34.5 MBytes  9.66 Mbits/sec   1.116 ms 257858/282500 (91%)
<...>                                                       ^^^^^^^^^^^^^ ^^^^^

91% drop rate.  Finally, with the RFC applied:

[  3]  0.0-30.0 sec  34.4 MBytes  9.61 Mbits/sec   0.974 ms 7982/32503 (25%)
[  3]  0.0-30.0 sec  34.1 MBytes  9.54 Mbits/sec   1.381 ms 7394/31732 (23%)
[  3]  0.0-30.0 sec  34.3 MBytes  9.58 Mbits/sec   2.431 ms 8149/32583 (25%)
<...>                                                       ^^^^^^^^^^ ^^^^^

The thundering herd problem ([III] in the cover letter) surfaces: with
this many throttled sockets, they all wake up and send at once whenever
the Qdisc frees up space.  Still, it is an improvement.

In conclusion, even if we end up using a smaller sk_sndbuf or "UDP
Small Queues", I don't think it replaces this RFC, and vice versa.

> I've been really impressed by the DROP_REASON work and had had no clue
> prior to seeing all that instrumentation, where else packets might be
> dropped in the kernel.
> 
> I'd be interested to see what happens with sch_cake.

Sure, I will cover sch_cake in v2.
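For anyone who wants to try it in the meantime, swapping the root
Qdisc in the test setup should be a one-liner along the lines of the
following (the interface name and bandwidth here are just example
values, not the ones from [V]):

    tc qdisc replace dev eth0 root cake bandwidth 20Mbit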

Thanks,
Peilin Ye
