netdev - Re: [PATCH v4 net-next 00/14] tcp: BIG TCP implementation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89i+9eAOwMdUiKa6r9vJJFDkGP97FO1HVZPMKyCpq3CJHtw@mail.gmail.com>
Date:   Tue, 15 Mar 2022 08:50:12 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Alexander H Duyck <alexander.duyck@...il.com>
Cc:     Eric Dumazet <eric.dumazet@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>,
        Alexander Duyck <alexanderduyck@...com>,
        Coco Li <lixiaoyan@...gle.com>
Subject: Re: [PATCH v4 net-next 00/14] tcp: BIG TCP implementation

On Fri, Mar 11, 2022 at 9:13 AM Alexander H Duyck
<alexander.duyck@...il.com> wrote:
>
> On Wed, 2022-03-09 at 21:46 -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@...gle.com>
> >
> > This series implements BIG TCP as presented in netdev 0x15:
> >
> > https://netdevconf.info/0x15/session.html?BIG-TCP
> >
> > Jonathan Corbet made a nice summary: https://lwn.net/Articles/884104/
> >
> > Standard TSO/GRO packet limit is 64KB
> >
> > With BIG TCP, we allow bigger TSO/GRO packet sizes for IPv6 traffic.
> >
> > Note that this feature is by default not enabled, because it might
> > break some eBPF programs assuming TCP header immediately follows IPv6 header.
> >
> > While tcpdump recognizes the HBH/Jumbo header, standard pcap filters
> > are unable to skip over IPv6 extension headers.
> >
> > Reducing number of packets traversing networking stack usually improves
> > performance, as shown on this experiment using a 100Gbit NIC, and 4K MTU.
> >
> > 'Standard' performance with current (74KB) limits.
> > for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> > 77           138          183          8542.19
> > 79           143          178          8215.28
> > 70           117          164          9543.39
> > 80           144          176          8183.71
> > 78           126          155          9108.47
> > 80           146          184          8115.19
> > 71           113          165          9510.96
> > 74           113          164          9518.74
> > 79           137          178          8575.04
> > 73           111          171          9561.73
> >
> > Now enable BIG TCP on both hosts.
> >
> > ip link set dev eth0 gro_ipv6_max_size 185000 gso_ipv6_max_size 185000
> > for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> > 57           83           117          13871.38
> > 64           118          155          11432.94
> > 65           116          148          11507.62
> > 60           105          136          12645.15
> > 60           103          135          12760.34
> > 60           102          134          12832.64
> > 62           109          132          10877.68
> > 58           82           115          14052.93
> > 57           83           124          14212.58
> > 57           82           119          14196.01
> >
> > We see an increase of transactions per second, and lower latencies as well.
> >
> > v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y in mlx5 (Jakub)
> >
> > v3: Fixed a typo in RFC number (Alexander)
> >     Added Reviewed-by: tags from Tariq on mlx4/mlx5 parts.
> >
> > v2: Removed the MAX_SKB_FRAGS change, this belongs to a different series.
> >     Addressed feedback, for Alexander and nvidia folks.
>
> One concern with this patch set is the addition of all the max_size
> netdev attributes for tsov6, gsov6, and grov6. For the gsov6 and grov6
> maxes I really think these make more sense as sysctl values since it
> feels more like a protocol change rather than a netdev specific one.
>
> If I recall correctly the addition of gso_max_size and gso_max_segs
> were added as a workaround for NICs that couldn't handle offloading
> frames larger than a certain size. This feels like increasing the scope
> of the workaround rather than adding a new feature.
>
> I didn't see the patch that went by for gro_max_size but I am not a fan
> of the way it was added since it would make more sense as a sysctl
> which controlled the stack instead of something that is device specific
> since as far as the device is concerned it received MTU size frames,
> and GRO happens above the device. I suppose it makes things symmetric
> with gso_max_size, but at the same time it isn't really a device
> specific attribute since the work happens in the stack above the
> device.
>

We already have per device gso_max_size and gso_max_segs.

GRO max size being per device is nice. There are cases where a host
has multiple NIC,
one of them being used for incoming traffic that needs to be forwarded.

Maybe the changelog was not clear enough, but being able to lower gro_max_size
is also a way to prevent frag_list being used, so that most NIC
support TSO just fine.


> Do we need to add the IPv6 specific version of the tso_ipv6_max_size?
> Could we instead just allow setting the gso_max_size value larger than
> 64K? Then it would just be a matter of having a protocol specific max
> size check to pull us back down to GSO_MAX_SIZE in the case of non-ipv6
> frames.

Not sure why adding attributes is an issue really, more flexibility
seems better to me.

One day, if someone adds LSOv2 to IPv4, I prefer being able to
selectively turn on this support,
after tests have concluded nothing broke.

Having to turn off LSOv2 in emergency because of some bug in LSOv2
IPv4 implementation would be bad.