netdev - Re: [PATCH v2 net-next 14/14] mlx5: support BIG TCP packets


Message-ID: <CANn89i+d0gaAM=Bsve-ix5BcKnK5gL1MtVhYbBha+92TiFSHpw@mail.gmail.com>
Date:   Sat, 5 Mar 2022 09:57:52 -0800
From:   Eric Dumazet <edumazet@...gle.com>
To:     David Ahern <dsahern@...nel.org>
Cc:     Eric Dumazet <eric.dumazet@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>, Coco Li <lixiaoyan@...gle.com>,
        Alexander Duyck <alexanderduyck@...com>,
        Saeed Mahameed <saeedm@...dia.com>,
        Leon Romanovsky <leon@...nel.org>
Subject: Re: [PATCH v2 net-next 14/14] mlx5: support BIG TCP packets

On Sat, Mar 5, 2022 at 8:36 AM David Ahern <dsahern@...nel.org> wrote:
>
> On 3/4/22 10:14 AM, Eric Dumazet wrote:
> > On Thu, Mar 3, 2022 at 8:43 PM David Ahern <dsahern@...nel.org> wrote:
> >>
> >> On 3/3/22 11:16 AM, Eric Dumazet wrote:
> >>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> >>> index b2ed2f6d4a9208aebfd17fd0c503cd1e37c39ee1..1e51ce1d74486392a26568852c5068fe9047296d 100644
> >>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> >>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> >>> @@ -4910,6 +4910,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
> >>>
> >>>       netdev->priv_flags       |= IFF_UNICAST_FLT;
> >>>
> >>> +     netif_set_tso_ipv6_max_size(netdev, 512 * 1024);
> >>
> >>
> >> How does the ConnectX hardware handle fairness for such large packet
> >> sizes? For 1500 MTU this means a single large TSO can cause the H/W to
> >> generate 349 MTU sized packets. Even a 4k MTU means 128 packets. This
> >> has an effect on the rate of packets hitting the next hop switch for
> >> example.
> >
> > I think ConnectX cards interleave packets from all TX queues, at least
> > old CX3 have a parameter to control that.
> >
> > Given that we already can send at line rate, from a single TX queue, I
> > do not see why presenting larger TSO packets
> > would change anything on the wire ?
> >
> > Do you think ConnectX adds an extra gap on the wire at the end of a TSO train ?
>
> It's not about 1 queue, my question was along several lines. e.g,
> 1. the inter-packet gap for TSO generated packets. With 512kB packets
> the burst is 8x from what it is today.

We ran experiments with 185 KB (45 4K segments in our case [1])
and saw no increase in drops.
We are deploying these limits.
[1] we increased MAX_SKB_FRAGS to 45,  so that zero copy for both TX
and RX is possible.
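
For reference, the packet counts discussed in this thread can be reproduced with a few lines of arithmetic. This is only a rough sketch: it ignores protocol headers, assumes the current limit David refers to is 64 KB, and the helper name `full_packets` is made up for illustration.

```python
def full_packets(tso_bytes: int, mtu: int) -> int:
    """Number of full MTU-sized packets in one TSO packet (headers ignored)."""
    return tso_bytes // mtu

BIG_TCP_MAX = 512 * 1024   # limit set by netif_set_tso_ipv6_max_size() in the patch
LEGACY_MAX = 64 * 1024     # assumption: the 64 KB limit in use today

print(full_packets(BIG_TCP_MAX, 1500))   # 1500-byte MTU
print(full_packets(BIG_TCP_MAX, 4096))   # 4K MTU
print(BIG_TCP_MAX // LEGACY_MAX)         # burst growth factor
print(45 * 4096)                         # bytes in 45 4K segments
```

The first three values match the 349 packets, 128 packets and 8x figures quoted above; 45 segments of 4K carry 184320 bytes of payload, which is roughly the 185 KB used in our experiments.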

Once your switches are 100Gbit rated, just send them 100Gbit traffic.

Note that Linux TCP already has a lot of burst-control and pacing features.

>
> 2. the fairness within hardware as 1 queue has potentially many 512kB
> packets and the impact on other queues (e.g., higher latency?) since it
> will take longer to split the larger packets into MTU sized packets.

It depends on the NIC. Many NICs (including mlx4) have a per-queue quantum,
usually configurable in power-of-two steps (4K, 8K, 16K, 32K ...)

It means that one TSO packet is split into smaller chunks, depending on
concurrent eligible TX queues.
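
A toy model of that behaviour, to make the idea concrete. This is not how any particular NIC arbitrates between queues; it is a plain round-robin sketch in which each eligible queue may send up to one quantum per turn, and the function and queue names are hypothetical.

```python
from collections import deque

def interleave(queues: dict, quantum: int) -> list:
    """Round-robin over TX queues, each sending at most `quantum` bytes per turn.

    queues maps a queue name to a list of TSO packet sizes in bytes.
    Returns the wire order as a list of (queue name, bytes sent) chunks.
    """
    pending = deque((name, deque(sizes)) for name, sizes in queues.items() if sizes)
    wire = []
    while pending:
        name, sizes = pending.popleft()
        budget = quantum
        while budget > 0 and sizes:
            chunk = min(budget, sizes[0])
            wire.append((name, chunk))
            budget -= chunk
            sizes[0] -= chunk
            if sizes[0] == 0:
                sizes.popleft()      # this TSO packet is fully sent
        if sizes:
            pending.append((name, sizes))  # still eligible, go to the back
    return wire

# One 32K TSO packet on q0 competing with one 8K packet on q1, 8K quantum.
wire = interleave({"q0": [32 * 1024], "q1": [8 * 1024]}, quantum=8 * 1024)
print(wire)
```

In this model q1's packet goes out after the first 8K chunk of q0, instead of waiting behind the whole 32K TSO packet.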

Our NIC of the day at Google has an MTU quantum per queue.

(This is one of the reasons I added
/sys/class/net/ethX/gro_flush_timeout, because sending TSO packets
would not mean the receiver would receive this TSO in a single train
of received packets)

>
> It is really about understanding the change this new default size is
> going to have on users.

Sure, but to be able to conduct experiments, and allow TCP congestion control
to probe for bigger bursts, we need the core to support bigger packets.

Then one can precisely tune the max GSO size one wants, per
Ethernet device,
if the existing rate-limiting features really do not help.