Message-ID: <0cb19f7e-a9b3-58f8-6119-0736010f1326@bluematt.me>
Date: Wed, 28 Apr 2021 10:09:00 -0400
From: Matt Corallo <netdev-list@...tcorallo.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S. Miller" <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Alexey Kuznetsov <kuznet@....inr.ac.ru>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
Willy Tarreau <w@....eu>, Keyu Man <kman001@....edu>
Subject: Re: [PATCH net-next] Reduce IP_FRAG_TIME fragment-reassembly timeout
to 1s, from 30s

On 4/28/21 08:20, Eric Dumazet wrote:
> This is going to break many use cases.
>
> I can certainly say that in many cases, we need more than 1 second to
> complete reassembly.
> Some Internet users share satellite links with 600 ms RTT, not
> everybody has fiber links in 2021.
I'm curious what RTT has to do with it. Frags aren't resent, so there's no RTT you need to wait for. The question is
more your available bandwidth, which even for many sat links isn't negligible anymore, and how much packet reordering
you see (better still, in-flow packet reordering is becoming more and more rare!).
Even given some material reordering, 30 seconds on a 100 Kbps link is a lot!
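
A rough back-of-the-envelope check of that last point, as a sketch only: it computes the pure serialization time of a datagram at a given link rate and ignores header overhead, queuing, and reordering delay. The function name, the datagram sizes, and the link rates are illustrative assumptions, not measurements from any real link.

```python
def serialization_time_s(payload_bytes: int, link_bps: float) -> float:
    """Seconds needed to put payload_bytes on a link running at link_bps bits/s."""
    return payload_bytes * 8 / link_bps


MAX_IPV4_DATAGRAM = 65535  # largest possible IPv4 datagram, in bytes
TWO_FRAGMENTS = 3000       # assumed example: a datagram split across two 1500-byte frames

for label, bps in [("100 Kbps", 100_000), ("1 Mbps", 1_000_000)]:
    for size in (TWO_FRAGMENTS, MAX_IPV4_DATAGRAM):
        t = serialization_time_s(size, bps)
        print(f"{label}: {size:>5} bytes take {t:.2f} s to serialize")
```

Even the largest possible datagram serializes in roughly 5 seconds at 100 Kbps under these assumptions, so a 30-second window mostly covers reordering and loss rather than transmission time.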
> There is a sysctl, exactly for the cases where admins can decide to
> make the value smaller.
Sadly this doesn't actually solve it in many cases. Because Linux reassembles fragments by default any time conntrack is
loaded (disabling this is very nontrivial), anyone with a Linux box in between two hosts ends up breaking flows with any
material loss of frags.
More broadly, just because there is a sysctl doesn't mean the default shouldn't be sensible for most users. As you note,
there's a sysctl, so if someone is on a 1Kbps sat link with fragments sent out of order, they can change it :). This
constant hasn't been touched since pre-git!
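
For reference, a minimal sketch of how to inspect the values in question on a given box. The two `*_high_thresh` paths are the ones quoted in this thread; the `ipfrag_time` and `ip6frag_time` paths are the standard procfs locations of the timeout sysctls (values in seconds). The helper name `read_sysctl` is hypothetical, and the script only reads, it never writes.

```python
from pathlib import Path

# procfs files holding the fragment-reassembly timeout and memory threshold.
FRAG_SYSCTLS = [
    "/proc/sys/net/ipv4/ipfrag_time",
    "/proc/sys/net/ipv4/ipfrag_high_thresh",
    "/proc/sys/net/ipv6/ip6frag_time",
    "/proc/sys/net/ipv6/ip6frag_high_thresh",
]


def read_sysctl(path: str):
    """Return the integer stored at path, or None if it is missing or unreadable."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


for path in FRAG_SYSCTLS:
    print(f"{path} = {read_sysctl(path)}")
```

This only shows the settings of the box it runs on. It says nothing about intermediate Linux boxes with conntrack loaded, which is the case described above.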
> You can laugh all you want, the sad thing with IP frags is that really
> some applications still want to use them.
Yes, including my application, which breaks any time the flow *transits* a Linux box (i.e. not just my end host(s), but
any box in between that happens to have conntrack loaded).
> Also, admins willing to use 400 MB of memory instead of 4MB can just
> change a sysctl.
>
> Again, nothing will prevent reassembly units to be DDOS targets.
Yep, not claiming any differently. As noted in a previous thread you really have to crank up the limits to prevent DDOS.
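
To make the relationship between the limit and the timeout concrete, here is a simplified sketch. It assumes incomplete fragments sit in memory for the full timeout and ignores per-fragment bookkeeping overhead, so the sustained rate needed to keep the reassembly buffer full is roughly threshold / timeout. The function name and the model itself are assumptions for illustration, not how the kernel accounts for memory in detail.

```python
def fill_rate_bps(high_thresh_bytes: int, timeout_s: float) -> float:
    """Sustained bits/s of never-completed fragments needed to keep the buffer full.

    Simplified model: every stray fragment occupies memory for the whole timeout.
    """
    return high_thresh_bytes * 8 / timeout_s


MiB = 1024 * 1024

for thresh_mib in (4, 100, 400):
    for timeout_s in (30, 1):
        rate = fill_rate_bps(thresh_mib * MiB, timeout_s)
        print(f"{thresh_mib:>3} MiB limit, {timeout_s:>2} s timeout: "
              f"{rate / 1e6:.2f} Mbit/s keeps the buffer full")
```

Under this model, a 4 MiB limit with a 30-second timeout is kept full by only about 1.1 Mbit/s of stray fragments, and either a larger limit or a shorter timeout raises that bar proportionally.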
> At Google, we use 100 MB for /proc/sys/net/ipv4/ipfrag_high_thresh and
> /proc/sys/net/ipv6/ip6frag_high_thresh,
> no kernel patch is needed.
>