netdev - Re: [PATCH net-next] Reduce IP_FRAG_TIME fragment-reassembly timeout to 1s, from 30s


Message-ID: <0cb19f7e-a9b3-58f8-6119-0736010f1326@bluematt.me>
Date:   Wed, 28 Apr 2021 10:09:00 -0400
From:   Matt Corallo <netdev-list@...tcorallo.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     "David S. Miller" <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        Willy Tarreau <w@....eu>, Keyu Man <kman001@....edu>
Subject: Re: [PATCH net-next] Reduce IP_FRAG_TIME fragment-reassembly timeout
 to 1s, from 30s

On 4/28/21 08:20, Eric Dumazet wrote:
> This is going to break many use cases.
> 
> I can certainly say that in many cases, we need more than 1 second to
> complete reassembly.
> Some Internet users share satellite links with 600 ms RTT, not
> everybody has fiber links in 2021.

I'm curious: what does RTT have to do with it? Frags aren't resent, so there's no RTT you need to wait for; the question is
more about your available bandwidth and how much packet reordering you see, which even for many sat links isn't zero anymore
(better yet, in-flow packet reordering is becoming more and more rare!).

Even given some material reordering, 30 seconds on a 100Kbps link is a lot!
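To put rough numbers on the bandwidth point: because fragments aren't retransmitted, the timeout only has to cover the time needed to get every fragment of one datagram across the link, plus any reordering delay. A minimal sketch of that arithmetic follows. It assumes a 1500-byte MTU, 20-byte IPv4 headers with no options, and a maximum-size datagram (65,535-byte total length). Reordering and queuing are ignored.

```python
# Sketch: wire time for all fragments of one IPv4 datagram at a given link rate.
# Assumptions (not from the thread): 1500-byte MTU, 20-byte IPv4 header with
# no options, no reordering or queuing delay.
import math

IPV4_HEADER = 20                          # bytes, assuming no IP options
MTU = 1500                                # bytes, assumed link MTU
MAX_IPV4_PAYLOAD = 65535 - IPV4_HEADER    # total-length field caps the datagram

def fragment_count(payload_len: int, mtu: int = MTU) -> int:
    """Number of fragments needed to carry payload_len bytes."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_frag = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(payload_len / per_frag)

def serialization_seconds(payload_len: int, link_bps: float, mtu: int = MTU) -> float:
    """Seconds to put every fragment (headers included) on the wire."""
    frags = fragment_count(payload_len, mtu)
    total_bytes = payload_len + frags * IPV4_HEADER
    return total_bytes * 8 / link_bps

if __name__ == "__main__":
    for bps in (1_000, 100_000):
        secs = serialization_seconds(MAX_IPV4_PAYLOAD, bps)
        print(f"{bps} bit/s: {secs:.2f} s")
```

Under those assumptions, a worst-case datagram needs 45 fragments and about 5.3 s of wire time at 100 kbit/s, well under 30 s. At 1 kbit/s it takes roughly 531 s, which is the kind of link where raising the sysctl would still make sense.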

> There is a sysctl, exactly for the cases where admins can decide to
> make the value smaller.

Sadly, this doesn't actually solve it in many cases. Because Linux reassembles fragments by default any time conntrack is
loaded (and disabling this is very nontrivial), any Linux box sitting between two hosts ends up breaking flows that see
any material loss of frags.
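The failure mode here can be illustrated with a toy reassembly table. This is a hypothetical model, not the kernel's code. Fragments are keyed by (src, dst, protocol, IP ID), and a queue missing even one fragment holds its memory until the timeout expires. The `timeout` parameter stands in for net.ipv4.ipfrag_time. Overlapping fragments and memory limits are deliberately not modeled.

```python
# Toy reassembly table: a hypothetical model, not the Linux implementation.
# Overlapping fragments and memory limits are deliberately not modeled.
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, int, int]  # (src, dst, protocol, IP ID)

@dataclass
class FragQueue:
    first_seen: float                      # when the first fragment arrived
    total_len: Optional[int] = None        # known once the MF=0 fragment arrives
    received: Dict[int, bytes] = field(default_factory=dict)  # offset -> data

class Reassembler:
    def __init__(self, timeout: float):
        self.timeout = timeout             # stands in for net.ipv4.ipfrag_time
        self.queues: Dict[Key, FragQueue] = {}

    def expire(self, now: float) -> int:
        """Drop queues that have waited at least `timeout`; return how many."""
        stale = [k for k, q in self.queues.items()
                 if now - q.first_seen >= self.timeout]
        for k in stale:
            del self.queues[k]
        return len(stale)

    def add(self, key: Key, offset: int, data: bytes,
            more_fragments: bool, now: float) -> Optional[bytes]:
        """Store one fragment; return the whole payload once it is complete."""
        self.expire(now)
        if key not in self.queues:
            self.queues[key] = FragQueue(first_seen=now)
        q = self.queues[key]
        q.received[offset] = data
        if not more_fragments:
            q.total_len = offset + len(data)
        if q.total_len is None:
            return None                    # last fragment not seen yet
        pos = 0
        for off in sorted(q.received):     # require contiguous coverage from 0
            if off != pos:
                return None                # gap (or overlap, not handled here)
            pos += len(q.received[off])
        if pos != q.total_len:
            return None
        del self.queues[key]               # complete: free the queue
        return b"".join(q.received[o] for o in sorted(q.received))
```

With timeout=30.0, a queue whose second fragment was lost is only freed at t=30. With the 1 s value proposed in this patch, it would be freed at t=1.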

More broadly, just because there is a sysctl doesn't mean the default shouldn't be sensible for most users. As you note,
there's a sysctl: if someone is on a 1Kbps sat link with fragments sent out of order, they can change it :). This
constant hasn't been touched since pre-git!

> You can laugh all you want, the sad thing with IP frags is that really
> some applications still want to use them.

Yes, including my application, which breaks any time the flow *transits* a Linux box (i.e., not just my end host(s), but
any box in between that happens to have conntrack loaded).

> Also, admins willing to use 400 MB of memory instead of 4MB can just
> change a sysctl.
> 
> Again, nothing will prevent reassembly units to be DDOS targets.

Yep, not claiming otherwise. As noted in a previous thread, you really have to crank up the limits to prevent DDOS.

> At Google, we use 100 MB for /proc/sys/net/ipv4/ipfrag_high_thresh and
> /proc/sys/net/ipv6/ip6frag_high_thresh,
> no kernel patch is needed.
>
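For reference, the knobs mentioned in this thread live under /proc/sys. The sketch below reads them (and, as root, writes them) using the same dotted-name-to-path mapping that sysctl(8) uses. The helper names are hypothetical. Treating "100 MB" as 100 * 2**20 bytes is also an assumption.

```python
# Sketch: read/write the fragment-reassembly sysctls via /proc/sys.
# Helper names are hypothetical; this does not handle sysctl names whose
# components themselves contain dots (e.g. some interface names).
from pathlib import Path
from typing import Optional

FRAG_SYSCTLS = [
    "net.ipv4.ipfrag_time",          # reassembly timeout, seconds
    "net.ipv4.ipfrag_high_thresh",   # memory limit, bytes
    "net.ipv6.ip6frag_time",
    "net.ipv6.ip6frag_high_thresh",
]

def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name to its /proc/sys file."""
    return Path("/proc/sys") / name.replace(".", "/")

def read_sysctl(name: str) -> Optional[int]:
    """Return the integer value, or None if it can't be read on this host."""
    try:
        return int(sysctl_path(name).read_text().strip())
    except (OSError, ValueError):
        return None

def write_sysctl(name: str, value: int) -> None:
    """Set a value (needs root); same effect as `sysctl -w name=value`."""
    sysctl_path(name).write_text(f"{value}\n")

if __name__ == "__main__":
    for name in FRAG_SYSCTLS:
        print(name, "=", read_sysctl(name))
    # As root, e.g. a 100 MB threshold (assuming MB = 2**20 bytes):
    # write_sysctl("net.ipv4.ipfrag_high_thresh", 100 * 2**20)
```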