netdev - Re: [PATCH net-next] Reduce IP_FRAG_TIME fragment-reassembly timeout to 1s, from 30s

Message-ID: <64829c98-e4eb-6725-0fee-dc3c6681506f@bluematt.me>
Date:   Wed, 28 Apr 2021 12:35:28 -0400
From:   Matt Corallo <netdev-list@...tcorallo.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     Willy Tarreau <w@....eu>, "David S. Miller" <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        Keyu Man <kman001@....edu>
Subject: Re: [PATCH net-next] Reduce IP_FRAG_TIME fragment-reassembly timeout
 to 1s, from 30s

On 4/28/21 11:38, Eric Dumazet wrote:
> On Wed, Apr 28, 2021 at 4:28 PM Matt Corallo
> <netdev-list@...tcorallo.com> wrote:
> I have been working in wifi environments (linux conferences) where RTT
> could reach 20 sec, and even 30 seconds, and this was in some very
> rich cities in the USA.
> 
> Obviously, when a network is under provisioned by 50x factor, you
> _need_ more time to complete fragments.

It's also a trade-off. If you're in a hugely under-provisioned environment with bufferbloat issues, you may have some
fragments that need more time for reassembly because they've been badly reordered (though a 20-second RTT doesn't imply
that fragments will be reordered by 20 seconds; more likely you'd see a small fraction of that). But you're also likely
to have more *lost* fragments, which can trigger the black-holing behavior here.
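To put a rough number on why loss matters here: a datagram split into n fragments reassembles only if all n arrive, so
if each fragment is independently lost with probability q, the datagram is left incomplete with probability
1 - (1 - q)^n. The sketch below assumes independent losses and a 1480-byte fragment payload (a 1500-byte MTU minus a
20-byte IPv4 header). Both are illustrative assumptions, not figures from this thread.

```python
import math

# Assumption: 1500-byte MTU and a 20-byte IPv4 header with no options, so each
# fragment carries at most 1480 bytes of payload (a multiple of 8, as required).
FRAG_PAYLOAD = 1480


def fragments_needed(datagram_len: int, frag_payload: int = FRAG_PAYLOAD) -> int:
    """Number of IPv4 fragments needed for datagram_len bytes of IP payload."""
    return math.ceil(datagram_len / frag_payload)


def p_incomplete(frag_loss: float, n_frags: int) -> float:
    """Probability that at least one of n_frags fragments is lost (independent loss assumed)."""
    return 1.0 - (1.0 - frag_loss) ** n_frags


if __name__ == "__main__":
    n = fragments_needed(65000)  # a near-maximal UDP datagram
    for q in (0.001, 0.01, 0.02):
        print(f"fragment loss {q:.1%}: {n} fragments, "
              f"{p_incomplete(q, n):.1%} of datagrams never complete")
```

Under these assumptions a 65000-byte datagram needs 44 fragments, so 1% fragment loss already leaves roughly 36% of
such datagrams incomplete, and each of those sits in the reassembly queue until the timeout fires.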

If you have some loss in the flow, it's very easy to hit 1Mbps of lost fragments (incomplete datagrams are held for the
full 30s), and suddenly, instead of just giving fragments more time to reassemble, you're black-holing them. I'm not
claiming I have the right trade-off here, and I'd love more input, but at least in my experience trying to occasionally
send fragments over a pretty standard DOCSIS modem, 30s is way off.
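The arithmetic behind "1Mbps of lost fragments" can be sketched as follows: in steady state, incomplete datagrams pin
roughly (rate of never-completing fragment bytes) x (reassembly timeout) bytes of memory. The sketch assumes a 4 MiB
limit, which is the default net.ipv4.ipfrag_high_thresh on recent Linux kernels (check your own system), and counts
fragment payload only; the kernel charges the full skb truesize, so real usage reaches the limit sooner.

```python
# Assumptions (illustrative): a 4 MiB reassembly memory limit, matching the
# default net.ipv4.ipfrag_high_thresh on recent kernels, and memory counted as
# fragment payload only (real accounting uses the larger skb truesize).
HIGH_THRESH_BYTES = 4 * 1024 * 1024


def pinned_bytes(lost_mbps: float, timeout_s: float) -> float:
    """Steady-state bytes held by incomplete datagrams that wait out the timeout."""
    return lost_mbps * 1_000_000 / 8 * timeout_s


def max_lost_mbps(timeout_s: float, thresh: int = HIGH_THRESH_BYTES) -> float:
    """Largest rate (Mbps) of never-completing fragment traffic that stays under thresh."""
    return thresh * 8 / timeout_s / 1_000_000


if __name__ == "__main__":
    for timeout in (30, 1):
        used = pinned_bytes(1.0, timeout)
        print(f"ipfrag_time={timeout:>2}s: 1 Mbps of incomplete fragments pins "
              f"{used / 2**20:.2f} MiB (limit {HIGH_THRESH_BYTES / 2**20:.0f} MiB); "
              f"limit is reached at {max_lost_mbps(timeout):.2f} Mbps")
```

With a 30s timeout the 4 MiB limit corresponds to only about 1.1 Mbps of never-completing fragment traffic (1 Mbps
pins about 3.58 MiB), while a 1s timeout raises the same limit to about 33.6 Mbps.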

> For some reason, the crazy IP reassembly stuff comes every couple of
> years, and it is now a FAQ.
> 
> The Internet has changed for the lucky ones, but some deployments are
> using 4Mbps satellite connectivity, shared by hundreds of people.

I'd think this is a great example of a case where you precisely *don't* want such a low threshold for dropping all
fragments. Note that my specific deployment (standard DOCSIS) runs at the same speed and over the same network that were
available ten years ago; it isn't exactly an uncommon or particularly fancy deployment. The real issue is applications
that happily send 8MB of fragments within a few seconds and suddenly find themselves completely black-holed for 30
seconds, and that isn't going to just go away.
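The burst scenario can be illustrated with a toy one-second-step model. It assumes, as this thread describes, that once
queued fragment memory exceeds the limit every newly arriving fragment is dropped until old queues time out. The 4 MiB
limit, the traffic pattern, and the 50% share of never-completing datagrams are arbitrary illustrative choices, and
`blackholed_seconds` is a hypothetical helper, not kernel code.

```python
from collections import deque

# Assumption: 4 MiB limit, as with the default net.ipv4.ipfrag_high_thresh.
HIGH_THRESH_BYTES = 4 * 1024 * 1024


def blackholed_seconds(offered, incomplete_frac, timeout_s, thresh=HIGH_THRESH_BYTES):
    """Return the seconds during which every arriving fragment is dropped.

    offered[t] is the fragment bytes arriving in second t. A fraction
    incomplete_frac of them belongs to datagrams that never complete and is
    held until its timeout; complete datagrams are assumed to free their
    memory immediately. Ignores reordering and per-skb overhead.
    """
    held = deque()  # (expiry_second, bytes); ordered by expiry since timeout is fixed
    mem = 0
    dropped = []
    for t, nbytes in enumerate(offered):
        # Free incomplete datagrams whose reassembly timer has run out.
        while held and held[0][0] <= t:
            mem -= held.popleft()[1]
        if mem > thresh:
            dropped.append(t)  # over the limit: all new fragments are dropped
            continue
        stuck = int(nbytes * incomplete_frac)
        held.append((t + timeout_s, stuck))
        mem += stuck
    return dropped


if __name__ == "__main__":
    # Hypothetical traffic: an 8 MB burst over 4 s, then 100 kB/s of fragmented traffic.
    traffic = [2_000_000] * 4 + [100_000] * 56
    for timeout in (30, 1):
        lost = blackholed_seconds(traffic, incomplete_frac=0.5, timeout_s=timeout)
        print(f"ipfrag_time={timeout:>2}s: {len(lost)} s of total fragment loss")
```

In this toy scenario the 30s timeout drops every fragment from second 8 through second 29 (22 seconds), long after the
burst has ended, while a 1s timeout never reaches the limit.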

> I urge application designers to _not_ rely on doomed frags, even in
> controlled networks.

I'd love to, but we're talking about a default value for fragment reassembly. At least in my subjective experience here,
the conservative 30s timeout takes things from "more time" to "completely black-holed", which feels like the wrong
trade-off. At the end of the day, you can't expect fragments to work super well, indeed, and you assume some amount of
loss; the goal is to minimize the loss you see from them.
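On Linux, the defaults under discussion are visible as sysctls: net.ipv4.ipfrag_time (whose default comes from
IP_FRAG_TIME) and net.ipv6.ip6frag_time in seconds, plus the ipfrag_high_thresh / ip6frag_high_thresh memory limits in
bytes. The helper below is a hypothetical convenience for inspecting them; the `root` parameter exists only so it can
be pointed at a test directory instead of /proc/sys.

```python
from pathlib import Path

# Reassembly sysctls on Linux: *_time values are in seconds, *_high_thresh in bytes.
FRAG_SYSCTLS = (
    "net/ipv4/ipfrag_time",
    "net/ipv4/ipfrag_high_thresh",
    "net/ipv6/ip6frag_time",
    "net/ipv6/ip6frag_high_thresh",
)


def read_frag_sysctls(root: str = "/proc/sys") -> dict:
    """Map each reassembly sysctl (dotted name) to its integer value, or None if unavailable."""
    values = {}
    for name in FRAG_SYSCTLS:
        key = name.replace("/", ".")
        try:
            values[key] = int((Path(root) / name).read_text().strip())
        except (OSError, ValueError):
            values[key] = None  # missing file (e.g. IPv6 disabled) or unparsable value
    return values


if __name__ == "__main__":
    for key, value in read_frag_sysctls().items():
        print(f"{key} = {value}")
```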

Even if you have some reordering, you're unlikely to see every fragment reordered (I guess you could imagine a horribly
broken qdisc; does such a thing exist in practice?) such that you always need 30s to reassemble. Taking some loss to
avoid making it so easy to completely black-hole fragments seems like a reasonable trade-off.

Matt