netdev - Re: [PATCH net-next] Reduce IP_FRAG_TIME fragment-reassembly timeout to 1s, from 30s

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANn89iKJDUQuXBueuZWdi17LgFW3yb4LUsH3hzY08+ytJ9QgeA@mail.gmail.com>
Date:   Fri, 30 Apr 2021 19:09:34 +0200
From:   Eric Dumazet <edumazet@...gle.com>
To:     Matt Corallo <netdev-list@...tcorallo.com>
Cc:     Willy Tarreau <w@....eu>, "David S. Miller" <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        Keyu Man <kman001@....edu>
Subject: Re: [PATCH net-next] Reduce IP_FRAG_TIME fragment-reassembly timeout
 to 1s, from 30s

On Fri, Apr 30, 2021 at 5:52 PM Matt Corallo
<netdev-list@...tcorallo.com> wrote:
>
> Following up - is there a way forward here?
>

Tune the sysctls to meet your goals ?

I did the needed work so that you can absolutely decide to use 256GB
of ram per host for frags if you want.
(Although I have not tested with crazy values like that, maybe some
kind of bottleneck will be hit)

> I think the current ease of hitting the black-hole-ing behavior is unacceptable (and often not something that can be
> changed even with the sysctl knobs due to intermediate hosts), and am happy to do some work to fix it.
>
> Someone mentioned in a previous thread randomly evicting fragments instead of dropping all new fragments when we reach
> saturation, which may be an option. We could also do something in between 1s and 30s, preserving behavior for hosts
> which see fragments delivered out-of-order by seconds while still reducing the ease of accidentally just black-holing
> all fragments entirely in more standard internet access deployments.
>

Give me one implementation, I will give you a DDOS program to defeat it.
linux code is public, attackers will simply change their attacks.

There is no generic solution, they are all bad.

If you evict randomly, it will also fail. So why bother ?


> >
> >
> > On 4/28/21 11:38, Eric Dumazet wrote:
> >> On Wed, Apr 28, 2021 at 4:28 PM Matt Corallo
> >> <netdev-list@...tcorallo.com> wrote:
> >> I have been working in wifi environments (linux conferences) where RTT
> >> could reach 20 sec, and even 30 seconds, and this was in some very
> >> rich cities in the USA.
> >>
> >> Obviously, when a network is under provisioned by 50x factor, you
> >> _need_ more time to complete fragments.
> >
> > Its also a trade-off - if you're in a hugely under-provisioned environment with bufferblot issues you may have some
> > fragments that need more time for reassembly if they've gotten horribly reordered (though just having 20 second RTT
> > doesn't imply that fragments are going to be re-ordered by 20 seconds, more likely you might see a small fraction of
> > it), but you're also likely to have more *lost* fragments, which can trigger the black-holing behavior here.
> >
> > If you have some loss in the flow, its very easy to hit 1Mbps of lost fragments and suddenly instead of just giving more
> > time to reassemble, you're just black-holing instead. I'm not claiming I have the right trade-off here, I'd love more
> > input, but at least in my experience trying to just occasionally send fragments on a pretty standard DOCSIS modem, 30s
> > is way off.
> >
> >> For some reason, the crazy IP reassembly stuff comes every couple of
> >> years, and it is now a FAQ.
> >>
> >> The Internet has changed for the  lucky ones, but some deployments are
> >> using 4Mbps satellite connectivity, shared by hundreds of people.
> >
> > I'd think this is a great example of a case where you precisely *dont* want such a low threshold for dropping all
> > fragments. Note that in my specific deployment (standard DOCSIS), we're talking about the same speed and network as was
> > available ten years ago, this isn't exactly an uncommon or particularly fancy deployment. The real issue is applications
> > which happily send 8MB of fragments within a few seconds and suddenly find themselves completely black-holed for 30
> > seconds, but this isn't going to just go away.
> >
> >> I urge application designers to _not_ rely on doomed frags, even in
> >> controlled networks.
> >
> > I'd love to, but we're talking about a default value for fragment reassembly. At least in my subjective experience here,
> > the conservative 30s time takes things from "more time" to "completely blackhole", which feels like the wrong tradeoff.
> > At the end of the day, you can't expect fragments to work super well, indeed, and you assume some amount of loss, the
> > goal is to minimize the loss you see from them.
> >
> > Even if you have some reordering, you're unlikely to see every fragment reordered (I guess you could imagine a horribly
> > broken qdisc, does such a thing exist in practice?) such that you always need 30s to reassemble. Taking some loss to
> > avoid making it so easy to completely blackhole fragments seems like a reasonable tradeoff.
> >
> > Matt