Message-ID: <CA+wXwBSGsBjovTqvoPQEe012yEF2eYbnC5_0W==EAvWH1zbOAg@mail.gmail.com>
Date:   Thu, 20 Jan 2022 17:29:52 +0000
From:   Daniel Dao <dqminh@...udflare.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     netdev <netdev@...r.kernel.org>,
        kernel-team <kernel-team@...udflare.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        Marek Majkowski <marek@...udflare.com>
Subject: Re: Expensive tcp_collapse with high tcp_rmem limit

On Thu, Jan 6, 2022 at 6:55 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Thu, Jan 6, 2022 at 10:52 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> > I think that you should first look if you are under some kind of attack [1]
> >
> > Eventually you would still have to make room, involving expensive copies.
> >
> > 12% of 16MB is still a lot of memory to copy.
> >
> > [1] Detecting an attack signature could allow you to zap the socket
> > and save ~16MB of memory per flow.

Sorry for the late reply, we spent the past few weeks gathering more
data.

>   tid 0: rmem_alloc=16780416 sk_rcvbuf=16777216 rcv_ssthresh=2920
>   tid 0: advmss=1460 wclamp=4194304 rcv_wnd=450560
>   tid 0: len=3316 truesize=15808
>   tid 0: len=4106 truesize=16640
>   tid 0: len=3967 truesize=16512
>   tid 0: len=2988 truesize=15488
> > I think that you should first look if you are under some kind of attack [1]

This, and indeed the majority of similar occurrences, comes from a
websocket origin that can emit a large flow of tiny packets. As the
tcp_collapse hiccups occur on a proxy node, we think that a combination
of slow / unresponsive clients and the websocket traffic can trigger
this.
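
To make the "bloated skb" point concrete, here is a small sketch that
computes the payload-to-truesize ratio for the four skbs quoted above,
plus how far rmem_alloc sits above sk_rcvbuf. The numbers are copied
from the trace; the function and variable names are made up for
illustration only and are not kernel APIs.

```python
# Figures copied from the quoted trace: (len, truesize) per skb.
SKBS = [(3316, 15808), (4106, 16640), (3967, 16512), (2988, 15488)]
RMEM_ALLOC = 16780416
SK_RCVBUF = 16777216


def payload_ratio(length: int, truesize: int) -> float:
    """Fraction of the memory charged to the socket that is real payload."""
    return length / truesize


def overall_ratio(skbs) -> float:
    """Payload fraction across a list of (len, truesize) pairs."""
    total_len = sum(length for length, _ in skbs)
    total_truesize = sum(truesize for _, truesize in skbs)
    return total_len / total_truesize


# How far the receive queue is over the configured buffer limit.
overshoot = RMEM_ALLOC - SK_RCVBUF

for length, truesize in SKBS:
    print(f"len={length} truesize={truesize} "
          f"ratio={payload_ratio(length, truesize):.3f}")
print(f"overall ratio={overall_ratio(SKBS):.3f}, overshoot={overshoot} bytes")
```

For this sample, only about a fifth to a quarter of the memory charged
to the socket is payload. Four skbs are not a statistically meaningful
sample, so treat this as a reading aid rather than a measurement.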

As a workaround, we clamp the websocket's rcvbuf to a smaller value.
This reduces the peak latency of tcp_collapse, as we no longer need to
collapse up to 16MB.
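
For readers who want to try something similar, one way to clamp a
socket's receive buffer from userspace is SO_RCVBUF. This is only a
sketch under assumptions: the message above does not say how the clamp
was implemented, and the helper name and the 256 KiB limit are
hypothetical. On Linux, setting SO_RCVBUF also turns off receive buffer
autotuning for that socket, the kernel doubles the requested value for
bookkeeping overhead, and the request is capped by net.core.rmem_max.

```python
import socket


def clamp_rcvbuf(sock: socket.socket, limit_bytes: int) -> int:
    """Request a fixed receive buffer size and return the effective value.

    Hypothetical helper for illustration. The value read back may differ
    from the request (Linux doubles it and applies rmem_max).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, limit_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


# Example: clamp a fresh TCP socket before it is connected.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
effective = clamp_rcvbuf(sock, 256 * 1024)  # 256 KiB is an arbitrary choice
print(f"effective SO_RCVBUF: {effective} bytes")
sock.close()
```

The trade-off is that a fixed, smaller buffer bounds how much data
tcp_collapse can ever have to walk for that flow, at the cost of a
smaller advertised window for that connection.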

> What kind of NIC driver is used on your host ?

We are running mlx5

> Except that you would still have to parse the linear list.

When we see a high tcp_collapse latency, the bloated skb is almost
always at the top of the list. I guess the client is already
unresponsive, so the flow is full of bloated skbs. I would rather not
have to spend too much time collapsing these skbs.