linux-kernel - Re: [PATCH v2 4.19] tcp: fix TCP socks unreleased in BBR mode

Message-ID: <CANn89iKt=3iDZM+vUbCvO_aGuedXFhzdC6OtQMeVTMDxyp9bAg@mail.gmail.com>
Date:   Thu, 4 Jun 2020 06:10:43 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Jason Xing <kerneljasonxing@...il.com>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Neal Cardwell <ncardwell@...gle.com>,
        David Miller <davem@...emloft.net>,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>, liweishi@...ishou.com,
        Shujin Li <lishujin@...ishou.com>
Subject: Re: [PATCH v2 4.19] tcp: fix TCP socks unreleased in BBR mode

On Thu, Jun 4, 2020 at 2:01 AM <kerneljasonxing@...il.com> wrote:
>
> From: Jason Xing <kerneljasonxing@...il.com>
>
> When using BBR mode, many TCP socks cannot be released because of
> duplicate sock_hold() calls in tcp_internal_pacing() when an RTO
> happens. As a result, slab memory grows rapidly and the OOM killer is
> triggered repeatedly until the system crashes.
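A toy Python model of the imbalance described above (an illustration, not kernel code). It assumes the pacing timer is armed twice before it fires, that each arming takes a reference, and that the single callback drops only one. The `Sock` class, the function names and the `guard` flag are hypothetical; the guard only shows the general shape of a check and is not the actual fix.

```python
class Sock:
    """Toy stand-in for a socket with a reference count."""

    def __init__(self):
        self.refcnt = 1           # reference owned by the socket itself
        self.timer_armed = False


def arm_pacing_timer(sk, guard):
    """Arm the pacing timer and take a reference for the pending callback."""
    if guard and sk.timer_armed:
        return                    # hypothetical guard: a reference is already held
    sk.timer_armed = True         # re-arming an armed timer still yields one callback
    sk.refcnt += 1                # models sock_hold()


def pacing_timer_fires(sk):
    """The callback runs once per armed timer and drops one reference."""
    if sk.timer_armed:
        sk.timer_armed = False
        sk.refcnt -= 1            # models sock_put()


def leaked_refs(guard):
    """Return the references left over beyond the socket's own."""
    sk = Sock()
    arm_pacing_timer(sk, guard)   # normal transmit arms the timer
    arm_pacing_timer(sk, guard)   # RTO retransmit arms it again before it fires
    pacing_timer_fires(sk)
    return sk.refcnt - 1
```

Without the guard one reference is never dropped, so the sock can never be freed; with the guard the count returns to its starting value. Nothing in this model depends on the congestion control in use.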
>
> Besides BBR, any other mode that uses the pacing function could
> trigger the problem described above.
>
> Reproduce procedure:
> 0) cat /proc/slabinfo | grep TCP
> 1) switch net.ipv4.tcp_congestion_control to bbr
> 2) use wrk or a similar tool to send packets
> 3) use tc to increase the delay and loss to simulate the RTO case.
> 4) cat /proc/slabinfo | grep TCP
> 5) kill the wrk command and observe the number of objects and slabs in
> TCP.
> 6) finally, notice that the number does not decrease.
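Steps 0, 4 and 5 compare object counts by eye. A minimal Python sketch that extracts the same numbers, assuming the usual /proc/slabinfo layout in which each data line starts with the cache name followed by active_objs and num_objs. The function names are hypothetical, and reading the live file usually requires root.

```python
def tcp_slab_counts(slabinfo_text):
    """Return {cache_name: (active_objs, num_objs)} for caches with TCP in the name."""
    counts = {}
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Data lines look like: name active_objs num_objs objsize ...
        if len(fields) < 3 or "TCP" not in fields[0]:
            continue
        # Skip anything whose count columns are not plain integers.
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        counts[fields[0]] = (int(fields[1]), int(fields[2]))
    return counts


def read_tcp_slab_counts(path="/proc/slabinfo"):
    """Read the live file and return the TCP-related counts."""
    with open(path) as f:
        return tcp_slab_counts(f.read())
```

Calling read_tcp_slab_counts() before step 1 and again after step 5, then comparing the active_objs values, gives the same information as the two grep commands.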
>
> v2: extend the timer which could cover all those related potential risks
> (suggested by Eric Dumazet and Neal Cardwell)
>
> Signed-off-by: Jason Xing <kerneljasonxing@...il.com>
> Signed-off-by: liweishi <liweishi@...ishou.com>
> Signed-off-by: Shujin Li <lishujin@...ishou.com>

That is not how things work really.

I will submit this properly so that stable teams do not have to guess
how to backport this to various kernels.

The changelog is misleading: this has nothing to do with BBR. We need to be precise.

Thank you.