Message-ID: <1503423863.2499.39.camel@edumazet-glaptop3.roam.corp.google.com>
Date: Tue, 22 Aug 2017 10:44:23 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: David Ahern <dsahern@...il.com>
Cc: Akshat Kakkar <akshat.1984@...il.com>,
David Laight <David.Laight@...lab.com>,
netdev <netdev@...r.kernel.org>,
Willem de Bruijn <willemb@...gle.com>
Subject: Re: Something hitting my total number of connections to the server
On Tue, 2017-08-22 at 09:43 -0700, David Ahern wrote:
> On 8/22/17 6:02 AM, Eric Dumazet wrote:
> >>
> >> net.core.netdev_max_backlog=10000
> > This is an insane backlog.
> >
>
> https://www.kernel.org/doc/Documentation/networking/scaling.txt
>
> "== Suggested Configuration
>
> Flow limit is useful on systems with many concurrent connections,
> where a single connection taking up 50% of a CPU indicates a problem.
> In such environments, enable the feature on all CPUs that handle
> network rx interrupts (as set in /proc/irq/N/smp_affinity).
>
> The feature depends on the input packet queue length to exceed
> the flow limit threshold (50%) + the flow history length (256).
> Setting net.core.netdev_max_backlog to either 1000 or 10000
> performed well in experiments."
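The trigger point described in the quoted scaling.txt text can be sketched as a small calculation. `flow_limit_trigger` is a hypothetical helper for illustration, not a kernel function; the 50% threshold and 256-entry history length are the figures from the quoted doc.

```python
FLOW_HISTORY_LEN = 256  # flow history length, per scaling.txt

def flow_limit_trigger(netdev_max_backlog: int) -> int:
    """Input queue length at which flow limiting starts to engage:
    50% of the backlog plus the flow history length."""
    threshold = netdev_max_backlog // 2  # the 50% flow limit threshold
    return threshold + FLOW_HISTORY_LEN

# With the two backlog values discussed in this thread:
print(flow_limit_trigger(1000))   # 756
print(flow_limit_trigger(10000))  # 5256
```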
10000 is adding tail latencies.
At Google we run the whole fleet with a backlog of 1000.
And yes, it took time to get rid of the backlog of 10000 that was set up
years ago, because of old constraints and some fears.
Willem wrote this doc in 2013, before we finally went back to 1000.
We should update this doc.
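For reference, the sysctl under discussion can be inspected and tuned as sketched below; 1000 is both the kernel default and the value recommended above. The file name under /etc/sysctl.d/ is an arbitrary example.

```shell
# Check the current value (read-only):
sysctl net.core.netdev_max_backlog

# Set it at runtime (needs root):
sysctl -w net.core.netdev_max_backlog=1000

# Persist across reboots via a sysctl.d fragment, e.g.:
#   echo 'net.core.netdev_max_backlog=1000' > /etc/sysctl.d/90-backlog.conf
```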