Date:   Tue, 26 Mar 2019 17:15:44 +0000
From:   Dmitry Safonov <>
To:     David Ahern <>,
Cc:     Alexander Duyck <>,
        Alexey Kuznetsov <>,
        "David S. Miller" <>,
        Eric Dumazet <>,
        Hideaki YOSHIFUJI <>,
        Ido Schimmel <>,
Subject: Re: [RFC 4/4] net/ipv4/fib: Don't synchronise_rcu() every 512Kb

Hi David,

On 3/26/19 3:39 PM, David Ahern wrote:
> On 3/26/19 9:30 AM, Dmitry Safonov wrote:
>> The fib trie has a hard-coded sync_pages limit that triggers
>> synchronize_rcu(). The limit is 128 pages, or 512 KB in the common
>> case of 4 KB pages.
>> Unfortunately, at Arista we have use cases with full-view software
>> forwarding. At a scale of 100K routes and more, even on 2-core boxes,
>> the hard-coded limit starts shooting us in the foot: the lockup
>> detector notices that rtnl_lock is held for seconds.
>> The first reason is the previously broken MAX_WORK, which didn't limit
>> pending balancing work. While fixing it, I noticed that the bottleneck
>> is actually the number of synchronize_rcu() calls.
>> I tried to fix it with a patch that decrements the number of tnodes in
>> an RCU callback, but it didn't affect performance much.
>> One possible way to "fix" it is to provide another sysctl to control
>> sync_pages, but from my POV that's nasty: it exposes another
>> implementation detail to user-space.
> well, that was accepted last week. ;-)
> commit 9ab948a91b2c2abc8e82845c0e61f4b1683e3a4f
> Author: David Ahern <>
> Date:   Wed Mar 20 09:18:59 2019 -0700
>     ipv4: Allow amount of dirty memory from fib resizing to be controllable
> Can you see how that change (should backport easily) affects your test
> case? From my perspective 16MB was the sweet spot.

Heh, my series is based on master, so I haven't seen it yet.

I still wonder whether it's good to expose this to userspace rather than
use a shrinker, but it should probably work for me. I'll test it in the
next few days.
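
For a rough feel of why the threshold matters, here is a small userspace
model of the batching described above. It is not kernel code: the
256-byte node size and the 100,000 freed nodes are made-up numbers, 4 KB
pages are assumed, and sync_calls() is a hypothetical helper that only
counts how often the threshold would be crossed.

```python
PAGE_SIZE = 4096  # assumption: common 4 KB pages


def sync_calls(node_bytes, nodes_freed, threshold_bytes):
    """Count how many times a 'free N bytes, then sync' threshold fires.

    Models the pattern: accumulate the size of freed nodes, and once the
    pending total reaches the threshold, reset it and do one sync.
    """
    pending = 0
    calls = 0
    for _ in range(nodes_freed):
        pending += node_bytes
        if pending >= threshold_bytes:
            pending = 0
            calls += 1
    return calls


old_limit = 128 * PAGE_SIZE    # the hard-coded 128 pages = 512 KB
new_limit = 16 * 1024 * 1024   # the 16MB value mentioned above

# Hypothetical workload: 100,000 freed nodes of 256 bytes each.
print(sync_calls(256, 100_000, old_limit))  # 48
print(sync_calls(256, 100_000, new_limit))  # 1
```

In this toy model, raising the threshold from 512 KB to 16MB cuts the
number of syncs from 48 to 1 for the same amount of freed memory. Real
behaviour depends on actual tnode sizes and on how long each grace
period takes.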

