Open Source and information security mailing list archives
 
Date:   Tue, 23 Jun 2020 23:56:27 +0200
From:   Michal Kubecek <mkubecek@...e.cz>
To:     Oliver Herms <oliver.peter.herms@...il.com>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, kuznet@....inr.ac.ru,
        yoshfuji@...ux-ipv6.org, kuba@...nel.org
Subject: Re: [PATCH] IPv6: Fix CPU contention on FIB6 GC

On Tue, Jun 23, 2020 at 12:46:34AM +0200, Oliver Herms wrote:
> On 22.06.20 23:44, Michal Kubecek wrote:
> > On Mon, Jun 22, 2020 at 10:53:55PM +0200, Oliver Herms wrote:
> >> When fib6_run_gc is called with parameter force=true, the spinlock in
> >> net/ipv6/ip6_fib.c:2310 can lock all CPUs in softirq when
> >> net.ipv6.route.max_size is exceeded (seen this multiple times).
> >> One softirq/CPU gets the lock; all others spin waiting for it. It takes
> >> substantial time until all are done. Effectively it's a DoS vector.
> >>
> >> As the spinlock only enforces that there is at most one GC running
> >> at a time, it should IMHO be safe to use force=false here, resulting
> >> in spin_trylock_bh instead of spin_lock_bh and thus avoiding the lock
> >> contention.
> >>
> >> Finding a locked spinlock means some GC is going on already, so it is
> >> safe to just skip another execution of the GC.
> >>
> >> Signed-off-by: Oliver Herms <oliver.peter.herms@...il.com>
> > 
> > I wonder if it wouldn't suffice to revert commit 14956643550f ("ipv6:
> > slight optimization in ip6_dst_gc") as the reasoning in its commit
> > message seems wrong: we do not always skip fib6_run_gc() when
> > entries <= rt_max_size, we do so only if the time since last garbage
> > collector run is shorter than rt_min_interval.
> > 
> > Then you would prevent the "thundering herd" effect when only gc_thresh
> > is exceeded but not max_size, as commit 2ac3ac8f86f2 ("ipv6: prevent
> > fib6_run_gc() contention") intended, but would still preserve enforced
> > garbage collect when max_size is exceeded.
> > 
> > Michal
> > 
> 
> Hi Michal,
> 
> I did some testing with packets causing 17k IPv6 route cache entries per
> second. With "entries > rt_max_size", all CPUs of the system get stuck
> waiting for the spinlock. With "false", CPU load stays well below 10% on
> every CPU core (tested on an Atom C2750). This makes sense, as "entries >
> rt_max_size" does not prevent multiple CPUs from trying to take the lock.
> 
> So reverting 14956643550f is not enough.

The problem I see with this kind of test is that it simulates a scenario
where you routinely use more entries than the cache is sized for.
In other words, if this happened in a real-life setup, the actual
problem would be gc_thresh and max_size being set too low. Also, your
patch would minimize the difference between gc_thresh (the target we want
to reach) and max_size (the hard limit) by making the hard limit "softer".

Michal
