Open Source and information security mailing list archives
 
Date:   Tue, 23 Jun 2020 23:56:27 +0200
From:   Michal Kubecek <mkubecek@...e.cz>
To:     Oliver Herms <oliver.peter.herms@...il.com>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, kuznet@....inr.ac.ru,
        yoshfuji@...ux-ipv6.org, kuba@...nel.org
Subject: Re: [PATCH] IPv6: Fix CPU contention on FIB6 GC

On Tue, Jun 23, 2020 at 12:46:34AM +0200, Oliver Herms wrote:
> On 22.06.20 23:44, Michal Kubecek wrote:
> > On Mon, Jun 22, 2020 at 10:53:55PM +0200, Oliver Herms wrote:
> >> When fib6_run_gc is called with parameter force=true, the spinlock in
> >> net/ipv6/ip6_fib.c:2310 can lock all CPUs in softirq when
> >> net.ipv6.route.max_size is exceeded (seen this multiple times).
> >> One softirq/CPU gets the lock; all others spin waiting for it. It takes
> >> substantial time until all are done. Effectively it's a DoS vector.
> >>
> >> As the spinlock only enforces that there is at most one GC running
> >> at a time, it should IMHO be safe to use force=false here, resulting
> >> in spin_trylock_bh instead of spin_lock_bh and thus avoiding the lock
> >> contention.
> >>
> >> Finding a locked spinlock means some GC is going on already, so it is
> >> safe to just skip another execution of the GC.
> >>
> >> Signed-off-by: Oliver Herms <oliver.peter.herms@...il.com>
> > 
> > I wonder if it wouldn't suffice to revert commit 14956643550f ("ipv6:
> > slight optimization in ip6_dst_gc") as the reasoning in its commit
> > message seems wrong: we do not always skip fib6_run_gc() when
> > entries <= rt_max_size, we do so only if the time since last garbage
> > collector run is shorter than rt_min_interval.
> > 
> > Then you would prevent the "thundering herd" effect when only gc_thresh
> > is exceeded but not max_size, as commit 2ac3ac8f86f2 ("ipv6: prevent
> > fib6_run_gc() contention") intended, but would still preserve enforced
> > garbage collect when max_size is exceeded.
> > 
> > Michal
> > 
> 
> Hi Michal,
> 
> I did some testing with packets causing 17k IPv6 route cache entries per
> second. With "entries > rt_max_size", all CPUs of the system get stuck
> waiting for the spinlock. With "false", CPU load stays well below 10% on
> every CPU core (tested on an Atom C2750). This makes sense, as "entries >
> rt_max_size" does not prevent multiple CPUs from trying to take the lock.
> 
> So reverting 14956643550f is not enough.

The problem I see with this kind of test is that it simulates a scenario
where you routinely use more entries than the cache is sized for.
In other words, if this happened in a real-life setup, the actual
problem would be gc_thresh and max_size being set too low. Also, your
patch would minimize the difference between gc_thresh (the target we want
to reach) and max_size (the hard limit) by making the hard limit "softer".

Michal
