Date:   Sun, 27 Sep 2020 17:35:52 +0200
From:   Baptiste Jonglez <baptiste@...sofnetworks.org>
To:     David Ahern <dsahern@...il.com>
Cc:     Alarig Le Lay <alarig@...rdarmor.fr>, netdev@...r.kernel.org,
        jack@...ilfillan.uk, Vincent Bernat <bernat@...ian.org>,
        Oliver <bird-o@...net.de>
Subject: Re: IPv6 regression introduced by commit
 3b6761d18bc11f2af2a6fc494e9026d39593f22c

Hi,

We are seeing the same issue; more information below.

On 07-03-20, David Ahern wrote:
> On 3/5/20 1:17 AM, Alarig Le Lay wrote:
> > Hi,
> > 
> > On the bird users ML, we discussed a bug we’re facing when having a
> > full table: from time to time all the IPv6 traffic is dropped (and all
> > neighbors are invalidated), after a while it comes back again, then wait
> > a few minutes and it’s dropped again, and so on.
> 
> Kernel version?

We are seeing the issue with 4.19 (Debian stable) and 5.4 (Debian
stable backports from a few months ago).  Others reported still seeing
the issue with 5.7:

  http://trubka.network.cz/pipermail/bird-users/2020-September/014877.html
  http://trubka.network.cz/pipermail/bird-users/2020-September/014881.html


Interestingly, the issue manifests itself in several different ways:

1) failing IPv6 neighbours, which is what Alarig reported.  We are
   seeing this on a full-view BGP router with a rather low amount of
   IPv6 traffic (around 10-20 Mbps).


2) high jitter when forwarding IPv6 traffic: this was in the original
   report from Basil and also here: http://trubka.network.cz/pipermail/bird-users/2020-September/014877.html


3) system lockup: the system becomes unresponsive, with messages like:

     watchdog: BUG: soft lockup - CPU#X stuck for XXs!

   and messages about transmit timeouts from the NIC driver.

   This happened to us on a router that has a BGP full view and
   handles around 50-100 Mbps of IPv6 traffic, which probably means
   lots of route lookups.  It happened with both 4.19 and 5.4.  On the
   other hand, kernel 4.9 runs fine on that exact same router (we are
   running Debian buster with the old kernel from Debian stretch).


When we can't use an older kernel, our current workaround is the
following sysctl config:

    net.ipv6.route.gc_thresh = 100000
    net.ipv6.route.max_size = 400000

As I understand it, this works because it effectively disables the gc
in most cases.
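
To check whether a given router is actually running with the two values
above, something like the following Python sketch could be used.  It
assumes the usual mapping of sysctl keys to files under /proc/sys (dots
become slashes); the function names are made up for illustration and
are not part of any existing tool.

```python
from pathlib import Path

# The workaround settings quoted in this message.
WORKAROUND = """
net.ipv6.route.gc_thresh = 100000
net.ipv6.route.max_size = 400000
"""


def parse_sysctl(text):
    """Parse 'key = value' lines into a dict of integer settings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = int(value.strip())
    return settings


def proc_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file (assumes no dots inside names)."""
    return Path(root, *key.split("."))


def check(settings, root="/proc/sys"):
    """Return {key: (wanted, current)}; current is None if unreadable."""
    result = {}
    for key, wanted in settings.items():
        try:
            current = int(proc_path(key, root).read_text().split()[0])
        except (OSError, ValueError, IndexError):
            current = None
        result[key] = (wanted, current)
    return result


if __name__ == "__main__":
    for key, (wanted, current) in check(parse_sysctl(WORKAROUND)).items():
        status = "ok" if current == wanted else "differs"
        print(f"{key}: wanted {wanted}, current {current} ({status})")
```

This only reads the current values; applying them is still done with
sysctl as usual.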

However, the "fib_rt_alloc" field from /proc/net/rt6_stats (6th field)
is steadily increasing: after 2 days of uptime it's at 67k.  At some
point it will hit the gc threshold; we'll see what happens then.
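
For anyone who wants to watch the same counter, here is a minimal
Python sketch.  It assumes that /proc/net/rt6_stats is a single line of
space-separated hexadecimal fields, and it takes the field position
(6th, as mentioned above) as a parameter rather than hard-coding it.
The function names are hypothetical.

```python
def parse_rt6_stats(line):
    """Convert the space-separated hex fields into a list of integers."""
    return [int(field, 16) for field in line.split()]


def gc_headroom(stats_line, gc_thresh, field=6):
    """Return how far the chosen counter (1-based field) is below gc_thresh.

    A negative result means the counter is already above the threshold.
    """
    values = parse_rt6_stats(stats_line)
    return gc_thresh - values[field - 1]


def read_first_line(path):
    """Read the first line of a file, or return None if it is unreadable."""
    try:
        with open(path) as handle:
            return handle.readline()
    except OSError:
        return None


if __name__ == "__main__":
    stats = read_first_line("/proc/net/rt6_stats")
    thresh = read_first_line("/proc/sys/net/ipv6/route/gc_thresh")
    if stats is None or thresh is None:
        print("could not read IPv6 route statistics on this system")
    else:
        counter = parse_rt6_stats(stats)[5]  # 6th field, as in the message
        left = gc_headroom(stats, int(thresh))
        print(f"counter={counter} gc_thresh={int(thresh)} headroom={left}")
```

Running this periodically (e.g. from cron) and logging the output
should show how quickly the counter approaches gc_thresh.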

I am also trying to reproduce the issue locally.

Thanks,
Baptiste