Message-ID: <d8a0069a-b387-c470-8599-d892e4a35881@gmail.com>
Date: Sat, 7 Mar 2020 17:52:10 -0700
From: David Ahern <dsahern@...il.com>
To: Alarig Le Lay <alarig@...rdarmor.fr>, netdev@...r.kernel.org,
jack@...ilfillan.uk, Vincent Bernat <bernat@...ian.org>
Subject: Re: IPv6 regression introduced by commit
3b6761d18bc11f2af2a6fc494e9026d39593f22c
On 3/5/20 1:17 AM, Alarig Le Lay wrote:
> Hi,
>
> On the bird users ML, we discussed a bug we’re facing when carrying a full
> table: from time to time all IPv6 traffic is dropped (and all neighbors are
> invalidated); after a while it comes back, then a few minutes later it is
> dropped again, and so on.
Kernel version?
Are you monitoring neighbor states with 'ip monitor' or something else?
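e.g., one way to watch neighbor state transitions live:

    ip -6 monitor neigh

Each neighbor table update (REACHABLE, STALE, FAILED, ...) is printed as it
happens, which would show whether the neighbors really are being invalidated.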
>
> Basil Fillan determined that it comes from the commit
> 3b6761d18bc11f2af2a6fc494e9026d39593f22c.
>
...
> We've also experienced this after upgrading a few routers to Debian Buster.
> With a kernel bisect we found that a bug was introduced in the following
> commit:
>
> 3b6761d18bc11f2af2a6fc494e9026d39593f22c
>
> This bug was still present in master as of a few weeks ago.
>
> It appears entries are added to the IPv6 route cache that aren't visible in
> "ip -6 route show cache", but are causing the route cache garbage collection
> to trigger extremely often (every packet?) once the cache exceeds the value of
> net.ipv6.route.max_size. Our original symptom was extreme forwarding jitter
> caused within the ip6_dst_gc function (identified by some spelunking with
> systemtap & perf), worsening as the size of the cache increased. This was due
> to our max_size sysctl inadvertently being set to 1 million. Reducing this
> value to the default 4096 broke IPv6 forwarding entirely on our test system
> under affected kernels. Our documentation had this sysctl marked as the
> maximum number of IPv6 routes, so it looks like its meaning changed at some point.
>
> We've rolled our routers back to kernel 4.9 (with the sysctl set to 4096) for
> now, which fixed our immediate issue.
>
> You can reproduce this by adding more than 4096 routes (the default value of
> the sysctl) to the kernel and running "ip route get" for each of them. Once
> the route cache is filled, the error "RTNETLINK answers: Network is
> unreachable" will be received for each subsequent "ip route get" incantation,
> and v6 connectivity will be interrupted.
>
The above does not reproduce for me on 5.6 or 4.19, and I would have
been really surprised if it had, so I have to question the git bisect
result.
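For concreteness, the repro described above amounts to something along these
lines (the dummy device and the 2001:db8 documentation addresses here are
only illustrative):

    # create a test interface and install >4096 /128 routes
    ip link add dum0 type dummy
    ip link set dum0 up
    ip -6 addr add 2001:db8::1/64 dev dum0
    for i in $(seq 1 5000); do
        ip -6 route add 2001:db8:1::$(printf %x "$i")/128 dev dum0
    done
    # exercise the route cache; per the report, on an affected kernel the
    # later lookups fail with "RTNETLINK answers: Network is unreachable"
    for i in $(seq 1 5000); do
        ip -6 route get 2001:db8:1::$(printf %x "$i")
    done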
There is no limit on FIB entries, and the number of FIB entries has no
impact on the sysctl in question, net.ipv6.route.max_size. That sysctl
limits the number of dst_entry instances. When the threshold is exceeded
(and gc_thresh for IPv6 defaults to 1024), each new allocation attempts to
free one via gc. There are many legitimate reasons why 4k entries may have
been created: MTU exceptions, redirects, per-cpu caching, VRFs, ...
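A quick way to see where a system stands relative to those knobs:

    sysctl net.ipv6.route.max_size net.ipv6.route.gc_thresh
    ip -6 route show cache | wc -l

(though, per the report above, not every dst_entry necessarily shows up in
the cache listing).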
In 4.9, FIB entries are created as an rt6_info, which is a v6 wrapper
around dst_entry. That changed in 4.15 or 4.16 (I forget which now), and
the commit you reference above is part of the refactoring to make IPv6
more like IPv4, with a different, smaller data structure for FIB entries.
A lot of other changes have gone into IPv6 between 4.9 and the top of
tree, and at this point the whole gc scheme can probably go away for v6,
as it was removed for IPv4.
Try the 5.4 LTS and see if you still hit a problem.