Message-ID: <c2349ca8-acd2-a6af-6bc4-11b0486dbe37@gmail.com>
Date: Mon, 28 Sep 2020 20:39:44 -0700
From: David Ahern <dsahern@...il.com>
To: Baptiste Jonglez <baptiste@...sofnetworks.org>
Cc: Alarig Le Lay <alarig@...rdarmor.fr>, netdev@...r.kernel.org,
jack@...ilfillan.uk, Vincent Bernat <bernat@...ian.org>,
Oliver <bird-o@...net.de>
Subject: Re: IPv6 regression introduced by commit
3b6761d18bc11f2af2a6fc494e9026d39593f22c
On 9/27/20 11:48 PM, Baptiste Jonglez wrote:
> On 27-09-20, David Ahern wrote:
>> On 9/27/20 9:10 AM, Baptiste Jonglez wrote:
>>> On 27-09-20, Baptiste Jonglez wrote:
>>>> 1) failing IPv6 neighbours, what Alarig reported. We are seeing this
>>>> on a full-view BGP router with rather low amount of IPv6 traffic
>>>> (around 10-20 Mbps)
>>>
>>> Ok, I found a quick way to reproduce this issue:
>>>
>>> # for net in {1..9999}; do ip -6 route add 2001:db8:ffff:${net}::/64 via fe80::4242 dev lo; done
>>>
>>> and then:
>>>
>>> # for net in {1..9999}; do ping -c1 2001:db8:ffff:${net}::1; done
>>>
>>> This quickly gets to a situation where ping fails early with:
>>>
>>> ping: connect: Network is unreachable
>>>
>>> At this point, IPv6 connectivity is broken. The kernel is no longer
>>> replying to IPv6 neighbor solicitation from other hosts on local
>>> networks.
>>>
>>> When this happens, the "fib_rt_alloc" field from /proc/net/rt6_stats
>>> is roughly equal to net.ipv6.route.max_size (a bit more in my tests).
>>>
>>> Interestingly, the system appears to stay in this broken state
>>> indefinitely, even without trying to send new IPv6 traffic. The
>>> fib_rt_alloc statistics does not decrease.
>>>
>>
>> fib_rt_alloc is incremented by calls to ip6_dst_alloc. Each of your
>> 9,999 pings is to a unique address and hence causes a dst to be
>> allocated and the counter to be incremented. It is never decremented.
>> That is standard operating procedure.
>
> Ok, then this is a change in behaviour. Here is a graph of fib_rt_alloc
> on a busy router (IPv6 full view, moderate IPv6 traffic) with 4.9 kernel:
>
> https://files.polyno.me/tmp/rt6_stats_fib_rt_alloc_4.9.png
>
> It varies quite a lot and stays around 50, so clearly it can be
> decremented in regular operation.
>
> On 4.19 and later, it does seem to be decremented only when a route is
> removed (ip -6 route delete). Here is the same graph on a router with a
> 4.19 kernel and a large net.ipv6.route.max_size:
>
> https://files.polyno.me/tmp/rt6_stats_fib_rt_alloc_4.19.png
>
> Overall, do you mean that fib_rt_alloc is a red herring and is not a good
> marker of the issue?
>

$ git checkout v4.9
$ egrep -r fib_rt_alloc include/ net/
include//net/ip6_fib.h: __u32 fib_rt_alloc; /* permanent routes */
net//ipv6/route.c: net->ipv6.rt6_stats->fib_rt_alloc,

The first declares it; the second prints it. That's it, no other users,
so I have no idea why it shows any changes in your v4.9 graph.

The git history shows that Wei actually wired up the stats with

commit 81eb8447daae3b62247aa66bb17b82f8fef68249
Author: Wei Wang <weiwan@...gle.com>
Date:   Fri Oct 6 12:06:11 2017 -0700

    ipv6: take care of rt6_stats

That patch adds an inc but no dec for this stat, which is what you show
in your 4.19 graph.

Coming back to the bigger problem, fib_rt_alloc has *no* bearing on the
ability to create dst entries, which is what the max_size sysctl
(net.ipv6.route.max_size) affects. That limit does not apply to FIB
entries, which are now unbounded, but to dst_entry instances, which are
created when a FIB entry has been hit and used in the datapath to move
packets.
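
To see which counter is actually close to the limit, something like the
following may help. It is only a rough sketch: it assumes that
/proc/net/rt6_stats prints seven hex fields, with fib_rt_alloc third and
the dst entry count sixth, which is how rt6_stats_seq_show() in
net/ipv6/route.c appears to print them. Please verify that against your
kernel. rt6_summary is a made-up helper name, not an existing tool.

```bash
#!/usr/bin/env bash
# Sketch: show fib_rt_alloc and the dst entry count next to max_size.
# Assumption: /proc/net/rt6_stats holds seven hex fields, with fib_rt_alloc
# third and the dst entry count sixth; check this against your kernel.

# rt6_summary LINE MAX_SIZE
#   LINE      one line in /proc/net/rt6_stats format
#   MAX_SIZE  value of net.ipv6.route.max_size (decimal)
rt6_summary() {
    local line=$1 max_size=$2
    local -a f
    # Split the line into whitespace-separated fields.
    read -r -a f <<<"$line"
    if (( ${#f[@]} < 7 )); then
        echo "unexpected rt6_stats format: $line" >&2
        return 1
    fi
    # Fields are hexadecimal; 16#... converts them to decimal.
    printf 'fib_rt_alloc=%d dst_entries=%d max_size=%d\n' \
        "$((16#${f[2]}))" "$((16#${f[5]}))" "$max_size"
}

# Report on the live system when both files are readable.
if [[ -r /proc/net/rt6_stats && -r /proc/sys/net/ipv6/route/max_size ]]; then
    rt6_summary "$(cat /proc/net/rt6_stats)" \
        "$(cat /proc/sys/net/ipv6/route/max_size)"
fi
```

If dst_entries sits near max_size while the pings fail, that points at
dst allocation rather than at fib_rt_alloc.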

Eric investigated a similar problem recently, which resulted in

commit d8882935fcae28bceb5f6f56f09cded8d36d85e6
Author: Eric Dumazet <edumazet@...gle.com>
Date:   Fri May 8 07:34:14 2020 -0700

    ipv6: use DST_NOCOUNT in ip6_rt_pcpu_alloc()

which I believe is released in v5.8.