Message-ID: <9f25d75823a73c6f0f556f0905f931d1@codeaurora.org>
Date: Mon, 04 Jan 2021 20:05:17 -0700
From: stranche@...eaurora.org
To: David Ahern <dsahern@...il.com>
Cc: Wei Wang <weiwan@...gle.com>,
Eric Dumazet <eric.dumazet@...il.com>,
Martin KaFai Lau <kafai@...com>,
Mahesh Bandewar <maheshb@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Subash Abhinov Kasiviswanathan <subashab@...eaurora.org>
Subject: Re: Refcount mismatch when unregistering netdevice from kernel
On 2020-12-11 09:10, David Ahern wrote:
>>>> Could we further distinguish between dst added to the uncached list by
>>>> icmp6_dst_alloc() and xfrm6_fill_dst(), and confirm which ones are the
>>>> ones leaking reference?
>>>> I suspect it would be the xfrm ones, but I think it is worth verifying.
>>>>
>>
>> After digging into the DST allocation/destroy a bit more, it seems that
>> there are some cases where the DST's refcount does not hit zero, causing
>> them to never be freed and release their references.
>> One case comes from here on the IPv6 packet output path (these DST
>> structs would hold references to both the inet6_dev and the netdevice)
>> ip6_pol_route_output+0x20/0x2c -> ip6_pol_route+0x1dc/0x34c ->
>> rt6_make_pcpu_route+0x18/0xf4 -> ip6_rt_pcpu_alloc+0xb4/0x19c
>
> This is the normal data path, and this refers to a per-cpu dst cache.
> Delete the route and the cached entries get removed.
>
After tracing all the DST entries created by the system, we've been able
to see that all unfreed DST entries belong to the same route on the
system. One is the main rt6_info struct it references and the rest are
percpu copies of it.
>>
>> We also see two DSTs where they are stored as the xdst->rt entry on the
>> XFRM path that do not get released. One is allocated by the same path as
>> above, and the other like this
>> xfrm6_esp_err+0x7c/0xd4 -> esp6_err+0xc8/0x100 ->
>> ip6_update_pmtu+0xc8/0x100 -> __ip6_rt_update_pmtu+0x248/0x434 ->
>> ip6_rt_cache_alloc+0xa0/0x1dc
>
> This entry goes into an exception cache. I have lost track of kernel
> versions and features. Try listing the route cache to see these:
> ip -6 ro ls cache
Thanks for the tip here. We've further seen that the route that refers
to these unfreed DSTs is always a cached exception route. After tracing
the routes as well, we can see that the fib6_info struct for this route
is never freed either, which prevents any of the DSTs associated with it
from being cleaned up and releasing their refcounts on the device. In
fact, we can see that the fib6_info struct is no longer present in the
main fib6 tree after a period of time. The last time we're able to see
the pointer to the route in the tree is during a route replace operation
from userspace, but it seems that the fib6_info is not fully released.
In particular, the exception cache is not flushed out for the route
during the replace operation like it is during a standard
fib6_del_route() call.
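To make the suspected sequence concrete, here is a rough userspace model
of it. This is only a toy sketch, not kernel code: ToyDevice, ToyRoute,
replace_route() and refs_left_after_replace() are hypothetical names made
up for illustration, and the only behaviour modelled is "each cached
exception pins one device reference until the cache is flushed".

```python
class ToyDevice:
    """Hypothetical stand-in for a netdevice; only tracks a refcount."""

    def __init__(self):
        self.refcnt = 0


class ToyRoute:
    """Hypothetical stand-in for a route that owns an exception cache."""

    def __init__(self, dev):
        self.dev = dev
        self.exceptions = []

    def add_exception(self):
        # Each cached exception entry pins one reference on the device.
        self.dev.refcnt += 1
        self.exceptions.append(object())

    def flush_exceptions(self):
        # Dropping a cached entry releases the reference it held.
        while self.exceptions:
            self.exceptions.pop()
            self.dev.refcnt -= 1


def replace_route(old, flush):
    """Swap in a new route; optionally flush the old route's exceptions."""
    if flush:
        old.flush_exceptions()
    # After this point nothing walks old.exceptions again in this model.
    return ToyRoute(old.dev)


def refs_left_after_replace(flush):
    dev = ToyDevice()
    old = ToyRoute(dev)
    old.add_exception()  # e.g. created in response to a Packet Too Big
    replace_route(old, flush)
    return dev.refcnt


if __name__ == "__main__":
    print("without flush:", refs_left_after_replace(False))
    print("with flush:", refs_left_after_replace(True))
```

In this model the device is left with one stale reference when the
replace skips the flush, and none when it flushes first, which matches
the behaviour we observe.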
We're able to reproduce the refcount mismatch after some experimentation
as well. Essentially, it consists of:
1) Adding a default route (ip -6 route add dev XXX default)
2) Forcing the creation of an exception route by manually injecting an
   ICMPv6 Packet Too Big into the device
3) Replacing the default route (ip -6 route change dev XXX default)
4) Deleting the device (ip link del XXX)
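For step 2, one possible way to build the ICMPv6 Packet Too Big message
is sketched below. This is a minimal illustration only: the helper names
(internet_checksum, build_packet_too_big), the addresses, the MTU value
and the contents of the embedded invoking packet are all assumptions, and
how the resulting bytes get injected into the device is left out. The
layout follows the generic ICMPv6 format (type 2, code 0, checksum, 32-bit
MTU, then the start of the invoking packet), with the checksum computed
over the IPv6 pseudo-header.

```python
import ipaddress
import struct

ICMPV6_NEXT_HEADER = 58        # IPv6 next-header value for ICMPv6
ICMPV6_PACKET_TOO_BIG = 2      # ICMPv6 type for Packet Too Big


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src: str, dst: str, length: int) -> bytes:
    """IPv6 pseudo-header used for upper-layer checksums."""
    return (ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", length, ICMPV6_NEXT_HEADER))


def build_packet_too_big(src: str, dst: str, mtu: int,
                         invoking: bytes) -> bytes:
    """Return the ICMPv6 message only (no IPv6 header).

    'invoking' is the leading part of the packet that supposedly
    triggered the error; its contents are up to the caller.
    """
    body = struct.pack("!BBHI", ICMPV6_PACKET_TOO_BIG, 0, 0, mtu) + invoking
    csum = internet_checksum(pseudo_header(src, dst, len(body)) + body)
    # Patch the checksum into bytes 2..3 of the message.
    return body[:2] + struct.pack("!H", csum) + body[4:]


if __name__ == "__main__":
    # Placeholder addresses and a dummy 40-byte invoking IPv6 header.
    msg = build_packet_too_big("fe80::1", "fe80::2", 1280,
                               b"\x60" + b"\x00" * 39)
    print(msg.hex())
```

The embedded invoking packet would need to look like traffic that was
actually sent over the route in question for the message to be acted on,
so the dummy header above is only a placeholder.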
After adding a call to flush out the exception cache for the route, the
mismatch is no longer seen:
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 7a0c877..95e4310 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1215,6 +1215,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
 		}
 		nsiblings = iter->fib6_nsiblings;
 		iter->fib6_node = NULL;
+		rt6_flush_exceptions(iter);
 		fib6_purge_rt(iter, fn, info->nl_net);
 		if (rcu_access_pointer(fn->rr_ptr) == iter)
 			fn->rr_ptr = NULL;