Message-ID: <839f0ad6-83c1-1df6-c34d-b844c52ba771@gmail.com>
Date: Fri, 11 Dec 2020 09:10:26 -0700
From: David Ahern <dsahern@...il.com>
To: stranche@...eaurora.org
Cc: Wei Wang <weiwan@...gle.com>,
Eric Dumazet <eric.dumazet@...il.com>,
Martin KaFai Lau <kafai@...com>,
Mahesh Bandewar <maheshb@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Subash Abhinov Kasiviswanathan <subashab@...eaurora.org>
Subject: Re: Refcount mismatch when unregistering netdevice from kernel
On 12/10/20 6:12 PM, stranche@...eaurora.org wrote:
>>> BTW, have you tried your previous proposed patch and confirmed it
>>> would fix the issue?
>>>
>
> Yes, we shared this with the customer and the refcount mismatch still
> occurred, so this doesn't seem sufficient either.
>
>>> Could we further distinguish between dst added to the uncached list by
>>> icmp6_dst_alloc() and xfrm6_fill_dst(), and confirm which ones are the
>>> ones leaking reference?
>>> I suspect it would be the xfrm ones, but I think it is worth verifying.
>>>
>
> After digging into the DST allocation/destroy a bit more, it seems that
> there are some cases where the DST's refcount does not hit zero, causing
> them to never be freed and release their references.
> One case comes from here on the IPv6 packet output path (these DST
> structs would hold references to both the inet6_dev and the netdevice)
> ip6_pol_route_output+0x20/0x2c -> ip6_pol_route+0x1dc/0x34c ->
> rt6_make_pcpu_route+0x18/0xf4 -> ip6_rt_pcpu_alloc+0xb4/0x19c
This is the normal data path; the allocation here is an entry in the
per-cpu dst cache. Deleting the route removes the cached entries.
>
> We also see two DSTs where they are stored as the xdst->rt entry on the
> XFRM path that do not get released. One is allocated by the same path as
> above, and the other like this
> xfrm6_esp_err+0x7c/0xd4 -> esp6_err+0xc8/0x100 ->
> ip6_update_pmtu+0xc8/0x100 -> __ip6_rt_update_pmtu+0x248/0x434 ->
> ip6_rt_cache_alloc+0xa0/0x1dc
This entry goes into an exception cache. I have lost track of kernel
versions and features. Try listing the route cache to see these:

    ip -6 ro ls cache
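
To make the route cache listing suggested above easier to read when many
exception entries exist, the output could be summarized per device. The
following is only a minimal sketch: the helper names, the sample text and
the device names (dummy0, dummy1) are assumptions for illustration, not
output from the system discussed in this thread. It relies only on each
route line containing a "dev <name>" pair.

```python
import subprocess
from collections import Counter


def count_cached_routes_by_dev(ip_output: str) -> Counter:
    """Count route lines per device in `ip -6 route show cache` text."""
    counts = Counter()
    for line in ip_output.splitlines():
        tokens = line.split()
        # A route line names its output device as "dev <name>".
        if "dev" in tokens:
            idx = tokens.index("dev")
            if idx + 1 < len(tokens):
                counts[tokens[idx + 1]] += 1
    return counts


def list_ipv6_route_cache() -> str:
    """Run `ip -6 route show cache`, the long form of `ip -6 ro ls cache`."""
    result = subprocess.run(
        ["ip", "-6", "route", "show", "cache"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical sample text, used so the parser can be tried without
# running `ip`. Real output may carry different attributes.
SAMPLE = """\
2001:db8::1 dev dummy0 metric 0 cache expires 590sec mtu 1400 pref medium
2001:db8::2 dev dummy0 metric 0 cache expires 120sec mtu 1280 pref medium
2001:db8::3 dev dummy1 metric 0 cache expires 300sec mtu 1400 pref medium
"""

if __name__ == "__main__":
    print(count_cached_routes_by_dev(SAMPLE))
```

On a live system, count_cached_routes_by_dev(list_ipv6_route_cache())
would give the same summary for the real cache. Entries that still name
the device being unregistered would be the ones to look at first.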