Message-ID: <9f25d75823a73c6f0f556f0905f931d1@codeaurora.org>
Date:   Mon, 04 Jan 2021 20:05:17 -0700
From:   stranche@...eaurora.org
To:     David Ahern <dsahern@...il.com>
Cc:     Wei Wang <weiwan@...gle.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Martin KaFai Lau <kafai@...com>,
        Mahesh Bandewar <maheshb@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Subash Abhinov Kasiviswanathan <subashab@...eaurora.org>
Subject: Re: Refcount mismatch when unregistering netdevice from kernel

On 2020-12-11 09:10, David Ahern wrote:

>>>> Could we further distinguish between dst added to the uncached list by
>>>> icmp6_dst_alloc() and xfrm6_fill_dst(), and confirm which ones are the
>>>> ones leaking reference?
>>>> I suspect it would be the xfrm ones, but I think it is worth verifying.
>>>>
>> 
>> After digging into the DST allocation/destroy a bit more, it seems that
>> there are some cases where the DST's refcount does not hit zero, causing
>> them to never be freed and release their references.
>> One case comes from here on the IPv6 packet output path (these DST
>> structs would hold references to both the inet6_dev and the netdevice)
>> ip6_pol_route_output+0x20/0x2c -> ip6_pol_route+0x1dc/0x34c ->
>> rt6_make_pcpu_route+0x18/0xf4 -> ip6_rt_pcpu_alloc+0xb4/0x19c
> 
> This is the normal data path, and this refers to a per-cpu dst cache.
> Delete the route and the cached entries get removed.
> 

After tracing all the DST entries created by the system, we can see that
all unfreed DST entries belong to the same route. One is the main
rt6_info struct for that route, and the rest are its per-cpu copies.

>> 
>> We also see two DSTs where they are stored as the xdst->rt entry on the
>> XFRM path that do not get released. One is allocated by the same path as
>> above, and the other like this
>> xfrm6_esp_err+0x7c/0xd4 -> esp6_err+0xc8/0x100 ->
>> ip6_update_pmtu+0xc8/0x100 -> __ip6_rt_update_pmtu+0x248/0x434 ->
>> ip6_rt_cache_alloc+0xa0/0x1dc
> 
> This entry goes into an exception cache. I have lost track of kernel
> versions and features. Try listing the route cache to see these:
>   ip -6 ro ls cache

Thanks for the tip here. We've further seen that the route that refers to
these unfreed DSTs is always a cached exception route. After tracing the
routes as well, we can see that the fib6_info struct for this route is
never freed either, which prevents any of the DSTs associated with it from
being cleaned up and releasing their refcounts on the device. In fact, the
fib6_info struct is no longer present in the main fib6 tree after a period
of time. The last time we can see a pointer to the route in the tree is
during a route replace operation from userspace, but the fib6_info does
not appear to be fully released afterwards. In particular, the exception
cache for the route is not flushed during the replace operation, as it is
during a standard fib6_del_route() call.
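To make the ownership chain above concrete, here is a minimal userspace C
sketch. It is a hypothetical model, not kernel code: struct model_dev,
struct model_dst, struct model_route, dst_new(), dst_put(), route_put(),
flush_exceptions() and unlink_route() are all invented for illustration.
It also rests on one labeled assumption: that an exception-cache dst holds
a reference back to its parent route (fib6_info). Under that assumption,
unlinking the route without flushing its exceptions leaves a reference
cycle, so neither the route nor its DSTs are freed and the device refcount
never returns to its baseline.

```c
/*
 * Hypothetical userspace model of the ownership chain described above.
 * None of these types or functions are kernel APIs; the names only echo
 * the kernel concepts they stand in for.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_dev {
	int refcnt;			/* stands in for the netdevice refcount */
};

struct model_route;

struct model_dst {
	int refcnt;
	struct model_dev *dev;		/* every dst pins its device */
	struct model_route *from;	/* ASSUMED back-reference to parent route */
};

struct model_route {			/* stands in for fib6_info */
	int refcnt;
	struct model_dev *dev;
	struct model_dst *pcpu;		/* one per-cpu copy, for brevity */
	struct model_dst *exception;	/* one exception-cache entry */
};

static void route_put(struct model_route *rt);

static struct model_dst *dst_new(struct model_dev *dev,
				 struct model_route *from)
{
	struct model_dst *d = calloc(1, sizeof(*d));

	if (!d)
		abort();
	d->refcnt = 1;			/* reference owned by the creator */
	d->dev = dev;
	dev->refcnt++;
	d->from = from;
	if (from)
		from->refcnt++;
	return d;
}

static void dst_put(struct model_dst *d)
{
	if (--d->refcnt > 0)
		return;
	d->dev->refcnt--;		/* freeing a dst drops its device ref */
	if (d->from)
		route_put(d->from);	/* ...and its ref on the parent route */
	free(d);
}

static void route_put(struct model_route *rt)
{
	if (--rt->refcnt > 0)
		return;
	if (rt->pcpu)
		dst_put(rt->pcpu);	/* per-cpu copies die with the route */
	rt->dev->refcnt--;
	free(rt);
}

static void flush_exceptions(struct model_route *rt)
{
	struct model_dst *d = rt->exception;

	rt->exception = NULL;
	if (d)
		dst_put(d);		/* breaks the route <-> exception cycle */
}

/*
 * Unlink a route that has one per-cpu copy and one exception entry.
 * flush = true models the fib6_del_route() path, flush = false the
 * replace path. Returns the device references still held beyond the
 * registration reference (0 means the device could be unregistered).
 */
static int unlink_route(bool flush)
{
	struct model_dev dev = { .refcnt = 1 };	/* registration reference */
	struct model_route *rt = calloc(1, sizeof(*rt));

	if (!rt)
		abort();
	rt->refcnt = 1;				/* held by the fib6 tree */
	rt->dev = &dev;
	dev.refcnt++;
	rt->pcpu = dst_new(&dev, NULL);		/* output-path per-cpu copy */
	rt->exception = dst_new(&dev, rt);	/* created by Packet Too Big */

	if (flush)
		flush_exceptions(rt);
	route_put(rt);				/* the tree drops its reference */

	/* With flush == false, rt and its dsts are intentionally leaked:
	 * the exception pins rt, and rt owns the exception. */
	return dev.refcnt - 1;
}
```

In this model, unlink_route(false) returns 3 (the route, its per-cpu copy
and the exception dst each still pin the device), while unlink_route(true)
returns 0. The per-cpu copy is modeled without a back-reference purely to
keep the sketch small.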

We were also able to reproduce the refcount mismatch after some
experimentation. Essentially, it consists of:
1) Add a default route (ip -6 route add dev XXX default).
2) Force the creation of an exception route by manually injecting an
   ICMPv6 Packet Too Big message into the device.
3) Replace the default route (ip -6 route change dev XXX default).
4) Delete the device (ip link del XXX).

After adding a call to flush the exception cache for the route, the
mismatch is no longer seen:
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 7a0c877..95e4310 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1215,6 +1215,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
                 }
                 nsiblings = iter->fib6_nsiblings;
                 iter->fib6_node = NULL;
+               rt6_flush_exceptions(iter);
                 fib6_purge_rt(iter, fn, info->nl_net);
                 if (rcu_access_pointer(fn->rr_ptr) == iter)
                         fn->rr_ptr = NULL;
