[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEA6p_ArQdNp=hQCjrsnAo-Xy22d44b=2KdLp7zO7E7XDA4Fog@mail.gmail.com>
Date: Tue, 8 Dec 2020 13:51:03 -0800
From: Wei Wang <weiwan@...gle.com>
To: stranche@...eaurora.org
Cc: Eric Dumazet <eric.dumazet@...il.com>,
David Ahern <dsahern@...il.com>,
Martin KaFai Lau <kafai@...com>,
Mahesh Bandewar <maheshb@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Subash Abhinov Kasiviswanathan <subashab@...eaurora.org>
Subject: Re: Refcount mismatch when unregistering netdevice from kernel
On Tue, Dec 8, 2020 at 11:13 AM <stranche@...eaurora.org> wrote:
>
> Hi Wei and Eric,
>
> Thanks for the replies.
>
> This was reported to us on the 5.4.61 kernel during a customer
> regression suite, so we don't have an exact reproducer unfortunately.
> From the trace logs we've added it seems like this is happening during
> IPv6 transport mode XFRM data transfer and the device is unregistered in
> the middle of it, but we've been unable to reproduce it ourselves..
> We're open to trying out and sharing debug patches if needed though.
>
I double checked 5.4.61, and I didn't find any missing fixes in this
area AFAICT.
> > rt6_uncached_list_flush_dev() actually tries to replace the inet6_dev
> > with loopback_dev, and release the reference to the previous inet6_dev
> > by calling in6_dev_put(), which is actually doing the same thing as
> > ip6_dst_ifdown(). I don't understand why you say " a reference to the
> > inet6_dev is simply dropped".
>
> Fair. I was going off the semantics used by the dst_dev_put() function
> which calls dst_ops->ifdown() explicitly. At least in the case of
> xfrm6_dst_ifdown() this swap of the loopback device and putting the
> refcount seems like it could be missing a few things.
>
Looking more into the xfrm code, I think the major difference between
xfrm dst and non-xfrm dst is that, xfrm code creates a list of dst
entries in one dst_entry linked by xfrm_dst_child().
In xfrm_bundle_create(), which I believe is the main function to
create xfrm dst bundles, it allocates the main dst entry and its
children, and it calls xfrm_fill_dst() for each of them. So I think
each dst in the list (including all the children) are added into the
uncached_list.
The difference between the current code in
rt6_uncached_list_flush_dev() vs dst_ops->ifdown() is that, the
current code only releases the refcnt to inet6_dev associated with the
main dst, while xfrm6_dst_ifdown() tries to release inet6_dev
associated with every dst linked by xfrm_dst_child(). However, since
xfrm_bundle_create() anyway adds each child dst to the uncached list,
I don't see how the current code could miss any refcnt.
BTW, have you tried your previous proposed patch and confirmed it
would fix the issue?
> > The additional refcount to the DST is also released by doing the
> > following:
> > if (rt_dev == dev) {
> > rt->dst.dev = blackhole_netdev;
> > dev_hold(rt->dst.dev);
> > dev_put(rt_dev);
> > }
> > Am I missing something?
>
> That dev_put() is on the actual netdevice struct, not the inet6_dev
> associated with it. We're seeing many calls to icmp6_dst_alloc() and
> xfrm6_fill_dst() here, both of which seem to associate a reference to
> the inet6_dev struct with the DST in addition to the standard dev_hold()
> on the netdevice during the dst_alloc()/dst_init().
>
Could we further distinguish between dst added to the uncached list by
icmp6_dst_alloc() and xfrm6_fill_dst(), and confirm which ones are the
ones leaking reference?
I suspect it would be the xfrm ones, but I think it is worth verifying.
> Thanks,
> Sean
Powered by blists - more mailing lists