Message-ID: <20180206103203.GD15427@breakpoint.cc>
Date: Tue, 6 Feb 2018 11:32:03 +0100
From: Florian Westphal <fw@...len.de>
To: Steffen Klassert <steffen.klassert@...unet.com>
Cc: Eyal Birger <eyal.birger@...il.com>, netdev@...r.kernel.org,
herbert@...dor.apana.org.au, davem@...emloft.net,
shmulik@...anetworks.com, Wei Wang <weiwan@...gle.com>
Subject: Re: xfrm, ip tunnel: non released device reference upon device
unregistration
Steffen Klassert <steffen.klassert@...unet.com> wrote:
> Cc Wei Wang
>
> On Sun, Feb 04, 2018 at 01:21:18PM +0200, Eyal Birger wrote:
> > Hi,
> >
> > We've encountered a non released device reference upon device
> > unregistration which seems to stem from xfrm policy code.
> >
> > The setup includes:
> > - an underlay device (e.g. eth0) using IPv4
> > - an xfrm IPv6 over IPv4 tunnel routed via the underlay device
> > - an ipip6 tunnel over the xfrm IPv6 tunnel
> >
> > When tearing down the underlay device, after traffic had passed via the ipip6
> > tunnel, log messages of the following form are observed:
> >
> > unregister_netdevice: waiting for eth0 to become free. Usage count = 2
>
> Looks like this happened when the dst garbage collection code was
> removed. I could not point to a commit that introduced it so I
> did a bisection and this pointed to:
>
> commit 9514528d92d4cbe086499322370155ed69f5d06c
> ipv6: call dst_dev_put() properly
>
> With this commit we leak the one refcount and some further commit
> leaked the second one.
> Subject: [PATCH RFC] xfrm: Fix netdev refcount leak when flushing the percpu dst cache.
>
> The dst garbage collection code is removed, so we need to call
> dst_dev_put() on cached dst entries before we release them.
> Otherwise we leak the refcount to the netdev.
I don't think this is related to the xfrm pcpu cache at all.
AFAIU any xfrm dst that gets cached in a tunnel dst cache will
hold the device reference.
Perhaps it's best to add a device notifier to the tunnel code
and put the device refcount there.
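
To make the notifier idea concrete, here is a minimal userspace sketch. It is
a toy model, not kernel code and not a proposed patch: every name in it
(toy_netdev, toy_dst, toy_tunnel, toy_tunnel_dev_unregister) is hypothetical
and only stands in for the real structures. It assumes a tunnel dst cache with
a single slot, and it shows only the accounting: a cached dst pins the device,
and a callback run at device unregistration drops that pin.

```c
#include <stddef.h>

/* Toy stand-in for a network device: only the reference count matters. */
struct toy_netdev {
    const char *name;
    int refcnt;
};

/* Toy stand-in for a dst entry that pins a device. */
struct toy_dst {
    struct toy_netdev *dev;   /* NULL once the device reference is dropped */
};

/* Toy tunnel with a single-slot dst cache (an assumption of this sketch). */
struct toy_tunnel {
    struct toy_dst *cached;
};

static void toy_dev_hold(struct toy_netdev *dev) { dev->refcnt++; }
static void toy_dev_put(struct toy_netdev *dev)  { dev->refcnt--; }

/* Caching a dst in the tunnel takes a reference on the underlying device. */
static void toy_tunnel_cache_dst(struct toy_tunnel *t, struct toy_dst *dst,
                                 struct toy_netdev *dev)
{
    toy_dev_hold(dev);
    dst->dev = dev;
    t->cached = dst;
}

/*
 * Hypothetical notifier callback: when 'dev' is being unregistered, drop
 * the reference held through the cached dst and empty the cache slot.
 * Without this step the device count never returns to its base value.
 */
static void toy_tunnel_dev_unregister(struct toy_tunnel *t,
                                      struct toy_netdev *dev)
{
    struct toy_dst *dst = t->cached;

    if (dst != NULL && dst->dev == dev) {
        toy_dev_put(dev);
        dst->dev = NULL;
        t->cached = NULL;
    }
}
```

In this model, a device that starts with one reference goes to two once a dst
is cached in the tunnel, and returns to one only after the unregister callback
has run.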
I'll try to come up with a patch *unless* I'm wrong and this is
really just because of xfrm pcpu cache.