[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180206124202.4c86ddc9@jimi>
Date: Tue, 6 Feb 2018 12:42:02 +0200
From: Eyal Birger <eyal.birger@...il.com>
To: Steffen Klassert <steffen.klassert@...unet.com>
Cc: <netdev@...r.kernel.org>, <herbert@...dor.apana.org.au>,
<davem@...emloft.net>, <shmulik@...anetworks.com>,
Wei Wang <weiwan@...gle.com>, fw@...len.de
Subject: Re: xfrm, ip tunnel: non released device reference upon device
unregistration
Hi Steffen,
On Tue, 6 Feb 2018 09:53:38 +0100
Steffen Klassert <steffen.klassert@...unet.com> wrote:
> Cc Wei Wang
>
> On Sun, Feb 04, 2018 at 01:21:18PM +0200, Eyal Birger wrote:
> > Hi,
> >
> > We've encountered a non released device reference upon device
> > unregistration which seems to stem from xfrm policy code.
> >
> > The setup includes:
> > - an underlay device (e.g. eth0) using IPv4
> > - an xfrm IPv6 over IPv4 tunnel routed via the underlay device
> > - an ipip6 tunnel over the xfrm IPv6 tunnel
> >
> > When tearing down the underlay device, after traffic had passed via
> > the ipip6 tunnel, log messages of the following form are observed:
> >
> > unregister_netdevice: waiting for eth0 to become free. Usage count
> > = 2
>
> Looks like this happened when the dst garbage collection code was
> removed. I could not point to a commit that introduced it so I
> did a bisection and this pointed to:
>
> commit 9514528d92d4cbe086499322370155ed69f5d06c
> ipv6: call dst_dev_put() properly
>
> With this commit we leak the one refcount and some further commit
> leaked the second one.
>
> >
> > The below synthetic script reproduces this consistently on a fresh
> > ubuntu vm running net-next v4.15-6066-ge9522a5:
> > ---------------------------------------------------------
> > #!/bin/bash
> >
> > ipsec_underlay_dst=192.168.6.1
> > ipsec_underlay_src=192.168.5.2
> > ipv6_pfx=1234
> > local_ipv6_addr="$ipv6_pfx::1"
> > remote_ipv6_addr="$ipv6_pfx::2"
> >
> > # create dummy ipsec underlay
> > ip l add dev dummy1 type dummy
> > ip l set dev dummy1 up
> > ip r add "$ipsec_underlay_dst/32" dev dummy1
> > ip -6 r add "$ipv6_pfx::/16" dev dummy1
> >
> > ip a add dev dummy1 "$local_ipv6_addr/128"
> > ip a add dev dummy1 "$ipsec_underlay_src/24"
> >
> > # add xfrm policy and state
> > ip x p add src "$local_ipv6_addr/128" dst "$ipv6_pfx::/16" dir out
> > tmpl src "$ipsec_underlay_src" dst "$ipsec_underlay_dst" proto esp
> > reqid 1 mode tunnel ip x s add src "$ipsec_underlay_src" dst
> > "$ipsec_underlay_dst" proto esp spi 0xcd440ce6 reqid 1 mode tunnel
> > auth-trunc 'hmac(sha1)' 0x34a546d309031628962b814ef073aff1a638ad21
> > 96 enc 'cbc(aes)' 0xf31e14149c328297fe7925ad7448420e encap espinudp
> > 4500 4500 0.0.0.0
> >
> > # add 4o6 tunnel
> > ip l add tnl46 type ip6tnl mode ipip6 local "$local_ipv6_addr"
> > remote "$remote_ipv6_addr" ip l set dev tnl46 up
> > ip r add 10.64.0.0/10 dev tnl46
> >
> > # pass traffic so route is cached
> > ping -w 1 -c 1 10.64.0.1
> >
> > # remove dummy underlay
> > ip l del dummy1
> > ---------------------------------------------------------
> >
> > Analysis:
> >
> > ip6_tunnel holds a dst_cache which caches its underlay dst objects.
> > When devices are unregistered, non-xfrm dst objects are invlidated
> > by their original creators (ipv4/ipv6/...) and thus are wiped from
> > dst_cache.
> >
> > xfrm created routes otoh are not tracked by xfrm, and are not
> > invalidated upon device unregistration, thus hold the device upon
> > unregistration.
> >
> > The following rough sketch patch illustrates an approach overcoming
> > this issue:
> > ---------------------------------------------------------
[snip]
> > ---------------------------------------------------------
> >
> > This approach has the unfortunate side effects of adding a spin
> > lock for the tracked list, as well as increasing struct xfrm_dst.
>
> Reintroducing garbage collection is probably not a so good idea. I
> think the patch below should fix it a bit less intrusive.
>
>
> Subject: [PATCH RFC] xfrm: Fix netdev refcount leak when flushing the
> percpu dst cache.
>
> The dst garbage collection code is removed, so we need to call
> dst_dev_put() on cached dst entries before we release them.
> Otherwise we leak the refcount to the netdev.
>
> Signed-off-by: Steffen Klassert <steffen.klassert@...unet.com>
> ---
> net/xfrm/xfrm_policy.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 7a23078132cf..7836b7601b49 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -1715,8 +1715,10 @@ static int xfrm_expand_policies(const struct
> flowi *fl, u16 family, static void xfrm_last_dst_update(struct
> xfrm_dst *xdst, struct xfrm_dst *old) {
> this_cpu_write(xfrm_last_dst, xdst);
> - if (old)
> + if (old) {
> + dst_dev_put(&old->u.dst);
> dst_release(&old->u.dst);
> + }
> }
>
> static void __xfrm_pcpu_work_fn(void)
> @@ -1787,6 +1789,7 @@ void xfrm_policy_cache_flush(void)
> old = per_cpu(xfrm_last_dst, cpu);
> if (old && !xfrm_bundle_ok(old)) {
> per_cpu(xfrm_last_dst, cpu) = NULL;
> + dst_dev_put(&old->u.dst);
> dst_release(&old->u.dst);
> }
> rcu_read_unlock();
I have tested this and indeed it prevents the leak.
But... IIUC the xfrm_last_dst cache is a single instance that is updated
every time a new bundle is created, whereas ip6_tunnel uses a different
dst_cache for each tunnel.
Invalidating the dst every time a new bundle is created effectively means
that in a multiple tunnels scenario (multiple ip6_tunnels over multiple
xfrm policies) there is only one active ip6_tunnel dst_cache at a time.
In case multiple tunnels are used at the same times, I think this
essentially renders the ip6_tunnel dst_cache useless.
Eyal.
Powered by blists - more mailing lists