Date:   Tue, 6 Feb 2018 12:42:02 +0200
From:   Eyal Birger <eyal.birger@...il.com>
To:     Steffen Klassert <steffen.klassert@...unet.com>
Cc:     <netdev@...r.kernel.org>, <herbert@...dor.apana.org.au>,
        <davem@...emloft.net>, <shmulik@...anetworks.com>,
        Wei Wang <weiwan@...gle.com>, fw@...len.de
Subject: Re: xfrm, ip tunnel: non released device reference upon device
 unregistration

Hi Steffen,

On Tue, 6 Feb 2018 09:53:38 +0100
Steffen Klassert <steffen.klassert@...unet.com> wrote:

> Cc Wei Wang
> 
> On Sun, Feb 04, 2018 at 01:21:18PM +0200, Eyal Birger wrote:
> > Hi,
> > 
> > We've encountered a non-released device reference upon device
> > unregistration, which seems to stem from the xfrm policy code.
> > 
> > The setup includes:
> > - an underlay device (e.g. eth0) using IPv4
> > - an xfrm IPv6 over IPv4 tunnel routed via the underlay device
> > - an ipip6 tunnel over the xfrm IPv6 tunnel
> > 
> > When tearing down the underlay device, after traffic had passed via
> > the ipip6 tunnel, log messages of the following form are observed:
> > 
> > unregister_netdevice: waiting for eth0 to become free. Usage count = 2
> 
> Looks like this happened when the dst garbage collection code was
> removed. I could not identify the commit that introduced it, so I
> did a bisection, which pointed to:
> 
> commit 9514528d92d4cbe086499322370155ed69f5d06c
> ipv6: call dst_dev_put() properly
> 
> With this commit we leak one refcount, and a later commit
> leaked the second one.
> 
> > 
> > The synthetic script below reproduces this consistently on a fresh
> > Ubuntu VM running net-next v4.15-6066-ge9522a5:
> > ---------------------------------------------------------
> > #!/bin/bash
> > 
> > ipsec_underlay_dst=192.168.6.1
> > ipsec_underlay_src=192.168.5.2
> > ipv6_pfx=1234
> > local_ipv6_addr="$ipv6_pfx::1"
> > remote_ipv6_addr="$ipv6_pfx::2"
> > 
> > # create dummy ipsec underlay
> > ip l add dev dummy1 type dummy
> > ip l set dev dummy1 up
> > ip r add "$ipsec_underlay_dst/32" dev dummy1
> > ip -6 r add "$ipv6_pfx::/16" dev dummy1
> > 
> > ip a add dev dummy1 "$local_ipv6_addr/128"
> > ip a add dev dummy1 "$ipsec_underlay_src/24"
> > 
> > # add xfrm policy and state
> > ip x p add src "$local_ipv6_addr/128" dst "$ipv6_pfx::/16" dir out \
> >     tmpl src "$ipsec_underlay_src" dst "$ipsec_underlay_dst" \
> >     proto esp reqid 1 mode tunnel
> > ip x s add src "$ipsec_underlay_src" dst "$ipsec_underlay_dst" \
> >     proto esp spi 0xcd440ce6 reqid 1 mode tunnel \
> >     auth-trunc 'hmac(sha1)' 0x34a546d309031628962b814ef073aff1a638ad21 96 \
> >     enc 'cbc(aes)' 0xf31e14149c328297fe7925ad7448420e \
> >     encap espinudp 4500 4500 0.0.0.0
> > 
> > # add 4o6 tunnel
> > ip l add tnl46 type ip6tnl mode ipip6 local "$local_ipv6_addr" \
> >     remote "$remote_ipv6_addr"
> > ip l set dev tnl46 up
> > ip r add 10.64.0.0/10 dev tnl46 
> > 
> > # pass traffic so route is cached
> > ping -w 1 -c 1 10.64.0.1
> > 
> > # remove dummy underlay
> > ip l del dummy1
> > ---------------------------------------------------------
> > 
> > Analysis:
> > 
> > ip6_tunnel holds a dst_cache which caches its underlay dst objects.
> > When devices are unregistered, non-xfrm dst objects are invalidated
> > by their original creators (ipv4/ipv6/...) and thus are wiped from
> > dst_cache.
> > 
> > xfrm-created routes, on the other hand, are not tracked by xfrm and
> > are not invalidated upon device unregistration, so they keep holding
> > a reference to the device while it is being unregistered.
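To make the refcount argument concrete, here is a minimal userspace model in C. It is a sketch under stated assumptions, not kernel code: `struct net_device` and `struct dst` are stripped down to a counter and a device pointer, `dev_hold()`/`dev_put()` are simplified stand-ins, and `dst_init()`, `dst_move_to_lo()`, `unregister()` and the `tracked` flag are hypothetical names. The model assumes each dst owns one device reference and that a creator's unregistration handling moves that reference to loopback, roughly as dst_dev_put() does; an untracked (xfrm-created) dst is simply skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins, not the kernel's structures. */
struct net_device {
	const char *name;
	int refcnt;              /* usage count reported while unregistering */
};

struct dst {
	struct net_device *dev;  /* each dst owns one reference on dev */
	bool tracked;            /* creator invalidates it on unregistration */
};

static void dev_hold(struct net_device *d) { d->refcnt++; }

static void dev_put(struct net_device *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}

static void dst_init(struct dst *d, struct net_device *dev, bool tracked)
{
	d->dev = dev;
	d->tracked = tracked;
	dev_hold(dev);
}

/* Invalidation: hand the device reference over to loopback. */
static void dst_move_to_lo(struct dst *d, struct net_device *lo)
{
	struct net_device *old = d->dev;

	d->dev = lo;
	dev_hold(lo);
	dev_put(old);
}

/* Unregister dev while the dsts in cache[] are still cached (e.g. in a
 * tunnel's dst_cache). Only tracked dsts get invalidated; the return
 * value is the usage count still pinning dev. */
static int unregister(struct net_device *dev, struct net_device *lo,
		      struct dst *cache[], size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (cache[i]->tracked && cache[i]->dev == dev)
			dst_move_to_lo(cache[i], lo);
	return dev->refcnt;
}
```

With one tracked IPv6 route and one untracked xfrm bundle both on dummy1, `unregister()` returns 1 in this model: the bundle's reference alone keeps the usage count from reaching zero.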
> > 
> > The following rough sketch of a patch illustrates an approach to
> > overcoming this issue:
> > ---------------------------------------------------------  

[snip]

> > ---------------------------------------------------------
> > 
> > This approach has the unfortunate side effects of adding a spin
> > lock for the tracked list, as well as increasing the size of struct
> > xfrm_dst.
> 
> Reintroducing garbage collection is probably not a good idea. I
> think the patch below should fix it in a less intrusive way.
> 
> 
> Subject: [PATCH RFC] xfrm: Fix netdev refcount leak when flushing the
> percpu dst cache.
> 
> The dst garbage collection code has been removed, so we need to call
> dst_dev_put() on cached dst entries before we release them.
> Otherwise we leak the refcount to the netdev.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@...unet.com>
> ---
>  net/xfrm/xfrm_policy.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 7a23078132cf..7836b7601b49 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -1715,8 +1715,10 @@ static int xfrm_expand_policies(const struct flowi *fl, u16 family,
>  static void xfrm_last_dst_update(struct xfrm_dst *xdst, struct xfrm_dst *old)
>  {
>  	this_cpu_write(xfrm_last_dst, xdst);
> -	if (old)
> +	if (old) {
> +		dst_dev_put(&old->u.dst);
>  		dst_release(&old->u.dst);
> +	}
>  }
>  
>  static void __xfrm_pcpu_work_fn(void)
> @@ -1787,6 +1789,7 @@ void xfrm_policy_cache_flush(void)
>  		old = per_cpu(xfrm_last_dst, cpu);
>  		if (old && !xfrm_bundle_ok(old)) {
>  			per_cpu(xfrm_last_dst, cpu) = NULL;
> +			dst_dev_put(&old->u.dst);
>  			dst_release(&old->u.dst);
>  		}
>  		rcu_read_unlock();

I have tested this and indeed it prevents the leak.
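A minimal userspace sketch in C of why calling dst_dev_put() before dst_release() stops the leak. Assumptions: one CPU, a single pointer standing in for xfrm_last_dst, a tunnel dst_cache that still holds its own reference on the cached bundle, and simplified stand-ins for dst_release() and dst_dev_put() (the latter modelled as marking the dst dead and moving its device reference to loopback). `last_dst_flush()` and `leaked_refs()` are hypothetical helpers loosely modelled on the second hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins, not the kernel's structures. */
struct net_device {
	int refcnt;              /* usage count reported by unregister_netdevice */
};

struct dst {
	int refcnt;              /* references to the dst itself */
	struct net_device *dev;  /* the dst owns one reference on dev */
	bool dead;               /* set once the dst has been invalidated */
};

static struct net_device loopback;
static struct dst *last_dst;     /* models one CPU's xfrm_last_dst slot */

static void dev_hold(struct net_device *d) { d->refcnt++; }

static void dev_put(struct net_device *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}

static void dst_hold(struct dst *d) { d->refcnt++; }

/* Only dropping the last dst reference gives back the device reference. */
static void dst_release(struct dst *d)
{
	assert(d->refcnt > 0);
	if (--d->refcnt == 0)
		dev_put(d->dev);
}

/* Models dst_dev_put(): mark the dst dead and move its device reference
 * to loopback now, no matter who else still holds the dst. */
static void dst_dev_put(struct dst *d)
{
	struct net_device *old = d->dev;

	d->dead = true;
	d->dev = &loopback;
	dev_hold(&loopback);
	dev_put(old);
}

/* Hypothetical: drop the slot's cached bundle, as the flush path does. */
static void last_dst_flush(bool with_fix)
{
	struct dst *old = last_dst;

	if (old) {
		last_dst = NULL;
		if (with_fix)
			dst_dev_put(old);
		dst_release(old);
	}
}

/* Returns the underlay's usage count after the flush, while a tunnel's
 * dst_cache still holds a reference on the bundle. */
static int leaked_refs(bool with_fix)
{
	struct net_device underlay = { 0 };
	struct dst bundle = { .refcnt = 1, .dev = &underlay, .dead = false };

	dev_hold(&underlay);   /* the bundle's device reference */
	loopback.refcnt = 0;
	last_dst = &bundle;    /* the slot owns the initial dst reference */
	dst_hold(&bundle);     /* the tunnel's dst_cache takes its own */

	last_dst_flush(with_fix);
	return underlay.refcnt;
}
```

In this model `leaked_refs(false)` returns 1 and `leaked_refs(true)` returns 0: without the extra call the device reference lives as long as the tunnel's cached dst, while with it the reference is dropped even though the dst itself survives, marked dead, in the tunnel's cache.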

But... IIUC the xfrm_last_dst cache is a single (per-CPU) slot that is
updated every time a new bundle is created, whereas ip6_tunnel uses a
separate dst_cache for each tunnel.

Invalidating the old dst every time a new bundle is created effectively
means that in a scenario with multiple tunnels (multiple ip6_tunnels over
multiple xfrm policies), only one ip6_tunnel dst_cache is active at a
time.

If multiple tunnels are used at the same time, I think this essentially
renders the ip6_tunnel dst_cache useless.
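The concern can be illustrated with a small C simulation. It is a hypothetical model, not kernel code: bundles live in a fixed pool, `last_dst` stands in for one CPU's xfrm_last_dst slot, each tunnel has a one-entry cache, and the helpers `create_bundle()`, `tunnel_xmit()` and `count_hits()` are invented for illustration. It assumes a lookup reuses the slot only when it was built for the same tunnel's policy, and that with the fix every bundle evicted from the slot is marked dead.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TUNNELS 8
#define MAX_BUNDLES 256

/* Hypothetical model: a bundle only records which tunnel's policy it
 * was built for and whether it has been killed (dst_dev_put()). */
struct bundle {
	int tunnel;
	bool dead;
};

static struct bundle pool[MAX_BUNDLES];
static int npool;
static int last_dst;                  /* one CPU's xfrm_last_dst: pool index or -1 */
static int tunnel_cache[MAX_TUNNELS]; /* each tunnel's dst_cache: pool index or -1 */

/* A new bundle replaces the shared slot; with the fix the evicted
 * bundle is marked dead, as dst_dev_put() would do. */
static int create_bundle(int tunnel, bool with_fix)
{
	assert(npool < MAX_BUNDLES);
	int b = npool++;
	pool[b].tunnel = tunnel;
	pool[b].dead = false;
	if (with_fix && last_dst >= 0)
		pool[last_dst].dead = true;
	last_dst = b;
	return b;
}

/* Transmit one packet on tunnel t; returns true on a dst_cache hit. */
static bool tunnel_xmit(int t, bool with_fix)
{
	int c = tunnel_cache[t];

	if (c >= 0 && !pool[c].dead)
		return true;
	/* Miss: reuse the shared slot only if it was built for this tunnel. */
	if (last_dst >= 0 && pool[last_dst].tunnel == t && !pool[last_dst].dead)
		tunnel_cache[t] = last_dst;
	else
		tunnel_cache[t] = create_bundle(t, with_fix);
	return false;
}

/* Send packets round-robin over ntunnels; return the number of cache hits. */
static int count_hits(int ntunnels, int packets, bool with_fix)
{
	assert(ntunnels > 0 && ntunnels <= MAX_TUNNELS);
	npool = 0;
	last_dst = -1;
	for (int t = 0; t < ntunnels; t++)
		tunnel_cache[t] = -1;

	int hits = 0;
	for (int i = 0; i < packets; i++)
		if (tunnel_xmit(i % ntunnels, with_fix))
			hits++;
	return hits;
}
```

Alternating 20 packets across two tunnels, `count_hits(2, 20, false)` gives 18 hits and `count_hits(2, 20, true)` gives 0, while a single tunnel with the fix still gets 19: under these assumptions each tunnel's lookup kills the bundle the other tunnel has cached.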

Eyal.
