Message-ID: <1387402729.19078.340.camel@edumazet-glaptop2.roam.corp.google.com>
Date:	Wed, 18 Dec 2013 13:38:49 -0800
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Tom Herbert <therbert@...gle.com>
Cc:	davem@...emloft.net, netdev@...r.kernel.org
Subject: Re: [PATCH 1/2 v2] net: Cache dst in tunnels

On Wed, 2013-12-18 at 12:06 -0800, Tom Herbert wrote:
> Avoid doing a route lookup on every packet being tunneled.
> 
> In ip_tunnel.c cache the route returned from ip_route_output if
> the tunnel is "connected" so that all the routing parameters are
> taken from tunnel parms for a packet. Specifically, not NBMA tunnel
> and tos is from tunnel parms (not inner packet).
> 

It seems the subject prefix should be "ipv4", not "net"?

> Signed-off-by: Tom Herbert <therbert@...gle.com>
> ---
>  include/net/ip_tunnels.h |   3 ++
>  net/ipv4/ip_tunnel.c     | 110 ++++++++++++++++++++++++++++++++++++-----------
>  2 files changed, 89 insertions(+), 24 deletions(-)
> 
> diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
> index 732f8c6..bde50fc 100644
> --- a/include/net/ip_tunnels.h
> +++ b/include/net/ip_tunnels.h
> @@ -54,6 +54,9 @@ struct ip_tunnel {
>  	int		hlen;		/* Precalculated header length */
>  	int		mlink;
>  
> +	struct		dst_entry __rcu *dst_cache;
> +	spinlock_t	dst_lock;
> +
>  	struct ip_tunnel_parm parms;
>  
>  	/* for SIT */
> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> index 90ff957..f9ffe38 100644
> --- a/net/ipv4/ip_tunnel.c
> +++ b/net/ipv4/ip_tunnel.c
> @@ -68,6 +68,51 @@ static unsigned int ip_tunnel_hash(struct ip_tunnel_net *itn,
>  			 IP_TNL_HASH_BITS);
>  }
>  
> +static inline void __tunnel_dst_set(struct ip_tunnel *t, struct dst_entry *dst)
> +{
> +	struct dst_entry *old_dst;
> +
> +	spin_lock_bh(&t->dst_lock);
> +	old_dst = rcu_dereference_raw(t->dst_cache);
> +	rcu_assign_pointer(t->dst_cache, dst);
> +	dst_release(old_dst);
> +	spin_unlock_bh(&t->dst_lock);
> +}
> +

You could use xchg() instead of the spinlock, as in commit
e47eb5dfb296bf21
    "udp: ipv4: do not use sk_dst_lock from softirq context"
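
As a rough userspace illustration of the xchg() idea (not kernel code):
the sketch below swaps a cached, reference-counted pointer with a single
atomic exchange and then drops the old reference, so no lock is needed
around the update. The fake_* names are hypothetical stand-ins for
struct dst_entry, struct ip_tunnel and dst_release(), and C11
atomic_exchange() stands in for the kernel's xchg(). It assumes, as in
the patch, that the cache takes over the reference the caller passes in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dst_entry: only a reference count. */
struct fake_dst {
	atomic_int refcnt;
};

/* Hypothetical stand-in for the tunnel's dst cache slot. */
struct fake_tunnel {
	_Atomic(struct fake_dst *) dst_cache;
};

/* Stand-in for dst_release(): drop one reference, tolerate NULL. */
static void fake_dst_release(struct fake_dst *dst)
{
	if (dst) {
		int prev = atomic_fetch_sub(&dst->refcnt, 1);

		assert(prev > 0);	/* catch a double release */
	}
}

/*
 * Publish the new entry and release the previous one. The exchange is
 * atomic, so concurrent setters each release a distinct old pointer.
 */
static void fake_tunnel_dst_set(struct fake_tunnel *t, struct fake_dst *dst)
{
	struct fake_dst *old = atomic_exchange(&t->dst_cache, dst);

	fake_dst_release(old);
}
```

This only models the pointer swap; readers in the real code would still
need rcu_dereference() and a reference grab under rcu_read_lock().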

Also, it would be nice to make sure DST_NOCACHE is not set in the dst flags;
otherwise dst_release() won't respect the RCU grace period.

See __skb_dst_set_noref() for details.

It might be possible to trigger this using a multicast address.

Note: It's possible we could get rid of DST_NOCACHE if we deploy enough
caches, obsoleting the RCU issue, but that's a separate discussion.
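
One possible shape for the DST_NOCACHE check, again as a userspace
sketch rather than kernel code: the setter refuses to cache an entry
that carries the no-cache flag and stores NULL instead. The fake_*
types and the FAKE_DST_NOCACHE bit value are assumptions made for
illustration. Unlike the patch, this setter takes its own reference on
the entry it stores, and it uses plain (non-atomic) accesses to keep
the flag check in focus.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_DST_NOCACHE 0x1	/* hypothetical bit, not the kernel's value */

/* Hypothetical stand-in for struct dst_entry. */
struct fake_dst {
	unsigned int flags;
	int refcnt;
};

/* Hypothetical stand-in for the tunnel's dst cache slot. */
struct fake_tunnel {
	struct fake_dst *dst_cache;
};

/* Returns the entry actually stored, or NULL if caching was refused. */
static struct fake_dst *fake_tunnel_dst_set(struct fake_tunnel *t,
					    struct fake_dst *dst)
{
	struct fake_dst *old = t->dst_cache;

	/* Entries flagged no-cache must never be published in the slot. */
	if (dst && (dst->flags & FAKE_DST_NOCACHE))
		dst = NULL;
	if (dst)
		dst->refcnt++;		/* reference owned by the cache */
	t->dst_cache = dst;
	if (old) {
		assert(old->refcnt > 0);
		old->refcnt--;		/* drop the cache's old reference */
	}
	return dst;
}
```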


> +static inline void tunnel_dst_set(struct ip_tunnel *t, struct dst_entry *dst)
> +{
> +	__tunnel_dst_set(t, dst);
> +}
> +
...
>  static int ip_tunnel_bind_dev(struct net_device *dev)
> @@ -350,18 +393,18 @@ static int ip_tunnel_bind_dev(struct net_device *dev)
>  		struct flowi4 fl4;
>  		struct rtable *rt;
>  
> -		rt = ip_route_output_tunnel(tunnel->net, &fl4,
> -					    tunnel->parms.iph.protocol,
> -					    iph->daddr, iph->saddr,
> -					    tunnel->parms.o_key,
> -					    RT_TOS(iph->tos),
> -					    tunnel->parms.link);
> +		init_tunnel_flow(&fl4, iph->protocol, iph->daddr,
> +				 iph->saddr, tunnel->parms.o_key,
> +				 RT_TOS(iph->tos), tunnel->parms.link);
> +		rt = ip_route_output_key(tunnel->net, &fl4);
> +
>  		if (!IS_ERR(rt)) {
>  			tdev = rt->dst.dev;
>  			ip_rt_put(rt);
>  		}
>  		if (dev->type != ARPHRD_ETHER)
>  			dev->flags |= IFF_POINTOPOINT;

Here, it seems rt can be an error pointer (IS_ERR(rt)), so
dst_clone(&rt->dst) will crash.

> +		tunnel_dst_set(tunnel, dst_clone(&rt->dst));
>  	}
>  
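
To illustrate the IS_ERR() concern, here is a userspace sketch under
stated assumptions: fake_err_ptr() and fake_is_err() imitate the
kernel's ERR_PTR()/IS_ERR() encoding, and fake_bind() is a hypothetical
stand-in for the route handling in ip_tunnel_bind_dev(). The cache is
only touched when the lookup succeeded, and the cache's reference is
taken before the lookup's reference is dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_MAX_ERRNO 4095

/* Imitation of ERR_PTR(): encode a negative errno in a pointer value. */
static inline void *fake_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Imitation of IS_ERR(): the top FAKE_MAX_ERRNO addresses mean "error". */
static inline int fake_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-FAKE_MAX_ERRNO);
}

/* Hypothetical stand-ins for struct rtable and struct ip_tunnel. */
struct fake_rtable {
	int refcnt;
};

struct fake_tunnel {
	struct fake_rtable *cached;
};

/* Returns 1 if the route was cached, 0 if the lookup had failed. */
static int fake_bind(struct fake_tunnel *t, struct fake_rtable *rt)
{
	if (fake_is_err(rt))
		return 0;	/* never dereference an error pointer */

	rt->refcnt++;		/* reference for the cache (dst_clone analogue) */
	t->cached = rt;
	rt->refcnt--;		/* drop the lookup's reference (ip_rt_put analogue) */
	return 1;
}
```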


