lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 27 Jan 2011 11:01:51 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	David Miller <davem@...emloft.net>
Cc:	netdev@...r.kernel.org
Subject: Re: [RFC PATCH] net: Implement read-only protection and COW'ing of
 metrics.

Le mercredi 26 janvier 2011 à 15:25 -0800, David Miller a écrit :
> Eric, thanks again for your feedback.  I've taken a stab at fixing the
> various races, in particular the one you discovered about metrics
> sharing and how this interacts with fib_info releases.
> 
> What I've choosen to do is two-fold:
> 
> 1) Update ->_metrics atomically with cmpxchg once a route becomes publicly
>    visible.
> 
> 2) Remember and grab a reference to the fib_info for shared read-only
>    metrics in rt->fi, then release it once the metrics regerence goes
>    away.
> 
> It sounds expensive but hear me out :-)
> 
> First of all, at rt_set_nexthop() time, the atomic we use to grab a
> ref to the fib_info is replacing a 60-byte memcpy() into the dst
> metrics.
> 
> Next, the ->_metrics atomic to un-COW the metrics at destroy time
> might in fact be overkill.  Especially once writable metrics live in
> the inetpeer cache (that's the next set of patches after this one).
> 
> Finally, once this change is stabilized we can be a lot smarter about
> what we do at the time an entry is created.  For example, when a route
> is looked up for a TCP socket, we essentially know we are going to COW
> the route %99.99999 of the time.  So we can pass a hint into TCP's
> route lookups in the flow flags field telling it to pre-COW the route.
> 
> TCP pre-COW'ing of metrics will thus save several atomics.
> 
> Anyways, here is the patch, it is only build tested at this point, but
> I wanted to get feedback from you about the basic gist of things
> as soon as possible.
> 
> Thanks!
> 

Thanks David, I read this (I am a bit busy preparing my travel to
Reunion/Maurice islands). This looks pretty nice. I have one comment :

>  
> +u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old)
> +{
> +	u32 *p = kmalloc(sizeof(u32) * RTAX_MAX, GFP_ATOMIC);
> +
> +	if (p) {
> +		u32 *old_p = __DST_METRICS_PTR(old);
> +		unsigned long prev, new;
> +
> +		memcpy(p, old_p, sizeof(u32) * RTAX_MAX);
> +
> +		new = (unsigned long) p;
> +		prev = cmpxchg(&dst->_metrics, old, new);
> +
> +		if (prev != old) {
> +			kfree(p);
> +			p = __DST_METRICS_PTR(prev);
> +			if (prev & DST_METRICS_READ_ONLY)
> +				p = NULL;
> +		}
> +	}
> +	return p;
> +}
> +EXPORT_SYMBOL(dst_cow_metrics_generic);
> +
...

> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 3e5b7cc..7fc6301 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -152,6 +152,36 @@ static void ipv4_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
>  {
>  }
>  
> +static u32 *ipv4_cow_metrics(struct dst_entry *dst, unsigned long old)
> +{
> +	u32 *p = kmalloc(sizeof(u32) * RTAX_MAX, GFP_ATOMIC);
> +
> +	if (p) {
> +		u32 *old_p = __DST_METRICS_PTR(old);
> +		unsigned long prev, new;
> +
> +		memcpy(p, old_p, sizeof(u32) * RTAX_MAX);
> +
> +		new = (unsigned long) p;
> +		prev = cmpxchg(&dst->_metrics, old, new);
> +
> +		if (prev != old) {
> +			kfree(p);
> +			p = __DST_METRICS_PTR(prev);
> +			if (prev & DST_METRICS_READ_ONLY)
> +				p = NULL;
> +		} else {

Hmm, I first asked myself why you dont use dst_cow_metrics_generic() to
perform the generic allocation, but saw following :

> +			struct rtable *rt = (struct rtable *) dst;
> +

Since you use cmpxchg() to permut the dst->_metrics, I feel this rt->fi
needs some protection as well. Maybe store fi pointer inside the metrics
instead of dst, or else you need a spinlock to perform the whole
transaction (change dst->_metrics & rt->fi) ?

> +			if (rt->fi) {
> +				fib_info_put(rt->fi);
> +				rt->fi = NULL;
> +			}
> +		}
> +	}
> +	return p;
> +}
> +


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ