[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1296122511.3027.11.camel@edumazet-laptop>
Date: Thu, 27 Jan 2011 11:01:51 +0100
From: Eric Dumazet <eric.dumazet@...il.com>
To: David Miller <davem@...emloft.net>
Cc: netdev@...r.kernel.org
Subject: Re: [RFC PATCH] net: Implement read-only protection and COW'ing of
metrics.
Le mercredi 26 janvier 2011 à 15:25 -0800, David Miller a écrit :
> Eric, thanks again for your feedback. I've taken a stab at fixing the
> various races, in particular the one you discovered about metrics
> sharing and how this interacts with fib_info releases.
>
> What I've choosen to do is two-fold:
>
> 1) Update ->_metrics atomically with cmpxchg once a route becomes publicly
> visible.
>
> 2) Remember and grab a reference to the fib_info for shared read-only
> metrics in rt->fi, then release it once the metrics regerence goes
> away.
>
> It sounds expensive but hear me out :-)
>
> First of all, at rt_set_nexthop() time, the atomic we use to grab a
> ref to the fib_info is replacing a 60-byte memcpy() into the dst
> metrics.
>
> Next, the ->_metrics atomic to un-COW the metrics at destroy time
> might in fact be overkill. Especially once writable metrics live in
> the inetpeer cache (that's the next set of patches after this one).
>
> Finally, once this change is stabilized we can be a lot smarter about
> what we do at the time an entry is created. For example, when a route
> is looked up for a TCP socket, we essentially know we are going to COW
> the route %99.99999 of the time. So we can pass a hint into TCP's
> route lookups in the flow flags field telling it to pre-COW the route.
>
> TCP pre-COW'ing of metrics will thus save several atomics.
>
> Anyways, here is the patch, it is only build tested at this point, but
> I wanted to get feedback from you about the basic gist of things
> as soon as possible.
>
> Thanks!
>
Thanks David, I read this (I am a bit busy preparing my travel to
Reunion/Maurice islands). This looks pretty nice. I have one comment :
>
> +u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old)
> +{
> + u32 *p = kmalloc(sizeof(u32) * RTAX_MAX, GFP_ATOMIC);
> +
> + if (p) {
> + u32 *old_p = __DST_METRICS_PTR(old);
> + unsigned long prev, new;
> +
> + memcpy(p, old_p, sizeof(u32) * RTAX_MAX);
> +
> + new = (unsigned long) p;
> + prev = cmpxchg(&dst->_metrics, old, new);
> +
> + if (prev != old) {
> + kfree(p);
> + p = __DST_METRICS_PTR(prev);
> + if (prev & DST_METRICS_READ_ONLY)
> + p = NULL;
> + }
> + }
> + return p;
> +}
> +EXPORT_SYMBOL(dst_cow_metrics_generic);
> +
...
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 3e5b7cc..7fc6301 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -152,6 +152,36 @@ static void ipv4_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
> {
> }
>
> +static u32 *ipv4_cow_metrics(struct dst_entry *dst, unsigned long old)
> +{
> + u32 *p = kmalloc(sizeof(u32) * RTAX_MAX, GFP_ATOMIC);
> +
> + if (p) {
> + u32 *old_p = __DST_METRICS_PTR(old);
> + unsigned long prev, new;
> +
> + memcpy(p, old_p, sizeof(u32) * RTAX_MAX);
> +
> + new = (unsigned long) p;
> + prev = cmpxchg(&dst->_metrics, old, new);
> +
> + if (prev != old) {
> + kfree(p);
> + p = __DST_METRICS_PTR(prev);
> + if (prev & DST_METRICS_READ_ONLY)
> + p = NULL;
> + } else {
Hmm, I first asked myself why you dont use dst_cow_metrics_generic() to
perform the generic allocation, but saw following :
> + struct rtable *rt = (struct rtable *) dst;
> +
Since you use cmpxchg() to permut the dst->_metrics, I feel this rt->fi
needs some protection as well. Maybe store fi pointer inside the metrics
instead of dst, or else you need a spinlock to perform the whole
transaction (change dst->_metrics & rt->fi) ?
> + if (rt->fi) {
> + fib_info_put(rt->fi);
> + rt->fi = NULL;
> + }
> + }
> + }
> + return p;
> +}
> +
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists