[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1296123612.3027.15.camel@edumazet-laptop>
Date: Thu, 27 Jan 2011 11:20:12 +0100
From: Eric Dumazet <eric.dumazet@...il.com>
To: David Miller <davem@...emloft.net>
Cc: netdev@...r.kernel.org
Subject: Re: [RFC PATCH] net: Implement read-only protection and COW'ing of
metrics.
Le jeudi 27 janvier 2011 à 11:01 +0100, Eric Dumazet a écrit :
> Le mercredi 26 janvier 2011 à 15:25 -0800, David Miller a écrit :
> > Eric, thanks again for your feedback. I've taken a stab at fixing the
> > various races, in particular the one you discovered about metrics
> > sharing and how this interacts with fib_info releases.
> >
> > What I've choosen to do is two-fold:
> >
> > 1) Update ->_metrics atomically with cmpxchg once a route becomes publicly
> > visible.
> >
> > 2) Remember and grab a reference to the fib_info for shared read-only
> > metrics in rt->fi, then release it once the metrics regerence goes
> > away.
> >
> > It sounds expensive but hear me out :-)
> >
> > First of all, at rt_set_nexthop() time, the atomic we use to grab a
> > ref to the fib_info is replacing a 60-byte memcpy() into the dst
> > metrics.
> >
> > Next, the ->_metrics atomic to un-COW the metrics at destroy time
> > might in fact be overkill. Especially once writable metrics live in
> > the inetpeer cache (that's the next set of patches after this one).
> >
> > Finally, once this change is stabilized we can be a lot smarter about
> > what we do at the time an entry is created. For example, when a route
> > is looked up for a TCP socket, we essentially know we are going to COW
> > the route %99.99999 of the time. So we can pass a hint into TCP's
> > route lookups in the flow flags field telling it to pre-COW the route.
> >
> > TCP pre-COW'ing of metrics will thus save several atomics.
> >
> > Anyways, here is the patch, it is only build tested at this point, but
> > I wanted to get feedback from you about the basic gist of things
> > as soon as possible.
> >
> > Thanks!
> >
>
> Thanks David, I read this (I am a bit busy preparing my travel to
> Reunion/Maurice islands). This looks pretty nice. I have one comment :
>
> >
> > +u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old)
> > +{
> > + u32 *p = kmalloc(sizeof(u32) * RTAX_MAX, GFP_ATOMIC);
> > +
> > + if (p) {
> > + u32 *old_p = __DST_METRICS_PTR(old);
> > + unsigned long prev, new;
> > +
> > + memcpy(p, old_p, sizeof(u32) * RTAX_MAX);
> > +
> > + new = (unsigned long) p;
> > + prev = cmpxchg(&dst->_metrics, old, new);
> > +
> > + if (prev != old) {
> > + kfree(p);
> > + p = __DST_METRICS_PTR(prev);
> > + if (prev & DST_METRICS_READ_ONLY)
> > + p = NULL;
> > + }
> > + }
> > + return p;
> > +}
> > +EXPORT_SYMBOL(dst_cow_metrics_generic);
> > +
> ...
>
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 3e5b7cc..7fc6301 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -152,6 +152,36 @@ static void ipv4_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
> > {
> > }
> >
> > +static u32 *ipv4_cow_metrics(struct dst_entry *dst, unsigned long old)
> > +{
> > + u32 *p = kmalloc(sizeof(u32) * RTAX_MAX, GFP_ATOMIC);
> > +
> > + if (p) {
> > + u32 *old_p = __DST_METRICS_PTR(old);
> > + unsigned long prev, new;
> > +
> > + memcpy(p, old_p, sizeof(u32) * RTAX_MAX);
> > +
> > + new = (unsigned long) p;
> > + prev = cmpxchg(&dst->_metrics, old, new);
> > +
> > + if (prev != old) {
> > + kfree(p);
> > + p = __DST_METRICS_PTR(prev);
> > + if (prev & DST_METRICS_READ_ONLY)
> > + p = NULL;
> > + } else {
>
> Hmm, I first asked myself why you dont use dst_cow_metrics_generic() to
> perform the generic allocation, but saw following :
>
> > + struct rtable *rt = (struct rtable *) dst;
> > +
>
> Since you use cmpxchg() to permut the dst->_metrics, I feel this rt->fi
> needs some protection as well. Maybe store fi pointer inside the metrics
> instead of dst, or else you need a spinlock to perform the whole
> transaction (change dst->_metrics & rt->fi) ?
>
> > + if (rt->fi) {
> > + fib_info_put(rt->fi);
> > + rt->fi = NULL;
> > + }
> > + }
> > + }
> > + return p;
> > +}
> > +
>
Hmm, reading again, I realize the rt->fi was set only when installing
the readonly metrics, so ignore my previous mail.
Thanks !
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists