[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090519174504.GD28034@hmsreliant.think-freely.org>
Date: Tue, 19 May 2009 13:45:04 -0400
From: Neil Horman <nhorman@...driver.com>
To: Jarek Poplawski <jarkao2@...il.com>
Cc: Eric Dumazet <dada1@...mosbay.com>, lav@....ru,
Stephen Hemminger <shemminger@...ux-foundation.org>,
netdev@...r.kernel.org
Subject: Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
On Tue, May 19, 2009 at 07:17:03PM +0200, Jarek Poplawski wrote:
> On Tue, May 19, 2009 at 12:23:30PM -0400, Neil Horman wrote:
> > On Tue, May 19, 2009 at 05:32:29PM +0200, Eric Dumazet wrote:
> > > Jarek Poplawski a écrit :
> > > > On 19-05-2009 04:35, Stephen Hemminger wrote:
> > > >> Begin forwarded message:
> > > >>
> > > >> Date: Mon, 18 May 2009 14:10:20 GMT
> > > >> From: bugzilla-daemon@...zilla.kernel.org
> > > >> To: shemminger@...ux-foundation.org
> > > >> Subject: [Bug 13339] New: rtable leak in ipv4/route.c
> > > >>
> > > >>
> > > >> http://bugzilla.kernel.org/show_bug.cgi?id=13339
> > > > ...
> > > >> 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
> > > >> patch has at least one critical flaw, and another problem.
> > > >>
> > > >> rt_intern_hash calculates rthi pointer, which is later used for new entry
> > > >> insertion. The same loop calculates cand pointer which is used to clean the
> > > >> list. If the pointers are the same, rtable leak occurs, as first the cand is
> > > >> removed then the new entry is appended to it.
> > > >>
> > > >> This leak leads to unregister_netdevice problem (usage count > 0).
> > > >>
> > > >> Another problem of the patch is that it tries to insert the entries in certain
> > > >> order, to facilitate counting of entries distinct by all but QoS parameters.
> > > >> Unfortunately, referencing an existing rtable entry moves it to list beginning,
> > > >> to speed up further lookups, so the carefully built order is destroyed.
> > >
> > > We could change rt_check_expire() to be smarter and handle any order in chains.
> > >
> > > This would let rt_intern_hash() be simpler.
> > >
> > > As its a more performance critical path, all would be good :)
> > >
> > > >>
> > > >> For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
> > > >> it will also destroy the ordering.
> > > >
> > > > I think fixing this bug fast is more important than this
> > > > ordering or counting. Could you send your patch proposal?
> > > >
> > >
> >
> > Of course, it helps if I attach the patch :)
> >
> >
> > diff --git a/include/net/dst.h b/include/net/dst.h
> > index 6be3b08..a39db6d 100644
> > --- a/include/net/dst.h
> > +++ b/include/net/dst.h
> > @@ -47,6 +47,7 @@ struct dst_entry
> > #define DST_NOXFRM 2
> > #define DST_NOPOLICY 4
> > #define DST_NOHASH 8
> > +#define DST_GRPLDR 16
> > unsigned long expires;
> >
> > unsigned short header_len; /* more space at head required */
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index c4c60e9..0120f0e 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -610,6 +610,8 @@ static inline int ip_rt_proc_init(void)
> >
> > static inline void rt_free(struct rtable *rt)
> > {
> > + if (rt->u.dst.flags & DST_GRPLDR)
> > + rt->u.dst.rt_next->u.dst.flag |= DST_GRPLDR;
> > call_rcu_bh(&rt->u.dst.rcu_head, dst_rcu_free);
> > }
> >
> > @@ -1143,8 +1145,11 @@ restart:
> > * relvant to the hash function together, which we use to adjust
> > * our chain length
> > */
> > - if (*rthp && compare_hash_inputs(&(*rthp)->fl, &rt->fl))
> > + if (!*rthi && *rthp &&
> > + compare_hash_inputs(&(*rthp)->fl, &rt->fl) &&
> > + (cand != rth))
> > rthi = rth;
>
> Does it really prevent cand == rthi in the next loop?
>
Yes, because cand and rthi are inspected during the same loop iteration, and
both assigned from rth. since I added a check which requires !rthi (which is
actually a bug above that I need to fix), once rthi is set, it won't be moved,
and on the next iteration, if cand is assigned, it is assigned to rth, which
(being the next iteration), is a different rt cache entry
> I didn't check Eric's patch yet, but I really don't know what's wrong
> with something as simple as below for -stable, until "proper" fix is
> analyzed and tested.
>
Because the above fixes it without continuing to break the ordering. You're
change below prevents the leak, but still allows for disordered lists to form,
which IMHO doesn't really make it a candidate for -stable. I'd much rather fix
both the leak and the ordering before pushing anything out
speaking of which, I'm going to ask again, I've been looking all morning, and
I'm unable to find the move to front heuristic that you mentioned creates furthe
list disordering. If you can point it out to me, I can complete my patch and
present something for you to test more throughly.
Neil
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists