[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090519162048.GB28034@hmsreliant.think-freely.org>
Date: Tue, 19 May 2009 12:20:48 -0400
From: Neil Horman <nhorman@...driver.com>
To: Eric Dumazet <dada1@...mosbay.com>
Cc: Jarek Poplawski <jarkao2@...il.com>, lav@....ru,
Stephen Hemminger <shemminger@...ux-foundation.org>,
netdev@...r.kernel.org
Subject: Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c
On Tue, May 19, 2009 at 05:32:29PM +0200, Eric Dumazet wrote:
> Jarek Poplawski a écrit :
> > On 19-05-2009 04:35, Stephen Hemminger wrote:
> >> Begin forwarded message:
> >>
> >> Date: Mon, 18 May 2009 14:10:20 GMT
> >> From: bugzilla-daemon@...zilla.kernel.org
> >> To: shemminger@...ux-foundation.org
> >> Subject: [Bug 13339] New: rtable leak in ipv4/route.c
> >>
> >>
> >> http://bugzilla.kernel.org/show_bug.cgi?id=13339
> > ...
> >> 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
> >> patch has at least one critical flaw, and another problem.
> >>
> >> rt_intern_hash calculates rthi pointer, which is later used for new entry
> >> insertion. The same loop calculates cand pointer which is used to clean the
> >> list. If the pointers are the same, rtable leak occurs, as first the cand is
> >> removed then the new entry is appended to it.
> >>
> >> This leak leads to unregister_netdevice problem (usage count > 0).
> >>
> >> Another problem of the patch is that it tries to insert the entries in certain
> >> order, to facilitate counting of entries distinct by all but QoS parameters.
> >> Unfortunately, referencing an existing rtable entry moves it to list beginning,
> >> to speed up further lookups, so the carefully built order is destroyed.
>
> We could change rt_check_expire() to be smarter and handle any order in chains.
>
> This would let rt_intern_hash() be simpler.
>
> As its a more performance critical path, all would be good :)
>
> >>
> >> For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
> >> it will also destroy the ordering.
> >
> > I think fixing this bug fast is more important than this
> > ordering or counting. Could you send your patch proposal?
> >
>
I was thinking something along these lines (also completely untested). The
extra check in rt_intern hash prevents us from identifying a 'group leader' that
is about to be deleted, and sets rthi to point at the group leader rather than
the last entry in the group as it previously did, which we then mark with the
new flag in the dst_entry (updating it on rt_free of course). This lets us
identify the boundaries of a group, so when we do a move to front heruistic, we
can move the entire group rather than just the single entry. I prefer this to
Erics approach since, even though rt_check_expire isn't in a performance
critical path, the addition of the extra list search when the list is unordered
makes rt_check_expire run in O(n^2) rather than O(n), which I think will have a
significant impact on high traffic systems. Since the ordered list creation was
done at almost zero performance cost in rt_intern_hash, I'd like to keep it if
at all possible.
There is of course a shortcomming in my patch in that it doesn't actually do
anything currently with DST_GRPLDR other than set/clear it appropriately, I've
yet to identify where the route cache move to front heruistic is implemented.
If someone could show me where that is, I'd be happy to finish this patch.
Neil
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists