[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090622152314.GB14359@hmsreliant.think-freely.org>
Date: Mon, 22 Jun 2009 11:23:14 -0400
From: Neil Horman <nhorman@...driver.com>
To: netdev@...r.kernel.org
Cc: davem@...emloft.net, kuznet@....inr.ac.ru, pekkas@...core.fi,
jmorris@...ei.org, yoshfuji@...ux-ipv6.org, kaber@...sh.net,
jarkao2@...il.com, mbizon@...ebox.fr, nhorman@...driver.com
Subject: [PATCH] ipv4 routing: Fixes to allow route cache entries to work
when route caching is disabled
Hey all-
As we've been discussing recently, There are a few bugs with routing if
we exceed our route cache rebuild count, and subsequently disable route caching.
An oops was reported to me, which has been subsequently fixed, and then
subsequently a route cache leak and failure to forward frames was reported to me
when rt_caching returns false. I've reproduced these on a local system, and
tracked down the cause. This patch fixes both of these problems for me on my
test system.
Ensure that route cache entries are usable and reclaimable when caching is off
When route caching is disabled (rt_caching returns false), We still use route
cache entries that are created and passed into rt_intern_hash once. These
routes need to be made usable for the one call path that holds a reference to
them, and they need to be reclaimed when they're finished with their use. To be
made usable, they need to be associated with a neighbor table entry (which they
currently are not), otherwise iproute_finish2 just discards the packet, since we
don't know which L2 peer to send the packet to. To do this binding, we need to
follow the path a bit higher up in rt_intern_hash, which calls
arp_bind_neighbour, but not assign the route entry to the hash table.
Currently, if caching is off, we simply assign the route to the rp pointer and
are reutrn success. This patch associates us with a neighbor entry first.
Secondly, we need to make sure that any single use routes like this are known to
the garbage collector when caching is off. If caching is off, and we try to
hash in a route, it will leak when its refcount reaches zero. To avoid this,
this patch calls rt_free on the route cache entry passed into rt_intern_hash.
This places us on the gc list for the route cache garbage collector, so that
when its refcount reaches zero, it will be reclaimed (Thanks to Alexey for this
suggestion).
I've tested this on a local system here, and with these patches in place, I'm
able to maintain routed connectivity to remote systems, even if I set
/proc/sys/net/ipv4/rt_cache_rebuild_count to -1, which forces rt_caching to
return false.
Best
Neil
Signed-off-by: Neil Horman <nhorman@...hat.com>
Reported-by: Jarek Poplawski <jarkao2@...il.com>
Reported-by: Maxime Bizon <mbizon@...ebox.fr>
route.c | 44 ++++++++++++++++++++++++++------------------
1 file changed, 26 insertions(+), 18 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 65b3a8b..4b21513 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1076,6 +1076,7 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt,
u32 min_score;
int chain_length;
int attempts = !in_softirq();
+ int caching = rt_caching(dev_net(rt->u.dst.dev));
restart:
chain_length = 0;
@@ -1084,7 +1085,7 @@ restart:
candp = NULL;
now = jiffies;
- if (!rt_caching(dev_net(rt->u.dst.dev))) {
+ if (!caching) {
/*
* If we're not caching, just tell the caller we
* were successful and don't touch the route. The
@@ -1093,8 +1094,12 @@ restart:
* If we drop it here, the callers have no way to resolve routes
* when we're not caching. Instead, just point *rp at rt, so
* the caller gets a single use out of the route
+ * Note that we do rt_free on this new route entry, so that
+ * once its refcount hits zero, we are still able to reap it
+ * (Thanks Alexey)
*/
- goto report_and_exit;
+ rt_free(rt);
+ goto skip_hashing;
}
rthp = &rt_hash_table[hash].chain;
@@ -1174,6 +1179,7 @@ restart:
/* Try to bind route to arp only if it is output
route or unicast forwarding path.
*/
+skip_hashing:
if (rt->rt_type == RTN_UNICAST || rt->fl.iif == 0) {
int err = arp_bind_neighbour(&rt->u.dst);
if (err) {
@@ -1206,27 +1212,29 @@ restart:
}
}
- rt->u.dst.rt_next = rt_hash_table[hash].chain;
+ if (caching) {
+ rt->u.dst.rt_next = rt_hash_table[hash].chain;
#if RT_CACHE_DEBUG >= 2
- if (rt->u.dst.rt_next) {
- struct rtable *trt;
- printk(KERN_DEBUG "rt_cache @%02x: %pI4", hash, &rt->rt_dst);
- for (trt = rt->u.dst.rt_next; trt; trt = trt->u.dst.rt_next)
- printk(" . %pI4", &trt->rt_dst);
- printk("\n");
- }
+ if (rt->u.dst.rt_next) {
+ struct rtable *trt;
+ printk(KERN_DEBUG "rt_cache @%02x: %pI4",
+ hash, &rt->rt_dst);
+ for (trt = rt->u.dst.rt_next; trt; trt = trt->u.dst.rt_next)
+ printk(" . %pI4", &trt->rt_dst);
+ printk("\n");
+ }
#endif
- /*
- * Since lookup is lockfree, we must make sure
- * previous writes to rt are comitted to memory
- * before making rt visible to other CPUS.
- */
- rcu_assign_pointer(rt_hash_table[hash].chain, rt);
+ /*
+ * Since lookup is lockfree, we must make sure
+ * previous writes to rt are comitted to memory
+ * before making rt visible to other CPUS.
+ */
+ rcu_assign_pointer(rt_hash_table[hash].chain, rt);
- spin_unlock_bh(rt_hash_lock_addr(hash));
+ spin_unlock_bh(rt_hash_lock_addr(hash));
+ }
-report_and_exit:
if (rp)
*rp = rt;
else
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists