[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1314031455.2307.113.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
Date: Mon, 22 Aug 2011 18:44:15 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Guy Yur <guyyur@...il.com>
Cc: netdev@...r.kernel.org
Subject: Re: [RFC][PATCH] Failed neighbors attached to routes are not
released
Le vendredi 19 août 2011 à 21:53 +0300, Guy Yur a écrit :
> Hi,
>
> The issue I am seeing is with a neighbor used as a gateway in a route,
> when the neighbor gets nud FAILED it will not be removed from the
> neighbor cache.
> The reference count for the neighbor remains > 1
> when neigh_periodic_work() is called.
>
> Issue noticed with IPv6 neighbors on Arch Linux with kernel 3.0.3
> kernel config includes CONFIG_IPV6_ROUTER_PREF
>
> The problem affects routing when the neighbor loses connectivity and returns.
>
> Scenario: Using a default route and a static route through different interfaces.
> When the neighbor gateway of the static route goes down the traffic
> will move to the default gateway as expected.
> Once the static route neighbor comes back up it is not asked for
> neighbor solicitation
> because the route is marked as FAILED and the traffic will continue to
> pass through the default gateway.
>
> Steps to reproduce the route not being removed:
> 1. add an IPv6 address on an interface
> 2. add a route to a network through a gateway on the interface's network
> 3. make sure the gateway address is not reachable
> 4. ping6 a host in the route network
> 5. "ip -6 nei" will show the gateway neighbor as FAILED and it won't be released
>
> Steps to reproduce the routing problem:
> 1. client and two gateway machines (A and B)
> 2. on the client define a static route through A and a default route through B
> 3. disconnect eth on A
> 4. ping6 a host in the network that should go through A
> after a while the traffic will move through B which is the default route
> 5. reconnect eth on A
> 6. ping6 a host in the network again, the traffic will still go through B
> "ip -6 nei" on the client will still show A as FAILED
>
>
> Patch to change the nud state to NONE if the reference count > 1
> allowing the neighbor to be released in a future pass.
I wonder why a 'future pass' is needed at all.
Shouldnt we immediately detect link becomes alive and force an immediate
flush at this point, before waiting a garbage collect timer ?
>
> --- linux/net/core/neighbour.c.orig 2011-08-19 13:15:35.041524169 +0300
> +++ linux/net/core/neighbour.c 2011-08-19 18:19:07.271276096 +0300
> @@ -802,14 +802,16 @@ static void neigh_periodic_work(struct w
> if (time_before(n->used, n->confirmed))
> n->used = n->confirmed;
>
> - if (atomic_read(&n->refcnt) == 1 &&
> - (state == NUD_FAILED ||
> + if ((state == NUD_FAILED ||
> time_after(jiffies, n->used + n->parms->gc_staletime))) {
> - *np = n->next;
> - n->dead = 1;
> - write_unlock(&n->lock);
> - neigh_cleanup_and_release(n);
> - continue;
> + if (atomic_read(&n->refcnt) == 1) {
> + *np = n->next;
> + n->dead = 1;
> + write_unlock(&n->lock);
> + neigh_cleanup_and_release(n);
> + continue;
> + } else
> + n->nud_state = NUD_NONE;
> }
> write_unlock(&n->lock);
> --
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists