[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <EF944FCCC928C64A875FE3EA1A41062C38FD52F406@P3PW5EX1MB02.EX1.SECURESERVER.NET>
Date: Wed, 21 Apr 2010 01:02:02 -0700
From: Sasha Levin <sasha@...sleep.com>
To: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: [PATCH] ipv4: handle GARPs specially when updating neighbors
From: Sasha Levin <sasha@...sleep.com>
We are currently testing IP fail-over on storage devices, and have observed an issue with the IP transfer from one device to another.
Assuming we have 2 storage devices A and B, and a server C which uses the storage, the scenario is:
1. Device A sends an ARP request which server C sees – server C updates it’s ARP table with the MAC of device A.
2. Device A fails, Device B takes over the IP and sends out a GARP.
3. Even though device C sees the GARP, it ignores it and keeps trying to communicate with device A until the entry is removed from its cache and a new ARP request is generated.
The code which causes this is located in arp_process@...t/ipv4/arp.c:
override = time_after(jiffies, n->updated + n->parms->locktime);
/* Broadcast replies and request packets
do not assert neighbour reachability.
*/
if (arp->ar_op != htons(ARPOP_REPLY) ||
skb->pkt_type != PACKET_HOST)
state = NUD_STALE;
neigh_update(n, sha, state, override ? NEIGH_UPDATE_F_OVERRIDE : 0);
neigh_release(n);
According to the code, this scenario happens because the kernel ignores any ARP updates which happened in a short period after the previous ARP update. The reason which was stated in the comments is “If several different ARP replies follows back-to-back, use the FIRST one. It is possible, if several proxy agents are active. Taking the first reply prevents arp trashing and chooses the fastest router.”.
This, however, doesn’t take into account GARPs which are not being sent by ARP proxies anyway and just ignores them too – causing a loss of communication for over a minute until the ARP cache refreshes.
Signed-off-by: Sasha Levin <sasha@...sleep.com>
---
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 1a9dd66..caa2093 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -876,8 +876,11 @@ static int arp_process(struct sk_buff *skb)
use the FIRST one. It is possible, if several proxy
agents are active. Taking the first reply prevents
arp trashing and chooses the fastest router.
+
+ GARPs are always updating the cache since they can
+ originate from different devices with the same IP.
*/
- override = time_after(jiffies, n->updated + n->parms->locktime);
+ override = (sip == tip) || time_after(jiffies, n->updated + n->parms->locktime);
/* Broadcast replies and request packets
do not assert neighbour reachability.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists