netdev - [PATCH] ipv4: handle GARPs specially when updating neighbors

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Date:	Wed, 21 Apr 2010 01:02:02 -0700
From:	Sasha Levin <sasha@...sleep.com>
To:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: [PATCH] ipv4: handle GARPs specially when updating neighbors

From: Sasha Levin <sasha@...sleep.com>

We are currently testing IP fail-over on storage devices, and have observed an issue with the IP transfer from one device to another.

Assuming we have 2 storage devices A and B, and a server C which uses the storage, the scenario is:

1. Device A sends an ARP request which server C sees – server C updates it’s ARP table with the MAC of device A.
2. Device A fails, Device B takes over the IP and sends out a GARP.
3. Even though device C sees the GARP, it ignores it and keeps trying to communicate with device A until the entry is removed from its cache and a new ARP request is generated.

The code which causes this is located in arp_process@...t/ipv4/arp.c:

override = time_after(jiffies, n->updated + n->parms->locktime);

/* Broadcast replies and request packets
   do not assert neighbour reachability.
 */
if (arp->ar_op != htons(ARPOP_REPLY) ||
    skb->pkt_type != PACKET_HOST)
        state = NUD_STALE;
neigh_update(n, sha, state, override ? NEIGH_UPDATE_F_OVERRIDE : 0);
neigh_release(n);

According to the code, this scenario happens because the kernel ignores any ARP updates which happened in a short period after the previous ARP update. The reason which was stated in the comments is  “If several different ARP replies follows back-to-back, use the FIRST one. It is possible, if several proxy agents are active. Taking the first reply prevents arp trashing and chooses the fastest router.”.

This, however, doesn’t take into account GARPs which are not being sent by ARP proxies anyway and just ignores them too – causing a loss of communication for over a minute until the ARP cache refreshes.

Signed-off-by: Sasha Levin <sasha@...sleep.com>
---
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 1a9dd66..caa2093 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -876,8 +876,11 @@ static int arp_process(struct sk_buff *skb)
 		   use the FIRST one. It is possible, if several proxy
 		   agents are active. Taking the first reply prevents
 		   arp trashing and chooses the fastest router.
+
+		   GARPs are always updating the cache since they can
+		   originate from different devices with the same IP.
 		 */
-		override = time_after(jiffies, n->updated + n->parms->locktime);
+		override = (sip == tip) || time_after(jiffies, n->updated + n->parms->locktime);

 		/* Broadcast replies and request packets
 		   do not assert neighbour reachability.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html