lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 26 Apr 2012 13:18:22 -0700
From:	Jay Vosburgh <fubar@...ibm.com>
To:	Jiri Bohac <jbohac@...e.cz>
cc:	Andy Gospodarek <andy@...yhouse.net>, netdev@...r.kernel.org
Subject: Re: [PATCH][RFC] bonding: delete migrated IP addresses from the rlb hash table

Jiri Bohac <jbohac@...e.cz> wrote:

>I finally got back to this, sorry for such a long delay ...
>Start of this thread is here:
>http://thread.gmane.org/gmane.linux.network/222233
>
>On Wed, Mar 07, 2012 at 05:02:16PM +0100, Jiri Bohac wrote:
>> On Wed, Feb 29, 2012 at 06:12:20PM -0800, Jay Vosburgh wrote:
>> > 	I've done some initial testing with this, and so far I'm seeing
>> > one problem: every time the local host (with bonding) sends a broadcast
>> > ARP, that ends up flushing the entire RLB table.  Well, all entries that
>> > match the IP on the bond that's sending an ARP request, which is just
>> > one address in my testing.
>> > 
>> > 	Anyway, this happens because the switch forwards the broadcast
>> > ARP back around to one of the other bond slaves, and then that
>> > "incoming" ARP bears an ip_src of our already-in-use IP address, and
>> > that matches everything in the table.
>> 
>> Good catch! I did not notice this.
>> 
>> > 	Perhaps a check that the ip_src being flushed is not actually in
>> > use locally is warranted?
>> 
>> This would not work for the setups where the bonding master is
>> bridget to some other network. I think it would be better to also
>> store the source (server) MAC address in struct client_info and
>> only flush the hash table entries if the MAC address from the
>> incoming APR packet and the source MAC address stored in the hash
>> table differ.

	Just to make sure I understand: the additional check you propose
(beyond a check that the IP source address is not locally in use) is for
the purpose of minimizing unnecessary flushes, by insuring that the
address really has moved.  Correct?

>> Updated patch follows (compile-tested only) I also
>> fixed the coding style problems you pointed out. As for the
>> forward/reverse naming, it's your call. Should I change it to
>> src/dst?
>
>I now thoroughly tested this updated patch in a real setup. It
>works as intended - it does not purge the entries when receiving the
>looped-back ARP requests. It correctly purges the offending
>entries when an ARP packet arrives with a its SRC IP address
>matching a SRC IP address in the table and having differrent MAC
>address -- exactly the case that would cause the ARP cache
>poisoning if not purged from the table.
>
>Jay, would you please consider applying this? Or do you want me
>to somehow rename the RLB table forward/reverse elements?

	I'm going to give this a spin this afternoon, but just skimming
through it, I'm still not that thrilled about the "forward" and
"reverse" terminology applying to "hash by dst" and "hash by src"; why
not just call 'em "dst_next" and "src_next", et al, and cut out the
middle man?

	-J

>Re-sending the fixed patch with the original description:
>
>------
>
>
>Bonding in balance-alb mode records information from ARP packets
>passing through the bond in a hash table (rx_hashtbl).
>
>At certain situations (e.g. link change of a slave),
>rlb_update_rx_clients() will send out ARP packets to update ARP
>caches of other hosts on the network to achieve RX load balancing.
>
>The problem is that once an IP address is recorded in the hash
>table, it stays there indefinitely. If this IP address is
>migrated to a different host in the network, bonding still sends
>out ARP packets that poison other systems' ARP caches with
>invalid information.
>
>This patch solves this by looking at all incoming ARP packets,
>and checking if the source IP address is one of the source
>addresses stored in the rx_hashtbl. If it is, the corresponding
>hash table entries are removed. Thus, when an IP address is
>migrated, the first ARP broadcast by its new owner will purge the
>offending entries of rx_hashtbl.
>
>The hash table is hashed by ip_dst. To be able to do the above
>check efficiently (not walking the whole hash table), we need a
>reverse mapping (by ip_src).
>
>I added three new members in struct rlb_client_info:
>   rx_hashtbl[x].reverse_first will point to the start of a list of
>      entries for which hash(ip_src) == x.
>   The list is linked with reverse_next and reverse_prev.
>
>When an incoming ARP packet arrives at rlb_arp_recv()
>rlb_purge_src_ip() can quickly walk only the entries on the
>corresponding lists, i.e. the entries that are likely to contain
>the offending IP address.
>
>Signed-off-by: Jiri Bohac <jbohac@...e.cz>
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index f820b26..a938ab6 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -84,6 +84,9 @@ static inline struct arp_pkt *arp_pkt(const struct sk_buff *skb)
>
> /* Forward declaration */
> static void alb_send_learning_packets(struct slave *slave, u8 mac_addr[]);
>+static void rlb_purge_src_ip(struct bonding *bond, struct arp_pkt *arp);
>+static void rlb_delete_table_entry_reverse(struct bonding *bond, u32 index);
>+static void rlb_set_reverse_entry(struct bonding *bond, u32 ip_src_hash, u32 ip_dst_hash);
>
> static inline u8 _simple_hash(const u8 *hash_start, int hash_size)
> {
>@@ -366,6 +369,17 @@ static void rlb_arp_recv(struct sk_buff *skb, struct bonding *bond,
> 		return;
> 	}
>
>+	/* We received an ARP from arp->ip_src.
>+	 * We might have used this IP address previously (on the bonding host
>+	 * itself or on a system that is bridged together with the bond).
>+	 * However, if arp->mac_src is different than what is stored in
>+	 * rx_hashtbl, some other host is now using the IP and we must prevent
>+	 * sending out client updates with this IP address as the source.
>+	 * Clean up all hash table entries that have this address as ip_src but
>+	 * have a dirrerent mac_src.
>+	 */
>+	rlb_purge_src_ip(bond, arp);
>+
> 	if (arp->op_code == htons(ARPOP_REPLY)) {
> 		/* update rx hash table for this ARP */
> 		rlb_update_entry_from_arp(bond, arp);
>@@ -635,6 +649,7 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
> 				/* update mac address from arp */
> 				memcpy(client_info->mac_dst, arp->mac_dst, ETH_ALEN);
> 			}
>+			memcpy(client_info->mac_src, arp->mac_src, ETH_ALEN);
>
> 			assigned_slave = client_info->slave;
> 			if (assigned_slave) {
>@@ -657,6 +672,13 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
> 	assigned_slave = rlb_next_rx_slave(bond);
>
> 	if (assigned_slave) {
>+		if (!(client_info->assigned && client_info->ip_src == arp->ip_src)) {
>+			/* ip_src is going to be updated, fix the reverse hashing */
>+			u32 hash_src = _simple_hash((u8 *)&arp->ip_src, sizeof(arp->ip_src));
>+			rlb_delete_table_entry_reverse(bond, hash_index);
>+			rlb_set_reverse_entry(bond, hash_src, hash_index);
>+		}
>+
> 		client_info->ip_src = arp->ip_src;
> 		client_info->ip_dst = arp->ip_dst;
> 		/* arp->mac_dst is broadcast for arp reqeusts.
>@@ -664,6 +686,7 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
> 		 * upon receiving an arp reply.
> 		 */
> 		memcpy(client_info->mac_dst, arp->mac_dst, ETH_ALEN);
>+		memcpy(client_info->mac_src, arp->mac_src, ETH_ALEN);
> 		client_info->slave = assigned_slave;
>
> 		if (compare_ether_addr_64bits(client_info->mac_dst, mac_bcast)) {
>@@ -769,11 +792,109 @@ static void rlb_rebalance(struct bonding *bond)
> }
>
> /* Caller must hold rx_hashtbl lock */
>-static void rlb_init_table_entry(struct rlb_client_info *entry)
>+static void rlb_init_table_entry_forward(struct rlb_client_info *entry)
> {
>-	memset(entry, 0, sizeof(struct rlb_client_info));
> 	entry->next = RLB_NULL_INDEX;
> 	entry->prev = RLB_NULL_INDEX;
>+	entry->assigned = 0;
>+	entry->slave = NULL;
>+	entry->tag = 0;
>+}
>+static void rlb_init_table_entry_reverse(struct rlb_client_info *entry)
>+{
>+	entry->reverse_first = RLB_NULL_INDEX;
>+	entry->reverse_prev = RLB_NULL_INDEX;
>+	entry->reverse_next = RLB_NULL_INDEX;
>+}
>+
>+static void rlb_init_table_entry(struct rlb_client_info *entry)
>+{
>+	memset(entry, 0, sizeof(struct rlb_client_info));
>+	rlb_init_table_entry_forward(entry);
>+	rlb_init_table_entry_reverse(entry);
>+}
>+
>+static void rlb_delete_table_entry_forward(struct bonding *bond, u32 index)
>+{
>+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>+	u32 next_index = bond_info->rx_hashtbl[index].next;
>+	u32 prev_index = bond_info->rx_hashtbl[index].prev;
>+
>+	if (index == bond_info->rx_hashtbl_head)
>+		bond_info->rx_hashtbl_head = next_index;
>+	if (prev_index != RLB_NULL_INDEX)
>+		bond_info->rx_hashtbl[prev_index].next = next_index;
>+	if (next_index != RLB_NULL_INDEX)
>+		bond_info->rx_hashtbl[next_index].prev = prev_index;
>+}
>+
>+static void rlb_delete_table_entry_reverse(struct bonding *bond, u32 index)
>+{
>+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>+	u32 next_index = bond_info->rx_hashtbl[index].reverse_next;
>+	u32 prev_index = bond_info->rx_hashtbl[index].reverse_prev;
>+
>+	bond_info->rx_hashtbl[index].reverse_next = RLB_NULL_INDEX;
>+	bond_info->rx_hashtbl[index].reverse_prev = RLB_NULL_INDEX;
>+
>+	if (next_index != RLB_NULL_INDEX)
>+		bond_info->rx_hashtbl[next_index].reverse_prev = prev_index;
>+
>+	if (prev_index == RLB_NULL_INDEX)
>+		return;
>+
>+	/* is prev_index pointing to the head of this chain? */
>+	if (bond_info->rx_hashtbl[prev_index].reverse_first == index)
>+		bond_info->rx_hashtbl[prev_index].reverse_first = next_index;
>+	else
>+		bond_info->rx_hashtbl[prev_index].reverse_next = next_index;
>+
>+}
>+
>+static void rlb_delete_table_entry(struct bonding *bond, u32 index)
>+{
>+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>+	struct rlb_client_info *entry = &(bond_info->rx_hashtbl[index]);
>+
>+	rlb_delete_table_entry_forward(bond, index);
>+	rlb_init_table_entry_forward(entry);
>+
>+	rlb_delete_table_entry_reverse(bond, index);
>+}
>+
>+static void rlb_set_reverse_entry(struct bonding *bond, u32 ip_src_hash, u32 ip_dst_hash)
>+{
>+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>+	u32 next;
>+
>+	bond_info->rx_hashtbl[ip_dst_hash].reverse_prev = ip_src_hash;
>+	next = bond_info->rx_hashtbl[ip_src_hash].reverse_first;
>+	bond_info->rx_hashtbl[ip_dst_hash].reverse_next = next;
>+	if (next != RLB_NULL_INDEX)
>+		bond_info->rx_hashtbl[next].reverse_prev = ip_dst_hash;
>+	bond_info->rx_hashtbl[ip_src_hash].reverse_first = ip_dst_hash;
>+}
>+
>+/* deletes all rx_hashtbl entries with  arp->ip_src if their mac_src does
>+ * not match arp->mac_src */
>+static void rlb_purge_src_ip(struct bonding *bond, struct arp_pkt *arp)
>+{
>+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>+	u32 ip_src_hash = _simple_hash((u8*)&(arp->ip_src), sizeof(arp->ip_src));
>+	u32 index;
>+
>+	_lock_rx_hashtbl_bh(bond);
>+
>+	index = bond_info->rx_hashtbl[ip_src_hash].reverse_first;
>+	while (index != RLB_NULL_INDEX) {
>+		struct rlb_client_info *entry = &(bond_info->rx_hashtbl[index]);
>+		u32 next_index = entry->reverse_next;
>+		if (entry->ip_src == arp->ip_src &&
>+		    compare_ether_addr_64bits(arp->mac_src, entry->mac_src))
>+				rlb_delete_table_entry(bond, index);
>+		index = next_index;
>+	}
>+	_unlock_rx_hashtbl_bh(bond);
> }
>
> static int rlb_initialize(struct bonding *bond)
>@@ -831,21 +952,9 @@ static void rlb_clear_vlan(struct bonding *bond, unsigned short vlan_id)
> 	while (curr_index != RLB_NULL_INDEX) {
> 		struct rlb_client_info *curr = &(bond_info->rx_hashtbl[curr_index]);
> 		u32 next_index = bond_info->rx_hashtbl[curr_index].next;
>-		u32 prev_index = bond_info->rx_hashtbl[curr_index].prev;
>-
>-		if (curr->tag && (curr->vlan_id == vlan_id)) {
>-			if (curr_index == bond_info->rx_hashtbl_head) {
>-				bond_info->rx_hashtbl_head = next_index;
>-			}
>-			if (prev_index != RLB_NULL_INDEX) {
>-				bond_info->rx_hashtbl[prev_index].next = next_index;
>-			}
>-			if (next_index != RLB_NULL_INDEX) {
>-				bond_info->rx_hashtbl[next_index].prev = prev_index;
>-			}
>
>-			rlb_init_table_entry(curr);
>-		}
>+		if (curr->tag && (curr->vlan_id == vlan_id))
>+			rlb_delete_table_entry(bond, curr_index);
>
> 		curr_index = next_index;
> 	}
>diff --git a/drivers/net/bonding/bond_alb.h b/drivers/net/bonding/bond_alb.h
>index 90f140a..8286df52 100644
>--- a/drivers/net/bonding/bond_alb.h
>+++ b/drivers/net/bonding/bond_alb.h
>@@ -100,6 +100,7 @@ struct tlb_client_info {
> struct rlb_client_info {
> 	__be32 ip_src;		/* the server IP address */
> 	__be32 ip_dst;		/* the client IP address */
>+	u8  mac_src[ETH_ALEN];	/* the server MAC address */
> 	u8  mac_dst[ETH_ALEN];	/* the client MAC address */
> 	u32 next;		/* The next Hash table entry index */
> 	u32 prev;		/* The previous Hash table entry index */
>@@ -108,6 +109,9 @@ struct rlb_client_info {
> 	struct slave *slave;	/* the slave assigned to this client */
> 	u8 tag;			/* flag - need to tag skb */
> 	unsigned short vlan_id;	/* VLAN tag associated with IP address */
>+	u32 reverse_next;	/* next entry with same hash(ip_src) */
>+	u32 reverse_prev;	/* prev entry with same hash(ip_src) */
>+	u32 reverse_first;	/* first entry with hash(ip_src) == this entry's index */
> };
>
> struct tlb_slave_info {
>
>
>
>-- 
>Jiri Bohac <jbohac@...e.cz>
>SUSE Labs, SUSE CZ
>

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ