Message-ID: <24080.1269556271@death.nxdomain.ibm.com>
Date: Thu, 25 Mar 2010 15:31:11 -0700
From: Jay Vosburgh <fubar@...ibm.com>
To: Andy Gospodarek <andy@...yhouse.net>
cc: netdev@...r.kernel.org, lhh@...hat.com,
bonding-devel@...ts.sourceforge.net
Subject: Re: [net-2.6 PATCH] bonding: fix broken multicast with round-robin mode
Andy Gospodarek <andy@...yhouse.net> wrote:
>Round-robin (mode 0) does nothing to ensure that any multicast traffic
>originally destined for the host will continue to arrive at the host when
>the link that sent the IGMP join or membership report goes down. This is
>one of the "benefits" of absolute round-robin transmit.
>
>Keeping track of subscribed multicast groups for each slave did not seem
>like a good use of resources, so I decided to simply send on the
>curr_active slave of the bond (typically the first enslaved device that
>is up). This makes failover management simple as IGMP membership
>reports only need to be sent when the curr_active_slave changes. I
>tested this patch and it appears to work as expected.
>
>Originally reported by Lon Hohberger <lhh@...hat.com>.
>
>Signed-off-by: Andy Gospodarek <andy@...yhouse.net>
Seems reasonable, modulo a couple of minor things (see below).
I checked, and the link failover logic appears to maintain
curr_active_slave even for round robin mode, which, prior to this patch,
didn't use it.
>CC: Lon Hohberger <lhh@...hat.com>
>CC: Jay Vosburgh <fubar@...ibm.com>
>
>---
> drivers/net/bonding/bond_main.c | 34 ++++++++++++++++++++++++++--------
> 1 files changed, 26 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 430c022..0b38455 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1235,6 +1235,11 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
> write_lock_bh(&bond->curr_slave_lock);
> }
> }
>+
>+ /* resend IGMP joins since all were sent on curr_active_slave */
>+ if (bond->params.mode == BOND_MODE_ROUNDROBIN) {
>+ bond_resend_igmp_join_requests(bond);
>+ }
> }
>
> /**
>@@ -4138,22 +4143,35 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev
> struct bonding *bond = netdev_priv(bond_dev);
> struct slave *slave, *start_at;
> int i, slave_no, res = 1;
>+ struct iphdr *iph = ip_hdr(skb);
>
> read_lock(&bond->lock);
>
> if (!BOND_IS_OK(bond))
> goto out;
>-
> /*
>- * Concurrent TX may collide on rr_tx_counter; we accept that
>- * as being rare enough not to justify using an atomic op here
>+ * Start with the curr_active_slave that joined the bond as the
>+ * default for sending IGMP traffic. For failover purposes one
>+ * needs to maintain some consistency for the interface that will
>+ * send the join/membership reports. The curr_active_slave found
>+ * will send all of this type of traffic.
> */
>- slave_no = bond->rr_tx_counter++ % bond->slave_cnt;
>+	if ((skb->protocol == htons(ETH_P_IP)) &&
>+	    (iph->protocol == IPPROTO_IGMP)) {
>+ slave = bond->curr_active_slave;
Technically, this should acquire bond->curr_slave_lock for read
around the inspection of curr_active_slave.
	I believe you'll also want a test for curr_active_slave == NULL,
and free the skb if so (or do something else). There's a race window in
bond_release(): when releasing the curr_active_slave, the field is left
momentarily NULL with the bond unlocked. This occurs after the
bond_change_active_slave(bond, NULL) call, during the lock dance prior
to the call to bond_select_active_slave():
bond_main.c:bond_release():
[...]
if (oldcurrent == slave)
bond_change_active_slave(bond, NULL);
[...]
if (oldcurrent == slave) {
/*
* Note that we hold RTNL over this sequence, so there
* is no concern that another slave add/remove event
* will interfere.
*/
write_unlock_bh(&bond->lock);
[ race window is here ]
read_lock(&bond->lock);
write_lock_bh(&bond->curr_slave_lock);
bond_select_active_slave(bond);
write_unlock_bh(&bond->curr_slave_lock);
read_unlock(&bond->lock);
write_lock_bh(&bond->lock);
}
I'm reasonably sure the other TX functions (that need to) will
handle the case that curr_active_slave is NULL.
>+ } else {
>+ /*
>+ * Concurrent TX may collide on rr_tx_counter; we accept
>+ * that as being rare enough not to justify using an
>+ * atomic op here.
>+ */
>+ slave_no = bond->rr_tx_counter++ % bond->slave_cnt;
>
>- bond_for_each_slave(bond, slave, i) {
>- slave_no--;
>- if (slave_no < 0)
>- break;
>+ bond_for_each_slave(bond, slave, i) {
>+ slave_no--;
>+ if (slave_no < 0)
>+ break;
>+ }
> }
>
> start_at = slave;
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
--