lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 10 Feb 2012 17:53:53 -0800
From:	Jay Vosburgh <fubar@...ibm.com>
To:	Chris Friesen <chris.friesen@...band.com>
cc:	andy@...yhouse.net, netdev <netdev@...r.kernel.org>
Subject: Re: [BUG?] bonding, slave selection, carrier loss, etc.

Chris Friesen <chris.friesen@...band.com> wrote:

>I'm resurrecting an ancient discussion I had with Jay, because I think
>the issue described below is still present and the code he talked about
>submitting to close it doesn't appear to have ever gone in.

	Yah, I never got it to work quite right; I don't remember
exactly why.

>Basically in active/backup mode with mii monitoring there is a window
>between the active slave device losing carrier and calling
>netif_carrier_off() and the miimon code actually detecting the loss of
>the carrier and selecting a new active slave.
>
>The best solution would be for bonding to just register for notification
>of the link going down.  Presumably most drivers should be doing that
>properly by now, and for devices that get interrupt-driven notification
>of link status changes this would allow the bonding code to react much
>quicker.

	A quick look at some drivers shows that at least acenic still
doesn't do netif_carrier_off, so converting entirely to a notifier-based
failover mechanism would break drivers that work today.

	Adding a notifier callback as an additional path into something
like bond_miimon_commit may be feasible.

>Barring that, I think something like the following is needed.  This is
>against 2.6.27, but could easily be reworked against current.
>
>
>
>---------------------- drivers/net/bonding/bond_main.c -----------------------
>index 8499558..e4445d8 100644
>@@ -4313,20 +4313,33 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
> 	read_lock(&bond->lock);
> 	read_lock(&bond->curr_slave_lock);
>
> 	if (!BOND_IS_OK(bond)) {
> 		goto out;
> 	}
>
> 	if (!bond->curr_active_slave)
> 		goto out;
>
>+	/* Verify that the active slave is actually up before
>+	 * trying to send packets.  If it isn't, then
>+	 * trigger the selection of a new active slave.
>+	 */
>+	if (!IS_UP(bond->curr_active_slave->dev)) {
>+		read_unlock(&bond->curr_slave_lock);
>+		write_lock(&bond->curr_slave_lock);
>+		bond_select_active_slave(bond);
>+		write_unlock(&bond->curr_slave_lock);
>+		read_lock(&bond->curr_slave_lock);
>+		if (!bond->curr_active_slave)
>+			goto out;
>+	}

	The problem here is going to be that bond_select_active_slave()
should be called with RTNL held (because the notifier calls it makes
require RTNL), and I'm not sure it's permissible to acquire RTNL in a
driver transmit function.

	-J	


> 	res = bond_dev_queue_xmit(bond, skb, bond->curr_active_slave->dev);
>
> out:
> 	if (res) {
> 		/* no suitable interface, frame not sent */
> 		dev_kfree_skb(skb);
> 	}
> 	read_unlock(&bond->curr_slave_lock);
> 	read_unlock(&bond->lock);
> 	return 0;
>
>Chris
>
>
>
>
>On 03/27/2009 06:00 PM, Jay Vosburgh wrote:
>> Chris Friesen<cfriesen@...tel.com>  wrote:
>> 
>>> In a much earlier version of the bonding driver we ran into problems
>>> where we could have lost carrier on one of the slaves, but at the time
>>> of xmit the bonding driver hadn't yet switched to a better slave.
>>> Because of this we added a patch very much like the one below.
>>>
>>> A quick glance at the current bonding code would seem to indicate that
>>> there could still be a window between the active slave device losing
>>> carrier and calling netif_carrier_off() and the miimon code actually
>>> detecting the loss of the carrier and selecting a new active slave.
>>> Do I have this correct?  If so, would the patch below be correct?
>> 
>> 	Yes, the window is equal to whatever the monitoring interval is
>> (for miimon) or double the interval for ARP.
>> 
>> 	Your patch, I think, would work, but it's suboptimal in that it
>> only affects one mode, and doesn't resolve any of the bigger issues with
>> the link monitoring system in bonding (see below).  Trying to do the
>> equivalent in other modes may have issues; some modes require RTNL to be
>> held when changing slave states, so it's difficult to do that from the
>> transmit routine.
>> 
>>> On a related note--assuming the net driver can detect link loss and
>>> is properly calling netif_carrier_off() why do we still need to poll
>>> the status in the bonding driver?  Isn't there some way to hook into
>>> the network stack and get notified when the carrier goes down?
>> 
>> 	This is actually something I'm working on now.
>> 
>> 	There are notifier callbacks that are tied to a driver calling
>> netif_carrier_on or _off.  The problem is that a bunch of older (mostly
>> 10/100, although acenic is a gigabit) drivers don't do netif_carrier_on
>> or _off, or check their link state on a ridiculously long interval, so
>> simply dropping the current miimon implementation and replacing it with
>> the event notifier may not be feasible for backwards compatibility
>> reasons.  Heck, I've still got 3c59x and acenic cards in my test
>> systems, neither of which do netif_carrier correctly; I can't be the
>> only one.
>> 
>> 	An additional goal is to permit the state change notifications
>> (or miimon) and the ARP monitor to run concurrently.  Sadly, the current
>> "link state" system can't handle two things simultaneously poking at the
>> slave's link state; if, e.g., ARP says down, but MII/notifiers says up,
>> then the link state can flap, so it needs a sort of "arbitrator."
>> 
>> 	A minor advantage of reworking all of that is that it should end
>> up being less code when all done, and should be more modular, so it'd be
>> easier if somebody wanted to add, say, an ICMP probe monitor.
>> 
>> 	I'll probably be posting an RFC patch next week.
>> 
>> 	-J
>> 
>> ---
>> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
>> 
>
>
>-- 
>Chris Friesen
>Software Developer
>GENBAND
>chris.friesen@...band.com
>www.genband.com
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ