Message-ID: <28766.1328925233@death.nxdomain>
Date: Fri, 10 Feb 2012 17:53:53 -0800
From: Jay Vosburgh <fubar@...ibm.com>
To: Chris Friesen <chris.friesen@...band.com>
cc: andy@...yhouse.net, netdev <netdev@...r.kernel.org>
Subject: Re: [BUG?] bonding, slave selection, carrier loss, etc.
Chris Friesen <chris.friesen@...band.com> wrote:
>I'm resurrecting an ancient discussion I had with Jay, because I think
>the issue described below is still present and the code he talked about
>submitting to close it doesn't appear to have ever gone in.
Yah, I never got it to work quite right; I don't remember
exactly why.
>Basically, in active/backup mode with mii monitoring there is a window
>between the moment the active slave device loses carrier (and calls
>netif_carrier_off()) and the moment the miimon code actually detects
>the loss of carrier and selects a new active slave.
>
>The best solution would be for bonding to simply register for
>notification of the link going down. Presumably most drivers call
>netif_carrier_off() properly by now, and for devices that get
>interrupt-driven notification of link status changes this would let
>the bonding code react much more quickly.
A quick look at some drivers shows that at least acenic still
doesn't do netif_carrier_off, so converting entirely to a notifier-based
failover mechanism would break drivers that work today.
Adding a notifier callback as an additional path into something
like bond_miimon_commit may be feasible.
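Roughly, I'd expect the extra path to look something like the sketch
below. This is untested and against the 2.6.x-era structures (where
the notifier's void pointer is the net_device itself); the handler
name is made up, and a real version would also verify that dev->master
really is a bond.

	/* Sketch: when a slave's carrier changes, kick the mii monitor
	 * right away instead of waiting out the rest of the miimon
	 * interval; bond_mii_monitor() already takes RTNL itself before
	 * committing a failover.
	 */
	static int bond_slave_carrier_event(struct notifier_block *nb,
					    unsigned long event, void *ptr)
	{
		struct net_device *dev = ptr;
		struct bonding *bond;

		if (event != NETDEV_CHANGE)	/* carrier changed? */
			return NOTIFY_DONE;
		if (!dev->master)		/* not enslaved */
			return NOTIFY_DONE;

		bond = netdev_priv(dev->master);
		if (!bond->params.miimon)	/* monitor not running */
			return NOTIFY_DONE;

		queue_delayed_work(bond->wq, &bond->mii_work, 0);
		return NOTIFY_DONE;
	}

	static struct notifier_block bond_carrier_notifier = {
		.notifier_call = bond_slave_carrier_event,
	};

	/* at module init:
	 *	register_netdevice_notifier(&bond_carrier_notifier);
	 */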
>Barring that, I think something like the following is needed. This is
>against 2.6.27, but could easily be reworked against current.
>
>---------------------- drivers/net/bonding/bond_main.c -----------------------
>index 8499558..e4445d8 100644
>@@ -4313,20 +4313,33 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
> read_lock(&bond->lock);
> read_lock(&bond->curr_slave_lock);
>
> if (!BOND_IS_OK(bond)) {
> goto out;
> }
>
> if (!bond->curr_active_slave)
> goto out;
>
>+ /* Verify that the active slave is actually up before
>+ * trying to send packets. If it isn't, then
>+ * trigger the selection of a new active slave.
>+ */
>+ if (!IS_UP(bond->curr_active_slave->dev)) {
>+ read_unlock(&bond->curr_slave_lock);
>+ write_lock(&bond->curr_slave_lock);
>+ bond_select_active_slave(bond);
>+ write_unlock(&bond->curr_slave_lock);
>+ read_lock(&bond->curr_slave_lock);
>+ if (!bond->curr_active_slave)
>+ goto out;
>+ }
The problem here is going to be that bond_select_active_slave()
must be called with RTNL held (because the notifier calls it makes
require RTNL), and acquiring RTNL (a mutex that can sleep) isn't
permissible in a driver transmit function, which runs in atomic
context.
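One way out might be to skip the failover in the xmit path entirely
and just kick the mii monitor, which runs from a workqueue, where
taking RTNL is safe. Again only an untested sketch:

	/* in bond_xmit_activebackup(), instead of calling
	 * bond_select_active_slave() directly:
	 */
	if (!IS_UP(bond->curr_active_slave->dev))
		queue_delayed_work(bond->wq, &bond->mii_work, 0);

The frame in flight would still be dropped on the dead slave, but the
failover would then happen as soon as the work runs rather than at the
next miimon tick.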
-J
> res = bond_dev_queue_xmit(bond, skb, bond->curr_active_slave->dev);
>
> out:
> if (res) {
> /* no suitable interface, frame not sent */
> dev_kfree_skb(skb);
> }
> read_unlock(&bond->curr_slave_lock);
> read_unlock(&bond->lock);
> return 0;
>
>Chris
>
>On 03/27/2009 06:00 PM, Jay Vosburgh wrote:
>> Chris Friesen<cfriesen@...tel.com> wrote:
>>
>>> In a much earlier version of the bonding driver we ran into problems
>>> where we could have lost carrier on one of the slaves, but at the time
>>> of xmit the bonding driver hadn't yet switched to a better slave.
>>> Because of this we added a patch very much like the one below.
>>>
>>> A quick glance at the current bonding code would seem to indicate that
>>> there could still be a window between the active slave device losing
>>> carrier and calling netif_carrier_off() and the miimon code actually
>>> detecting the loss of the carrier and selecting a new active slave.
>>> Do I have this correct? If so, would the patch below be correct?
>>
>> Yes, the window is equal to whatever the monitoring interval is
>> (for miimon; e.g., with miimon=100, up to 100 ms of traffic can be
>> black-holed) or double the interval for the ARP monitor.
>>
>> Your patch, I think, would work, but it's suboptimal in that it
>> only affects one mode, and doesn't resolve any of the bigger issues with
>> the link monitoring system in bonding (see below). Trying to do the
>> equivalent in other modes may have issues; some modes require RTNL to be
>> held when changing slave states, so it's difficult to do that from the
>> transmit routine.
>>
>>> On a related note--assuming the net driver can detect link loss and
>>> is properly calling netif_carrier_off() why do we still need to poll
>>> the status in the bonding driver? Isn't there some way to hook into
>>> the network stack and get notified when the carrier goes down?
>>
>> This is actually something I'm working on now.
>>
>> There are notifier callbacks that are tied to a driver calling
>> netif_carrier_on or _off. The problem is that a bunch of older (mostly
>> 10/100, although acenic is a gigabit) drivers don't do netif_carrier_on
>> or _off, or check their link state on a ridiculously long interval, so
>> simply dropping the current miimon implementation and replacing it with
>> the event notifier may not be feasible for backwards compatibility
>> reasons. Heck, I've still got 3c59x and acenic cards in my test
>> systems, neither of which do netif_carrier correctly; I can't be the
>> only one.
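>>
>> For reference, the well-behaved case is schematically just this, in
>> a driver's link-change interrupt or poll routine. netif_carrier_on
>> and netif_carrier_off are the real calls (linkwatch turns them into
>> NETDEV_CHANGE notifier events); link_up() is a stand-in for the
>> driver's own PHY link test:
>>
>> 	if (link_up(priv))
>> 		netif_carrier_on(dev);
>> 	else
>> 		netif_carrier_off(dev);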
>>
>> An additional goal is to permit the state change notifications
>> (or miimon) and the ARP monitor to run concurrently. Sadly, the current
>> "link state" system can't handle two things simultaneously poking at the
>> slave's link state; if, e.g., ARP says down, but MII/notifiers says up,
>> then the link state can flap, so it needs a sort of "arbitrator."
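>>
>> The shape I have in mind is roughly the sketch below (the
>> per-monitor fields and the bond back-pointer are made up, not real
>> struct slave members): a slave only counts as up if every monitor
>> that is actually running agrees it is up.
>>
>> 	static int slave_link_up(struct slave *slave)
>> 	{
>> 		struct bonding *bond = slave->bond;	/* hypothetical */
>>
>> 		if (bond->params.miimon && !slave->mii_says_up)
>> 			return 0;
>> 		if (bond->params.arp_interval && !slave->arp_says_up)
>> 			return 0;
>> 		return 1;
>> 	}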
>>
>> A minor advantage of reworking all of that is that it should end
>> up being less code when all done, and should be more modular, so it'd be
>> easier if somebody wanted to add, say, an ICMP probe monitor.
>>
>> I'll probably be posting an RFC patch next week.
>>
>> -J
>>
>> ---
>> -Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
>>
>
>--
>Chris Friesen
>Software Developer
>GENBAND
>chris.friesen@...band.com
>www.genband.com
>