Message-ID: <16084.1235752119@death.nxdomain.ibm.com>
Date:	Fri, 27 Feb 2009 08:28:39 -0800
From:	Jay Vosburgh <fubar@...ibm.com>
To:	Jesper Krogh <jesper@...gh.cc>
cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Jeff Garzik <jgarzik@...hat.com>, aowi@...ozymes.com
Subject: Re: Regression in bonding between 2.6.26.8 and 2.6.27.6 - bisected

Jesper Krogh <jesper@...gh.cc> wrote:
[...]
>The offending commit seems to be:
>
>bonding: refactor mii monitor
>
>Refactor mii monitor.  As with the previous ARP monitor refactor,
>the motivation for this is to handle locking rationally (in this case,
>removing conditional locking) and generally clean up the code.
>
>This patch breaks up the monolithic mii monitor into two phases:
>an inspection phase, followed by an optional commit phase.  The commit phase
>is the only portion that requires RTNL or makes changes to state, and is
>only called when inspection finds something to change.
>
>Signed-off-by: Jay Vosburgh <fubar@...ibm.com>
>Signed-off-by: Jeff Garzik <jgarzik@...hat.com>
>
>
>A test with a fresh 2.6.29-rc6 revealed that the problem has been fixed
>subsequently, but still exists in 2.6.27-newest (I haven't tested
>2.6.28-newest yet).
>
>Any ideas of what the "fixing" commit is, or should that also be
>bisected?

	I went back and looked at your earlier mail.  Since you're using
802.3ad mode, my first guess would be this commit:

commit fd989c83325cb34795bc4d4aa6b13c06f90eac99
Author: Jay Vosburgh <fubar@...ibm.com>
Date:   Tue Nov 4 17:51:16 2008 -0800

    bonding: alternate agg selection policies for 802.3ad
    
        This patch implements alternative aggregator selection policies
    for 802.3ad.  The existing policy, now termed "stable," selects the active
    aggregator by greatest bandwidth, and only reselects a new aggregator
    if the active aggregator is entirely disabled (no more ports or all ports
    down).
    
        This patch adds two new policies: bandwidth and count, selecting
    the active aggregator by total bandwidth (like the stable policy) or by
    the number of ports in the aggregator, respectively.  These two policies
    also differ from the stable policy in that they will reselect the active
    aggregator when availability-related changes occur in the bond (e.g.,
    link state change).
    
        This permits "gang failover" within 802.3ad, allowing redundant
    aggregators along parallel paths to always maintain the "best" aggregator
    as the active aggregator (rather than having to wait for the active to
    entirely fail).
    
        This patch also updates the driver version to 3.5.0.
    
    Signed-off-by: Jay Vosburgh <fubar@...ibm.com>
    Signed-off-by: Jeff Garzik <jgarzik@...hat.com>

	This changed or refactored a great deal of the aggregator
selection logic, and it might be that it also fixed your problem by mere
happenstance.
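
	For illustration only, the three policies described in the commit
message above can be modelled roughly as follows.  This is a simplified
Python sketch, not the driver code: the Port and Aggregator classes, their
fields, and select_active() are hypothetical names invented for the
example, and counting only ports with link up is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Port:
    speed_mbps: int
    link_up: bool = True


@dataclass
class Aggregator:
    name: str
    ports: List[Port] = field(default_factory=list)

    def up_ports(self) -> List[Port]:
        # Assumption: only ports with link up count toward selection.
        return [p for p in self.ports if p.link_up]

    def bandwidth(self) -> int:
        return sum(p.speed_mbps for p in self.up_ports())


def select_active(aggs: List[Aggregator],
                  current: Optional[Aggregator],
                  policy: str) -> Optional[Aggregator]:
    """Pick the active aggregator under 'stable', 'bandwidth' or 'count'."""
    usable = [a for a in aggs if a.up_ports()]
    if not usable:
        return None
    # "stable": keep the current aggregator until it has no usable ports.
    if policy == "stable" and current is not None and current.up_ports():
        return current
    if policy == "count":
        return max(usable, key=lambda a: len(a.up_ports()))
    # "bandwidth", and the initial choice for "stable": greatest bandwidth.
    return max(usable, key=lambda a: a.bandwidth())


if __name__ == "__main__":
    a = Aggregator("a", [Port(1000), Port(1000)])
    b = Aggregator("b", [Port(1000), Port(1000)])
    a.ports[0].link_up = False  # one link in the active aggregator fails
    for policy in ("stable", "bandwidth", "count"):
        print(policy, select_active([a, b], a, policy).name)
```

	In this model, "stable" stays on the degraded aggregator until it
has no working ports left, while "bandwidth" and "count" move to the
healthier aggregator as soon as a link drops, which is the "gang failover"
behaviour the commit message describes.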

	-J
---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
