Message-ID: <460C453E.80208@mvista.com>
Date: Thu, 29 Mar 2007 16:01:18 -0700
From: Mark Huth <mhuth@...sta.com>
To: Jay Vosburgh <fubar@...ibm.com>
Cc: Chris Friesen <cfriesen@...tel.com>,
Andy Gospodarek <andy@...yhouse.net>, netdev@...r.kernel.org,
bonding-devel@...ts.sourceforge.net
Subject: Re: [Bonding-devel] quick help with bonding?
Jay Vosburgh wrote:
> Chris Friesen <cfriesen@...tel.com> wrote:
> [...]
>
>> I have a ppc64 blade running a customized 2.6.10. At init time, two of
>> our gigE links (eth4 and eth5) are bonded together to form bond0. This
>> link has an MTU of 9000, and uses arp monitoring. We're using an ethernet
>> driver with a modified RX path for jumbo frames[1]. With the stock
>> driver, it seems to work fine.
>>
>
> 2.6.10 is pretty old, and there have been a number of fixes to
> the bonding ARP monitor since then, so it may be that it is simply
> misbehaving (presuming that you're running the 2.6.10 bonding driver).
> Are you in a position to test against a more recent kernel (and/or
> bonding driver)? Does the miimon misbehave in a similar fashion?
>
>
>> The problem is that eth5 seems to be bouncing up and down every 15 sec or
>> so (see the attached log excerpt). Also, "ifconfig" shows that only 3
>> packets totalling 250 bytes have gone out eth5, when I know that the arp
>> monitoring code from the bond layer is sending 10 arps/sec out the link.
>>
> [...]
>
>> Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth4 to be reset in 30000 msec.
>>
> [...]
>
>> Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5
>>
>
> These two messages (which appear a number of times in your log
> excerpt) are not from the standard mainline bonding driver, even in
> 2.6.10. I don't know what this is all about.
>
>
>> If I boot the system and then log in and manually create the bond link
>> (rather than it happening at init time) then I don't see the problem.
>>
>
> I would hazard to guess that it's an ARP monitor problem; older
> versions of the ARP monitor had less than intelligent means to figure
> out what the bond's IP address is (to use for the probes). This, along
> with some logic problems in the monitor code itself, led to various
> problems with the ARP probes and the sort of "up / down" cycle of
> behavior you seem to be seeing.
>
> -J
>
> ---
> -Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
> -
>
I'll second what Jay said. I support a version of the 2.6.10 kernel
with bonding, and I had to upgrade the bonding driver that shipped with
2.6.10 to get reasonable behavior. You may also need a newer ifenslave.
Judging by the initialization messages related to link speed, the MII
interface does not look well-behaved either.
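
One way to compare the bond's behavior before and after a driver upgrade is to look at the per-slave state the bonding driver reports in /proc/net/bonding/bond0. The following is only a rough sketch in Python, assuming the usual "Slave Interface:", "MII Status:" and "Link Failure Count:" lines are present; older or patched drivers may print something different. The function names and the SAMPLE text are made up for illustration and are not output from the system discussed in this thread.

```python
def parse_bonding_status(text):
    """Parse bonding status text into {slave: {"mii_status", "link_failures"}}.

    Assumes the "Key: value" layout commonly printed in
    /proc/net/bonding/<bond>; unknown keys are ignored.
    """
    slaves = {}
    current = None  # dict for the slave section being read, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "Key: value" form
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = slaves.setdefault(value, {})
        elif current is not None and key == "MII Status":
            # The bond-level "MII Status" appears before any slave
            # section and is skipped because current is still None.
            current["mii_status"] = value
        elif current is not None and key == "Link Failure Count":
            current["link_failures"] = int(value)
    return slaves


def read_bonding_status(path="/proc/net/bonding/bond0"):
    """Read and parse the status file; the default path assumes bond0."""
    with open(path) as f:
        return parse_bonding_status(f.read())


# Invented sample for illustration only, not taken from a real system.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth4
MII Status: up

Slave Interface: eth4
MII Status: up
Link Failure Count: 0

Slave Interface: eth5
MII Status: down
Link Failure Count: 7
"""

if __name__ == "__main__":
    for name, info in sorted(parse_bonding_status(SAMPLE).items()):
        print(name, info.get("mii_status"), info.get("link_failures"))
```

Calling read_bonding_status() a few times, some seconds apart, and watching whether a slave's link failure count keeps climbing would show whether it is still flapping.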