Message-ID: <4074.1175207458@death>
Date:	Thu, 29 Mar 2007 15:30:58 -0700
From:	Jay Vosburgh <fubar@...ibm.com>
To:	"Chris Friesen" <cfriesen@...tel.com>
cc:	Andy Gospodarek <andy@...yhouse.net>, netdev@...r.kernel.org,
	bonding-devel@...ts.sourceforge.net
Subject: Re: [Bonding-devel] quick help with bonding? 


Chris Friesen <cfriesen@...tel.com> wrote:
[...]
>I have a ppc64 blade running a customized 2.6.10.  At init time, two of
>our gigE links (eth4 and eth5) are bonded together to form bond0.  This
>link has an MTU of 9000, and uses arp monitoring.  We're using an ethernet
>driver with a modified RX path for jumbo frames[1].  With the stock
>driver, it seems to work fine.

	2.6.10 is pretty old, and there have been a number of fixes to
the bonding ARP monitor since then, so it may be that it is simply
misbehaving (presuming that you're running the 2.6.10 bonding driver).
Are you in a position to test against a more recent kernel (and/or
bonding driver)?  Does MII monitoring (miimon) misbehave in a similar
fashion?
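
	As a rough aid for comparing the two monitors, the sketch below
parses the text of /proc/net/bonding/bond0 and reports each slave's link
failure count.  It is only an illustration: the field names ("MII
Status", "Link Failure Count", and so on) are what I would expect the
driver to print and may differ between driver versions, the helper names
are hypothetical, and the sample values are made up rather than taken
from the log in this thread.

```python
# Hypothetical sample of /proc/net/bonding/bond0 output; every value here is
# invented for illustration, and the field names are assumed, not verified
# against the 2.6.10 driver.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth4
MII Status: up
MII Polling Interval (ms): 0
ARP Polling Interval (ms): 100

Slave Interface: eth4
MII Status: up
Link Failure Count: 0

Slave Interface: eth5
MII Status: down
Link Failure Count: 12
"""


def parse_bonding_status(text):
    """Split 'Key: value' lines into bond-level and per-slave dictionaries."""
    bond = {}
    slaves = {}
    current = bond  # lines before the first "Slave Interface" describe the bond
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            # Start a new per-slave section keyed by interface name.
            current = slaves.setdefault(value, {})
            continue
        current[key] = value
    return bond, slaves


def failure_counts(slaves):
    """Return {interface: link failure count}, treating a missing field as 0."""
    return {name: int(fields.get("Link Failure Count", 0))
            for name, fields in slaves.items()}


if __name__ == "__main__":
    bond, slaves = parse_bonding_status(SAMPLE)
    print("ARP interval (ms):", bond.get("ARP Polling Interval (ms)"))
    for name, count in sorted(failure_counts(slaves).items()):
        print(name, "link failures:", count)
```

	On a live system, the text would come from reading the proc file
instead of SAMPLE.  Sampling it once under the ARP monitor and once
under miimon shows whether the failure count on eth5 climbs in both
cases.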

>The problem is that eth5 seems to be bouncing up and down every 15 sec or
>so (see the attached log excerpt).  Also, "ifconfig" shows that only 3
>packets totalling 250 bytes have gone out eth5, when I know that the arp
>monitoring code from the bond layer is sending 10 arps/sec out the link.
[...]
>Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: scheduling interface eth4 to be reset in 30000 msec.
[...]
>Mar 29 20:54:08 base0-0-0-5-0-11-1 kernel: bonding: bond0: cancelled scheduled reset of interface eth5

	These two messages (which appear a number of times in your log
excerpt) are not from the standard mainline bonding driver, even in
2.6.10.  I don't know what this is all about.

>If I boot the system and then log in and manually create the bond link
>(rather than it happening at init time) then I don't see the problem.

	I would hazard a guess that it's an ARP monitor problem; older
versions of the ARP monitor had a less-than-intelligent way of working
out the bond's IP address (to use for the probes).  This, along with
some logic errors in the monitor code itself, caused various problems
with the ARP probes, including the sort of up/down cycling you seem to
be seeing.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com