Message-ID: <87bpfufmu1.fsf@tac.ki.iif.hu>
Date: Fri, 12 Feb 2010 17:19:50 +0100
From: Ferenc Wagner <wferi@...f.hu>
To: Jay Vosburgh <fubar@...ibm.com>
Cc: netdev@...r.kernel.org
Subject: Re: Flooded with bonding: bond0: doing slave updates when interface is down.
Jay Vosburgh <fubar@...ibm.com> writes:
> Ferenc Wagner <wferi@...f.hu> wrote:
>
>> On a system running Linux 2.6.32.7 I use the following initramfs script
>> to bring up some interfaces before mounting the root filesystem:
>>
>> [...]
>> modprobe bonding
>> cd /sys/class/net/$BOND/bonding
>> echo active-backup >mode
>> echo +eth0 >slaves
>> echo +eth1 >slaves
>> echo eth0 >primary
>> echo +10.0.0.1 >arp_ip_target
>> echo +10.0.0.2 >arp_ip_target
>> echo 1000 >arp_interval
>> [...]
>>
>> This stuff mostly works as expected, but sometimes I get this on the
>> console:
>>
>> [ 27.792746] Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
>> [ 27.831788] bonding: Warning: either miimon or arp_interval and arp_ip_target module parameters must be specified, otherwise bonding will not detect link failures! see bonding.txt for details.
>> [ 27.935640] bonding: bond0: setting mode to active-backup (1).
>> [ 27.970565] bonding: bond0: doing slave updates when interface is down.
>> [ 28.010110] bonding: bond0: Adding slave eth0.
>> [ 28.036651] bonding bond0: master_dev is not up in bond_enslave
>> [ 28.137410] bonding: bond0: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full.
>> [ 28.198298] bonding: bond0: making interface eth0 the new active one.
>> [ 28.236806] bonding: bond0: first active interface up!
>> [ 28.267515] bonding: bond0: enslaving eth0 as an active interface with an up link.
>> [ 28.312847] bonding: bond0: doing slave updates when interface is down.
>> [ 28.352397] bonding: bond0: doing slave updates when interface is down.
>> [...]
>> [ 29.065994] bonding: bond0: doing slave updates when interface is down.
>> [ 29.105535] bonding: bond0: doing slave updates when interface is down.
>> [ 29.145172] tg3: eth0: Link is up at 1000 Mbps, full duplex.
>> [ 29.178990] tg3: eth0: Flow control is off for TX and off for RX.
>> [ 29.215415] bonding: bond0: doing slave updates when interface is down.
>> [ 29.254956] bonding: bond0: doing slave updates when interface is down.
>> [...]
>> [ 78.660009] bonding: bond0: doing slave updates when interface is down.
>> [ 78.699825] bonding: bond0: doing slave updates when interface is down.
>> [ 78.739373] bonding: bond0: Adding slave eth1.
>> [ 78.765914] bonding bond0: master_dev is not up in bond_enslave
>> [ 78.817517] tg3 0000:05:01.1: firmware: requesting tigon/tg3_tso.bin
>> [ 78.919759] bonding: bond0: Warning: failed to get speed and duplex from eth1, assumed to be 100Mb/sec and Full.
>> [ 78.980658] bonding: bond0: enslaving eth1 as a backup interface with an up link.
>> [ 79.025492] bonding: bond0: Setting eth0 as primary slave.
>> [ 79.058351] bonding: bond0: adding ARP target 10.0.0.1.
>> [ 79.089601] bonding: bond0: adding ARP target 10.0.0.2.
>> [ 79.120855] bonding: bond0: Setting ARP monitoring interval to 1000.
>>
>> In the end, everything seems to be all right, but this occasional
>> interlude is disturbing and seems to indicate that something isn't quite
>> right. Which may well be my abuse of the bonding driver, but then
>> please enlighten me, as I'd like to eliminate this 50-second delay from
>> the boot procedure. I don't mind the couple of "doing slave updates
>> when interface is down" warnings which appear during the normal course
>> of actions, but the above is way too much in my opinion.
>
> The message itself means that you're adding a slave to bonding while
> the bond itself is down. It's a warning only; the path through the
> function doesn't change when the warning is printed.
Why is the warning printed, btw? What's wrong with adding a slave to a
bond which is down?
> I would hazard a guess that you're getting zillions of them because
> something is holding rtnl, and the bonding sysfs store function
> conditionally acquires rtnl after printing the warning. If the rtnl
> acquisition fails, the system call is restarted, and you'll see the
> warning message again. This rtnl_trylock/restart business is to
> prevent a deadlock during unregister.
Thanks for the explanation. Is rtnl shorthand for the routing netlink
lock?
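If I understand the pattern correctly, it's roughly this (a userspace
sh analogy of my own, not the kernel code; an atomic mkdir stands in
for rtnl_trylock, and the counter stands in for the printed warning):

```shell
#!/bin/sh
# Sketch: the store handler warns *before* attempting the lock, so a
# contended lock plus syscall restart repeats the warning every time.
LOCKDIR="$(mktemp -d)/rtnl"
warnings=0

store_slaves() {
    warnings=$((warnings + 1))                # warning printed unconditionally
    mkdir "$LOCKDIR" 2>/dev/null || return 1  # rtnl_trylock() analogue;
                                              # caller "restarts the syscall"
    # ... slave update would run here, under the lock ...
    rmdir "$LOCKDIR"                          # rtnl_unlock() analogue
}

mkdir "$LOCKDIR"                   # some other task holds rtnl
store_slaves                       # fails, warns
store_slaves                       # fails, warns again
store_slaves                       # fails, warns again
rmdir "$LOCKDIR"                   # rtnl released
store_slaves                       # finally succeeds -- but still warns first
echo "warnings printed: $warnings" # prints: warnings printed: 4
```

That would explain why one message per restarted write() piles up for
as long as the lock is held.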
> I don't know why this repeats for 50-odd seconds, though. Nothing
> should be holding rtnl for that long.
Forgive me a wild guess: what about the firmware loader? That uses
sysfs at least, and maybe udev misses the request somehow...
> Do you still get the long delay if you set the bond up prior to adding
> the slaves? Not necessarily assign an address, just set it
> administratively up (ip link set up dev bond0).
I rebooted a couple of times with this change and had no problem at
all: no messages, no delay. However, the issue wasn't reliably
reproducible in the first place, so this doesn't mean too much. I'll
keep running with this change and yell if I encounter any delay.
Meanwhile I have also upgraded to 2.6.32.8, which I forgot to do before
running the tests; I'll recheck with 2.6.32.7 as well.
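For reference, the reordered snippet now in my initramfs script (my
sketch of your suggestion; I'm assuming the mode still has to be set
while the bond is down, since the driver rejects mode changes on an up
interface):

```shell
modprobe bonding
cd /sys/class/net/$BOND/bonding
echo active-backup >mode
ip link set up dev $BOND       # your suggestion: up before enslaving
echo +eth0 >slaves
echo +eth1 >slaves
echo eth0 >primary
echo +10.0.0.1 >arp_ip_target
echo +10.0.0.2 >arp_ip_target
echo 1000 >arp_interval
```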
--
Thanks,
Feri.