Date:	Wed, 07 Nov 2007 15:51:26 +0200
From:	Or Gerlitz <ogerlitz@...taire.com>
To:	Jay Vosburgh <fubar@...ibm.com>, Moni Shoua <monis@...taire.com>
CC:	Roland Dreier <rolandd@...co.com>, netdev@...r.kernel.org,
	Moni Levy <monil@...taire.com>
Subject: bonding / 2.6.24-rc1 issues

Jay, Moni

I ran some tests with 2.6.24-rc1 plus the first bonding patch that Jay 
sent to netdev last night. Basic operation and failover work fine. 
However, I see crashes that seem to be related to destroying the bond 
when the slaves are IPoIB devices. I don't see similar crashes when 
enslaving Ethernet devices (Broadcom Corporation NetXtreme BCM5704 
Gigabit Ethernet (rev 03)). My compressed .config is attached.

The first type of oops happens when I just run modprobe -r bonding after 
enslaving the IPoIB devices:

> Ethernet Channel Bonding Driver: v3.2.1 (October 15, 2007)
> bonding: MII link monitoring set to 100 ms
> bonding: bond0: setting mode to active-backup (1).
> bonding: bond0: Setting MII monitoring interval to 100.
> NET: Registered protocol family 10
> ADDRCONF(NETDEV_UP): bond0: link is not ready
> bonding: bond0: doing slave updates when interface is down.
> bonding: bond0: Adding slave ib0.
> bonding bond0: master_dev is not up in bond_enslave
> bonding: bond0: Warning: enslaved VLAN challenged slave ib0. Adding VLANs will be blocked as long as ib0 is part of bond bond0
> bonding: bond0: enslaving ib0 as a backup interface with a down link.
> bonding: bond0: doing slave updates when interface is down.
> bonding: bond0: Adding slave ib1.
> bonding bond0: master_dev is not up in bond_enslave
> bonding: bond0: Warning: enslaved VLAN challenged slave ib1. Adding VLANs will be blocked as long as ib1 is part of bond bond0
> bonding: bond0: enslaving ib1 as a backup interface with a down link.
> ADDRCONF(NETDEV_UP): bond0: link is not ready
> bonding: bond0: link status definitely up for interface ib0.
> bonding: bond0: link status definitely up for interface ib1.
> bonding: bond0: making interface ib0 the new active one.
> bonding: bond0: first active interface up!
> ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
> eth0: no IPv6 routers present
> bond0: no IPv6 routers present
> bonding: bond0: released all slaves
> Unable to handle kernel paging request at ffffffff880a07ce RIP: 
>  [<ffffffff880a07ce>]
> PGD 203067 PUD 207063 PMD 2060f067 PTE 0
> Oops: 0010 [1] SMP 
> CPU 0 
> Modules linked in: ib_ipoib ib_cm ib_sa ipv6 sg st sd_mod sr_mod scsi_mod e100 ib_mthca ib_mad ib_core i2c_amd8111 i2c_core
> Pid: 14604, comm: bond0 Not tainted 2.6.24-rc1 #1
> RIP: 0010:[<ffffffff880a07ce>]  [<ffffffff880a07ce>]
> RSP: 0018:ffff810008439e98  EFLAGS: 00010247
> RAX: ffff810004da20c0 RBX: ffff810004da20c0 RCX: ffff81000315aa68
> RDX: ffff810004da20c8 RSI: ffff810008439ef0 RDI: ffff81000315aa60
> RBP: ffffffff880a07ce R08: ffff810008438000 R09: ffff81000152d0d8
> R10: ffff810004da20c0 R11: ffff810009574000 R12: 00000000fffffffc
> R13: ffffffffffffffff R14: ffffffff8063b820 R15: 0000000000000000
> FS:  00002af0c528b0a0(0000) GS:ffffffff805d4000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: ffffffff880a07ce CR3: 000000002852f000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process bond0 (pid: 14604, threadinfo ffff810008438000, task ffff8100024970c0)
> Stack:  ffffffff802445c6 ffff810008439f08 ffff810004da20c0 ffffffff80244652
>  ffffffff8024473f 0000000000000000 ffff8100024970c0 ffffffff80248320
>  ffff810008439f08 ffff810008439f08 0000000000000000 0000000000000000
> Call Trace:
>  [<ffffffff802445c6>] run_workqueue+0x83/0x10f
>  [<ffffffff80244652>] worker_thread+0x0/0xf7
>  [<ffffffff8024473f>] worker_thread+0xed/0xf7
>  [<ffffffff80248320>] autoremove_wake_function+0x0/0x2e
>  [<ffffffff80248320>] autoremove_wake_function+0x0/0x2e
>  [<ffffffff80247fe6>] kthread+0x3d/0x63
>  [<ffffffff8020c4a8>] child_rip+0xa/0x12
>  [<ffffffff80247fa9>] kthread+0x0/0x63
>  [<ffffffff8020c49e>] child_rip+0x0/0x12
> 
> 
> Code:  Bad RIP value.
> RIP  [<ffffffff880a07ce>]
>  RSP <ffff810008439e98>
> CR2: ffffffff880a07ce

The second type of oops happens when I run modprobe -r ib_ipoib after 
enslavement. I was not able to test this one with Ethernet, as the tg3 
driver is built into my kernel:

> Nov  7 14:31:56 dill kernel: bonding: bond0: Setting MII monitoring interval to 100.
> Nov  7 14:31:56 dill kernel: bonding: bond0: Adding slave ib0.
> Nov  7 14:31:56 dill kernel: bonding: bond0: Warning: enslaved VLAN challenged slave ib0. Adding VLANs will be blocked as long as ib0 is part of bond bond0
> Nov  7 14:31:56 dill kernel: bonding: bond0: Warning: The first slave device specified does not support setting the MAC address. Enabling the fail_over_mac option.<6>bonding: bond0: enslaving ib0 as a backup interface with a down link.
> Nov  7 14:31:56 dill kernel: bonding: bond0: Adding slave ib1.
> Nov  7 14:31:56 dill kernel: bonding: bond0: Warning: enslaved VLAN challenged slave ib1. Adding VLANs will be blocked as long as ib1 is part of bond bond0
> Nov  7 14:31:56 dill kernel: bonding: bond0: enslaving ib1 as a backup interface with a down link.
> Nov  7 14:31:56 dill kernel: bonding: bond0: link status definitely up for interface ib0.
> Nov  7 14:31:56 dill kernel: bonding: bond0: making interface ib0 the new active one.
> Nov  7 14:31:56 dill kernel: bonding: bond0: first active interface up!
> Nov  7 14:31:56 dill kernel: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
> Nov  7 14:31:56 dill kernel: ib0: multicast join failed for 0001:ffff:ffff:0a0a:0081:ffff:c0bb:0a0a, status -22
> Nov  7 14:31:56 dill kernel: bonding: bond0: link status definitely up for interface ib1.
> Nov  7 14:31:58 dill kernel: ib0: multicast join failed for 0001:ffff:ffff:0a0a:0081:ffff:c0bb:0a0a, status -22
> Nov  7 14:32:02 dill kernel: ib0: multicast join failed for 0001:ffff:ffff:0a0a:0081:ffff:c0bb:0a0a, status -22
> Nov  7 14:32:07 dill kernel: bond0: no IPv6 routers present
> Nov  7 14:32:10 dill kernel: ib0: multicast join failed for 0001:ffff:ffff:0a0a:0081:ffff:c0bb:0a0a, status -22
> Nov  7 14:32:12 dill ypbind[14475]: broadcast: RPC: Timed out.
> Nov  7 14:32:18 dill kernel: ib0: cm send completion event with wrid 1073741823 (> 64)
> Nov  7 14:32:23 dill kernel: ib0: RX drain timing out
> Nov  7 14:32:23 dill kernel: bonding: bond0: Warning: the permanent HWaddr of ib0 - 80:06:04:04:fe:80 - is still in use by bond0. Set the HWaddr of ib0 to a different address to avoid conflicts.
> Nov  7 14:32:23 dill kernel: bonding: bond0: releasing active interface ib0
> Nov  7 14:32:23 dill kernel: bonding: bond0: making interface ib1 the new active one.
> Nov  7 14:32:23 dill kernel: ib1: multicast join failed for 0001:0000:ffff:0000:0000:0000:0070:5229, status -22
> Nov  7 14:32:23 dill kernel: bonding: bond0: releasing active interface ib1
> Nov  7 14:32:23 dill kernel: bonding: bond0: destroying bond bond0.
> Nov  7 14:32:23 dill kernel: __dev_addr_discard: address leakage! da_users=1
> Nov  7 14:32:23 dill kernel: Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP: 
> Nov  7 14:32:23 dill kernel:  [<ffffffff802be76f>] sysfs_find_dirent+0x7/0x36
> Nov  7 14:32:23 dill kernel: PGD 250a067 PUD 40a4067 PMD 0 
> Nov  7 14:32:23 dill kernel: Oops: 0000 [1] SMP 
> Nov  7 14:32:23 dill kernel: CPU 1 
> Nov  7 14:32:23 dill kernel: Modules linked in: ib_ipoib ib_cm ib_sa bonding e100 ipv6 sg st sd_mod sr_mod scsi_mod ib_mthca ib_mad ib_core i2c_amd756 i2c_amd8111 i2c_core
> Nov  7 14:32:23 dill kernel: Pid: 18870, comm: modprobe Not tainted 2.6.24-rc1 #1
> Nov  7 14:32:23 dill kernel: RIP: 0010:[<ffffffff802be76f>]  [<ffffffff802be76f>] sysfs_find_dirent+0x7/0x36
> Nov  7 14:32:23 dill kernel: RSP: 0018:ffff8100264e3da8  EFLAGS: 00010246
> Nov  7 14:32:23 dill kernel: RAX: 0000000000000000 RBX: ffffffff88111959 RCX: 000000000000000a
> Nov  7 14:32:23 dill kernel: RDX: ffff8100264e3fd8 RSI: ffffffff88111959 RDI: 0000000000000000
> Nov  7 14:32:23 dill kernel: RBP: ffffffff88111959 R08: ffff810020705d70 R09: ffff810020012ae8
> Nov  7 14:32:23 dill kernel: R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000000000
> Nov  7 14:32:23 dill kernel: R13: ffff810028d9e000 R14: 0000000000000006 R15: 0000000000515ab0
> Nov  7 14:32:23 dill kernel: FS:  00002adaba330720(0000) GS:ffff81002053dac0(0000) knlGS:0000000000000000
> Nov  7 14:32:23 dill kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Nov  7 14:32:23 dill kernel: CR2: 0000000000000028 CR3: 0000000001cca000 CR4: 00000000000006e0
> Nov  7 14:32:23 dill kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Nov  7 14:32:23 dill kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Nov  7 14:32:23 dill kernel: Process modprobe (pid: 18870, threadinfo ffff8100264e2000, task ffff8100264ed790)
> Nov  7 14:32:23 dill kernel: Stack:  0000000000000000 ffffffff88111959 ffffffff88117680 ffffffff802be87b
> Nov  7 14:32:23 dill kernel:  0000000000000006 0000000000000000 ffff810006578700 ffffffff802bfb83
> Nov  7 14:32:23 dill kernel:  ffff810020705d70 ffff810006578000 0000000000000000 ffffffff88107bd6
> Nov  7 14:32:24 dill kernel: Call Trace:
> Nov  7 14:32:24 dill kernel:  [<ffffffff802be87b>] sysfs_get_dirent+0x21/0x6c
> Nov  7 14:32:24 dill kernel:  [<ffffffff802bfb83>] sysfs_remove_group+0x1b/0x92
> Nov  7 14:32:24 dill kernel:  [<ffffffff88107bd6>] :bonding:bond_release_and_destroy+0x3d/0x44
> Nov  7 14:32:24 dill kernel:  [<ffffffff88107c92>] :bonding:bond_netdev_event+0xb5/0xca
> Nov  7 14:32:24 dill kernel:  [<ffffffff8046e55e>] notifier_call_chain+0x30/0x54
> Nov  7 14:32:24 dill kernel:  [<ffffffff8041845d>] unregister_netdevice+0xc3/0x15a
> Nov  7 14:32:24 dill kernel:  [<ffffffff80418505>] unregister_netdev+0x11/0x17
> Nov  7 14:32:24 dill kernel:  [<ffffffff880f2be4>] :ib_ipoib:ipoib_remove_one+0x64/0xa5
> Nov  7 14:32:24 dill kernel:  [<ffffffff88015069>] :ib_core:ib_unregister_client+0x43/0xfe
> Nov  7 14:32:24 dill kernel:  [<ffffffff880fb071>] :ib_ipoib:ipoib_cleanup_module+0xd/0x2b
> Nov  7 14:32:24 dill kernel:  [<ffffffff802557b1>] sys_delete_module+0x1b1/0x1e2
> Nov  7 14:32:24 dill kernel:  [<ffffffff80329b00>] __downgrade_write+0x5f/0xb1
> Nov  7 14:32:24 dill kernel:  [<ffffffff8026eb2e>] sys_munmap+0x4a/0x56
> Nov  7 14:32:24 dill kernel:  [<ffffffff8020b68e>] system_call+0x7e/0x83
> Nov  7 14:32:24 dill kernel: 
> Nov  7 14:32:24 dill kernel: 
> Nov  7 14:32:24 dill kernel: Code: 48 8b 5f 28 48 85 db 74 1c 48 8b 7b 18 48 89 ee e8 f6 b6 06 
> Nov  7 14:32:24 dill kernel: RIP  [<ffffffff802be76f>] sysfs_find_dirent+0x7/0x36
> Nov  7 14:32:24 dill kernel:  RSP <ffff8100264e3da8>
> Nov  7 14:32:24 dill kernel: CR2: 0000000000000028

The third type of oops happened after I did some failovers and then 
removed both slaves from the bond using
echo -$slave > /sys/class/net/bond0/bonding/slaves
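
Spelled out for both slaves, that removal step looks roughly like the 
sketch below. The names release_slaves, BOND and SLAVES_FILE are 
introduced here for illustration only and are not part of the test setup; 
SLAVES_FILE can be pointed at a scratch file to dry-run the function 
without touching a real bond.

```shell
# Sketch only: release slaves from a bond through sysfs.
# BOND, SLAVES_FILE and release_slaves are illustrative names.
BOND=${BOND:-bond0}
SLAVES_FILE=${SLAVES_FILE:-/sys/class/net/$BOND/bonding/slaves}

release_slaves() {
    local slave
    for slave in "$@"; do
        # Writing "-<ifname>" asks the bonding driver to release that slave.
        echo "-$slave" > "$SLAVES_FILE" || return 1
    done
}
```

For example, release_slaves ib0 ib1 issues the same two writes, one per 
slave, in that order.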

> Ethernet Channel Bonding Driver: v3.2.1 (October 15, 2007)
> bonding: MII link monitoring set to 100 ms
> bonding: bond0: setting mode to active-backup (1).
> bonding: bond0: Setting MII monitoring interval to 100.
> ADDRCONF(NETDEV_UP): bond0: link is not ready
> bonding: bond0: doing slave updates when interface is down.
> bonding: bond0: Adding slave ib0.
> bonding bond0: master_dev is not up in bond_enslave
> bonding: bond0: Warning: enslaved VLAN challenged slave ib0. Adding VLANs will be blocked as long as ib0 is part of bond bond0
> bonding: bond0: enslaving ib0 as a backup interface with a down link.
> bonding: bond0: doing slave updates when interface is down.
> bonding: bond0: Adding slave ib1.
> bonding bond0: master_dev is not up in bond_enslave
> bonding: bond0: Warning: enslaved VLAN challenged slave ib1. Adding VLANs will be blocked as long as ib1 is part of bond bond0
> bonding: bond0: enslaving ib1 as a backup interface with a down link.
> ADDRCONF(NETDEV_UP): bond0: link is not ready
> bonding: bond0: link status definitely up for interface ib0.
> bonding: bond0: link status definitely up for interface ib1.
> bonding: bond0: making interface ib0 the new active one.
> bonding: bond0: first active interface up!
> ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
> bond0: no IPv6 routers present
> bonding: bond0: link status definitely down for interface ib0, disabling it
> bonding: bond0: making interface ib1 the new active one.
> bonding: bond0: link status definitely up for interface ib0.
> bonding: bond0: link status definitely down for interface ib1, disabling it
> bonding: bond0: making interface ib0 the new active one.
> bonding: bond0: Removing slave ib0
> bonding: bond0: Warning: the permanent HWaddr of ib0 - 80:08:04:04:fe:80 - is still in use by bond0. Set the HWaddr of ib0 to a different address to avoid conflicts.
> bonding: bond0: releasing active interface ib0
> bonding: bond0: Removing slave ib1
> bonding: bond0: releasing backup interface ib1
> bonding: bond0: destroying bond bond0.
> Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP: 
>  [<ffffffff802be76f>] sysfs_find_dirent+0x7/0x36
> PGD 48a0067 PUD 285f067 PMD 0 
> Oops: 0000 [1] SMP 
> CPU 1 
> Modules linked in: ib_ipoib ib_cm ib_sa bonding ipv6 sg st sd_mod sr_mod scsi_mod e100 ib_mthca ib_mad ib_core i2c_amd756 i2c_amd8111 i2c_core
> Pid: 16811, comm: bash Not tainted 2.6.24-rc1 #1
> RIP: 0010:[<ffffffff802be76f>]  [<ffffffff802be76f>] sysfs_find_dirent+0x7/0x36
> RSP: 0018:ffff8100049a5dd8  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffffffff880ae959 RCX: 0000000000000002
> RDX: ffff8100049a5fd8 RSI: ffffffff880ae959 RDI: 0000000000000000
> RBP: ffffffff880ae959 R08: ffff8100205f5d70 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
> R13: 0000000000000000 R14: ffff8100300c7000 R15: ffff8100049a5e69
> FS:  00002afe4e9870a0(0000) GS:ffff81002053dac0(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000028 CR3: 000000000248f000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process bash (pid: 16811, threadinfo ffff8100049a4000, task ffff810001d91750)
> Stack:  0000000000000000 ffffffff880ae959 ffffffff880b4680 ffffffff802be87b
>  0000000000000006 0000000000000000 ffff81000e081700 ffffffff802bfb83
>  ffff8100205f5d70 ffff81000e081000 0000000000000000 ffffffff880a4bd6
> Call Trace:
>  [<ffffffff802be87b>] sysfs_get_dirent+0x21/0x6c
>  [<ffffffff802bfb83>] sysfs_remove_group+0x1b/0x92
>  [<ffffffff880a4bd6>] :bonding:bond_release_and_destroy+0x3d/0x44
>  [<ffffffff880aa685>] :bonding:bonding_store_slaves+0x29a/0x352
>  [<ffffffff8038a0c7>] dev_attr_store+0x1c/0x1e
>  [<ffffffff802be03d>] sysfs_write_file+0xca/0xfc
>  [<ffffffff802832fa>] vfs_write+0xae/0x130
>  [<ffffffff8028343b>] sys_write+0x45/0x6e
>  [<ffffffff8020b68e>] system_call+0x7e/0x83
> 
> 
> Code: 48 8b 5f 28 48 85 db 74 1c 48 8b 7b 18 48 89 ee e8 f6 b6 06 
> RIP  [<ffffffff802be76f>] sysfs_find_dirent+0x7/0x36
>  RSP <ffff8100049a5dd8>
> CR2: 0000000000000028
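
The second and third oopses report the same RIP 
(sysfs_find_dirent+0x7/0x36) and both pass through 
bond_release_and_destroy. One way to compare the call chains side by side 
is to strip the addresses and offsets from each trace. The sketch below 
assumes the oops text is fed on stdin in the format shown above; 
trace_symbols is a name made up for this example.

```shell
# Sketch only: print the symbol names found between "Call Trace:" and
# "Code:" in an oops read from stdin, one symbol per line.
# Handles both plain symbols and ":module:symbol" entries.
trace_symbols() {
    sed -n '/Call Trace:/,/Code:/ s/.*\[<[0-9a-f]*>\] *\(:[a-z_0-9]*:\)\{0,1\}\([A-Za-z_0-9]*\)+0x.*/\2/p'
}
```

Saving each oops to its own file and running diff on the two 
trace_symbols outputs shows where the call chains diverge.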


Here's the script I use to set up the bond and do the enslavement:
> #!/bin/bash
> 
> SLAVE_A=ib0
> SLAVE_B=ib1
> ADDR=192.168.10.118
> 
> #SLAVE_A=eth0
> #SLAVE_B=eth1
> #ADDR=172.30.10.6
> 
> /sbin/modprobe bonding
> 
> echo 1 > /sys/class/net/bond0/bonding/mode
> echo 100 > /sys/class/net/bond0/bonding/miimon
> 
> /sbin/modprobe ib_ipoib
> 
> echo +$SLAVE_A > /sys/class/net/bond0/bonding/slaves
> echo +$SLAVE_B > /sys/class/net/bond0/bonding/slaves
> 
> ifconfig bond0 $ADDR

Or.


Download attachment "config-2.6.24-rc1.bz2" of type "application/octet-stream" (7823 bytes)
