Date:	Tue, 27 Dec 2011 20:01:43 +0530
From:	<Narendra_K@...l.com>
To:	<netdev@...r.kernel.org>
CC:	<fubar@...ibm.com>
Subject: bonding device in balance-alb mode shows packet loss in kernel
 3.2-rc6

Hello,

On kernel version 3.2-rc6, when a bonding device is configured in 'balance-alb' mode,
ping reports packet loss. Looking at the protocol trace, the lost packets appear to have
the destination MAC address of an inactive slave.
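
The destination MAC of the replies can be checked directly on the remote host with a
capture along these lines (a minimal sketch; the interface name eth0 on the remote
host is an assumption):

# tcpdump -e -n -i eth0 '(icmp or arp) and host 10.2.2.1'

Comparing the link-level destination against the 'Permanent HW addr' lines in
/proc/net/bonding/bond0 (included below) shows whether the replies are addressed to an
inactive slave.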

Scenario:

Host under test:

bond0 IP addr: 10.2.2.1 - balance-alb mode, 2 or more slaves.

Remote Host1: 10.2.2.11

Remote Host2: 10.2.2.2
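
For reference, a roughly equivalent bond can be created through the bonding sysfs
interface as follows (a sketch only, using the interface names above; the actual
configuration on the test host may have been done differently, e.g. through the
distribution's network scripts):

# modprobe bonding mode=balance-alb miimon=100
# ip link set em2 down; ip link set em3 down; ip link set em4 down
# echo +em2 > /sys/class/net/bond0/bonding/slaves
# echo +em3 > /sys/class/net/bond0/bonding/slaves
# echo +em4 > /sys/class/net/bond0/bonding/slaves
# ip addr add 10.2.2.1/24 dev bond0
# ip link set bond0 up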

Ping Host1's IP. Observe that there is no packet loss.

# ping 10.2.2.11
PING 10.2.2.11 (10.2.2.11) 56(84) bytes of data.
64 bytes from 10.2.2.11: icmp_seq=1 ttl=64 time=0.156 ms
64 bytes from 10.2.2.11: icmp_seq=2 ttl=64 time=0.130 ms
64 bytes from 10.2.2.11: icmp_seq=3 ttl=64 time=0.151 ms
64 bytes from 10.2.2.11: icmp_seq=4 ttl=64 time=0.137 ms
64 bytes from 10.2.2.11: icmp_seq=5 ttl=64 time=0.151 ms
64 bytes from 10.2.2.11: icmp_seq=6 ttl=64 time=0.129 ms
^C
--- 10.2.2.11 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 4997ms
rtt min/avg/max/mdev = 0.129/0.142/0.156/0.014 ms

Now ping Host2's IP. Observe that there is packet loss. This is reproducible almost
every time.

# ping 10.2.2.2
PING 10.2.2.2 (10.2.2.2) 56(84) bytes of data.
64 bytes from 10.2.2.2: icmp_seq=6 ttl=64 time=0.108 ms
64 bytes from 10.2.2.2: icmp_seq=7 ttl=64 time=0.104 ms
64 bytes from 10.2.2.2: icmp_seq=8 ttl=64 time=0.119 ms
64 bytes from 10.2.2.2: icmp_seq=56 ttl=64 time=0.139 ms
64 bytes from 10.2.2.2: icmp_seq=57 ttl=64 time=0.111 ms
^C
--- 10.2.2.2 ping statistics ---
75 packets transmitted, 5 received, 93% packet loss, time 74037ms
rtt min/avg/max/mdev = 0.104/0.116/0.139/0.014 ms
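
Which MAC address Host2 has cached for the bond's IP at this point can be checked on
Host2 with something like the following (a sketch; run on Host2):

# ip neigh show 10.2.2.1

If the cached entry carries the inactive slave's MAC, the ICMP replies will be sent to
that slave.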

More information:

Hardware information: 
Dell PowerEdge R610

# lspci | grep -i ether
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)

Kernel version:
3.2.0-rc6

# ethtool -i bond0
driver: bonding
version: 3.7.1

Observing the packets on remote Host2, the sequence is:

1. 'bond0' broadcasts an ARP request with the source MAC equal to the
'bond0' MAC address and receives an ARP reply to it. The next few
packets are received.

2. After some time, there are two ARP replies from 'bond0' to Host2
with the source MAC equal to the inactive slave's MAC address. Host2 then
sends its ICMP replies with the destination MAC equal to the inactive
slave's MAC address, and these packets are dropped.

The Wireshark protocol trace is attached to this note.

3. The behavior was independent of the network adapter model.

4. Also, with a few debug prints in 'eth_type_trans', it appeared that the inactive slave
was not receiving any frames destined to it (00:21:9b:9d:a5:74) except ARP broadcasts.
Putting the inactive slave into promiscuous mode made bond0 see the responses (a rough
sketch of this check is given below).
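
The promiscuous-mode check can be reproduced along these lines (a sketch; em3 is the
inactive slave with the MAC address above, going by the /proc/net/bonding output below):

# ip -s link show em3                 (RX counters on the inactive slave before the test)
# ip link set em3 promisc on
# ping -c 10 10.2.2.2                 (bond0 now sees the replies, as noted in point 4)
# ip link set em3 promisc off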

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: em2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: em2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:21:9b:9d:a5:72
Slave queue ID: 0

Slave Interface: em3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:21:9b:9d:a5:74 <--- the inactive slave referred to in point 4
Slave queue ID: 0

Slave Interface: em4
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:21:9b:9d:a5:76
Slave queue ID: 0


# ip addr show dev bond0
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:21:9b:9d:a5:72 brd ff:ff:ff:ff:ff:ff
    inet 10.2.2.1/24 brd 10.2.2.255 scope global bond0
    inet6 fe80::221:9bff:fe9d:a572/64 scope link
       valid_lft forever preferred_lft forever

# ip addr show dev em2
3: em2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 00:21:9b:9d:a5:72 brd ff:ff:ff:ff:ff:ff

# ip addr show dev em3
4: em3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 00:21:9b:9d:a5:74 brd ff:ff:ff:ff:ff:ff

# ip addr show dev em4
5: em4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 00:21:9b:9d:a5:76 brd ff:ff:ff:ff:ff:ff

It would be great if you could share any insight into this. Please let me know if any additional information is required.

With regards,
Narendra K



Attachment: "linux-3.2-rc6-balance-alb-protocol-trace" (application/octet-stream, 14230 bytes)
