[<prev] [next>] [day] [month] [year] [list]
Message-ID: <f0b24f711285856425a0d155243045f2@jh-inst.cas.cz>
Date: Sat, 26 Dec 2020 00:41:41 +0100
From: Michal Tarana <michal.tarana@...inst.cas.cz>
To: vfalico@...il.com, andy@...yhouse.net
Cc: netdev@...r.kernel.org
Subject: Link aggregation between Linux server and Netgear switch using
802.3ad not working
Hi,
I am trying to make the 802.3ad link aggregation working between my new
Debian server and the switch Netgear ProSafe GSM7248V2. I see some
strange behavior, the Linux Kernel says:
bond0: Warning: No 802.3ad response from the link partner for any
adapters in the bond
The connection itself is alive, I see packets flowing through both
interfaces involved. However, the links do not aggregate. It means that
when I open two simultaneous connections between the server and two
other machines (on the same switch), the total transfer rate equals the
speed of a single network interface. There is no other factor in these
tests that would significantly reduce the speed (no HDD or any storage
involved).
I would be very thankful for any advice or help. I have used the link
aggregation in this mode many times before, even using the very same
switch (different NICs and kernel versions, though). Until now, I always
was able to configure it without any issues. I think I tried everything
I considered possible in this configuration, so my "last instance" is
the developer of this kernel driver. Please, if this is not an
appropriate place to ask for help, would you be so kind and forwarded my
message to the right place or recommended me where to ask for help?
Here are further details:
On the side of the switch:
=-=-=
(GSM7248V2) #show port 0/9
Admin Physical Physical Link Link LACP
Actor
Intf Type Mode Mode Status Status Trap Mode
Timeout
--------- ------ ------- ---------- ----------- ------ ------- -------
--------
0/9 PC Mbr Enable Auto 1000 Full Up Enable Enable
long
(GSM7248V2) #show port 0/10
Admin Physical Physical Link Link LACP
Actor
Intf Type Mode Mode Status Status Trap Mode
Timeout
--------- ------ ------- ---------- ----------- ------ ------- -------
--------
0/10 PC Mbr Enable Auto 1000 Full Up Enable Enable
long
(GSM7248V2) #show port-channel 3/3
Local Interface................................ 3/3
Channel Name................................... gstlag
Link State..................................... Up
Admin Mode..................................... Enabled
Type........................................... Dynamic
Load Balance Option............................ 6
(Src/Dest IP and TCP/UDP Port fields)
Mbr Device/ Port Port
Ports Timeout Speed Active
------ ------------- --------- -------
0/9 actor/long Auto True
partner/long
0/10 actor/long Auto True
partner/long
(GSM7248V2) #show lacp actor 0/9
Sys Admin Port Admin
Intf Priority Key Priority State
------ -------- ----- -------- -----------
0/9 1 56 128 ACT|AGG|LTO
(GSM7248V2) #show lacp actor 0/10
Sys Admin Port Admin
Intf Priority Key Priority State
------ -------- ----- -------- -----------
0/10 1 56 128 ACT|AGG|LTO
(GSM7248V2) #show lacp partner 0/9
Sys System Admin Prt Prt Admin
Intf Pri ID Key Pri Id State
------ --- ----------------- ----- --- ----- -----------
0/9 0 00:00:00:00:00:00 0 0 0 ACT|AGG|LTO
(GSM7248V2) #show lacp partner 0/10
Sys System Admin Prt Prt Admin
Intf Pri ID Key Pri Id State
------ --- ----------------- ----- --- ----- -----------
0/10 0 00:00:00:00:00:00 0 0 0 ACT|AGG|LTO
There are no VLANs or anything else configured. No port restrictions,
just the spanning-tree protocol is activated. There is one more LACP
port-channel (involving four different ports) configured on this switch
and connected to another device running Linux using ad802.3ad. That is
configured identically and does not have any issues.
On the side of the Linux server:
=-=-=-=-=
This is the output of the /proc/net/bonding/bond0:
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 3000
Down Delay (ms): 3000
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: aa:aa:aa:aa:aa:88
Active Aggregator Info:
Aggregator ID: 7
Number of ports: 2
Actor Key: 9
Partner Key: 56
Partner Mac Address: bb:bb:bb:bb:bb:6a
Slave Interface: eno1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: aa:aa:aa:aa:aa:88
Slave queue ID: 0
Aggregator ID: 7
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: aa:aa:aa:aa:aa:88
port key: 9
port priority: 255
port number: 1
port state: 63
details partner lacp pdu:
system priority: 1
system mac address: bb:bb:bb:bb:bb:6a
oper key: 56
port priority: 128
port number: 10
port state: 61
Slave Interface: eno2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: aa:aa:aa:aa:aa:89
Slave queue ID: 0
Aggregator ID: 7
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: aa:aa:aa:aa:aa:88
port key: 9
port priority: 255
port number: 2
port state: 63
details partner lacp pdu:
system priority: 1
system mac address: bb:bb:bb:bb:bb:6a
oper key: 56
port priority: 128
port number: 9
port state: 61
As far as I can see, the information automatically gathered by the
bondind driver matches the configuration of the switch. Here are the
parameters passed to the bonding driver - along with the configuration
of the network interfaces:
auto bond0
iface bond0 inet static
address 192.168.2.15/24
gateway 192.168.2.1
dns-nameservers 8.8.8.8
dns-search fubar-domain.info
bond-slaves eno1 eno2
bond-mode 4
bond-miimon 100
bond-updelay 3000
bond-downdelay 3000
bond-lacp-rate 1
bond-xmit_hash_policy layer3+4
hwaddress aa:aa:aa:aa:aa:90
This is the corresponding output of ip a:
2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq
master bond0 state UP group default qlen 1000
link/ether aa:aa:aa:aa:aa:90 brd ff:ff:ff:ff:ff:ff
3: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq
master bond0 state UP group default qlen 1000
link/ether aa:aa:aa:aa:aa:90 brd ff:ff:ff:ff:ff:ff
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc
noqueue state UP group default qlen 1000
link/ether aa:aa:aa:aa:aa:90 brd ff:ff:ff:ff:ff:ff
inet 192.168.2.15/24 brd 192.168.2.255 scope global bond0
valid_lft forever preferred_lft forever
inet6 fe80::ae1f:6bff:fedc:2e90/64 scope link
valid_lft forever preferred_lft forever
The switch shows that the max frame size is 1518.
Here is the relevant part of dmesg:
igb: loading out-of-tree module taints kernel.
igb: module verification failed: signature and/or required key missing -
tainting kernel
Intel(R) Gigabit Ethernet Linux Driver - version 5.4.6
Copyright(c) 2007 - 2020 Intel Corporation.
igb 0000:04:00.0: added PHC on eth0
igb 0000:04:00.0: Intel(R) Gigabit Ethernet Linux Driver
igb 0000:04:00.0: eth0: (PCIe:2.5GT/s:Width x1)
igb 0000:04:00.0 eth0: MAC: aa:aa:aa:aa:aa:88
igb 0000:04:00.0: eth0: PBA No: 012700-000
igb 0000:04:00.0: LRO is disabled
igb 0000:04:00.0: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)
EDAC MC0: Giving out device to module skx_edac controller Skylake
Socket#0 IMC#0: DEV 0000:64:0a.0 (INTERRUPT)
EDAC MC1: Giving out device to module skx_edac controller Skylake
Socket#0 IMC#1: DEV 0000:64:0c.0 (INTERRUPT)
igb 0000:05:00.0: added PHC on eth1
igb 0000:05:00.0: Intel(R) Gigabit Ethernet Linux Driver
igb 0000:05:00.0: eth1: (PCIe:2.5GT/s:Width x1)
igb 0000:05:00.0 eth1: MAC: aa:aa:aa:aa:aa:89
igb 0000:05:00.0: eth1: PBA No: 012700-000
igb 0000:05:00.0: LRO is disabled
igb 0000:05:00.0: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)
igb 0000:05:00.0 eno2: renamed from eth1
igb 0000:04:00.0 eno1: renamed from eth0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
bonding: bond0 is being created...
bond0: Enslaving eno1 as a backup interface with a down link
bond0: Enslaving eno2 as a backup interface with a down link
IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
igb 0000:04:00.0 eno1: igb: eno1 NIC Link is Up 1000 Mbps Full Duplex,
Flow Control: None
bond0: link status up for interface eno1, enabling it in 0 ms
bond0: link status definitely up for interface eno1, 1000 Mbps full
duplex
bond0: Warning: No 802.3ad response from the link partner for any
adapters in the bond
bond0: first active interface up!
IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
igb 0000:05:00.0 eno2: igb: eno2 NIC Link is Up 1000 Mbps Full Duplex,
Flow Control: None
bond0: link status up for interface eno2, enabling it in 3000 ms
bond0: invalid new link 3 on slave eno2
bond0: link status definitely up for interface eno2, 1000 Mbps full
duplex
Kernel version: Linux servername 4.19.0-13-amd64 #1 SMP Debian
4.19.160-2 (2020-11-28) x86_64 GNU/Linux
Version of the igb driver: 5.4.6
lspci of the Ethernet controllers:
04:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
Connection (rev 03)
Subsystem: Super Micro Computer Inc I210 Gigabit Network Connection
Flags: bus master, fast devsel, latency 0, IRQ 18, NUMA node 0
Memory at aa200000 (32-bit, non-prefetchable) [size=512K]
I/O ports at 2000 [size=32]
Memory at aa280000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
Capabilities: [a0] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Device Serial Number aa-aa-aa-aa-aa-aa-aa-88
Capabilities: [1a0] Transaction Processing Hints
Kernel driver in use: igb
Kernel modules: igb
05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
Connection (rev 03)
Subsystem: Super Micro Computer Inc I210 Gigabit Network Connection
Flags: bus master, fast devsel, latency 0, IRQ 19, NUMA node 0
Memory at aa100000 (32-bit, non-prefetchable) [size=512K]
I/O ports at 1000 [size=32]
Memory at aa180000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
Capabilities: [a0] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Device Serial Number aa-aa-aa-aa-aa-aa-aa-89
Capabilities: [1a0] Transaction Processing Hints
Kernel driver in use: igb
Kernel modules: igb
Note that I used an upstream version of the igb driver. I was thinking
that maybe this is some bug in that driver, as I found the gymnastic, it
performs with the ethernet device upon bonding initialization, a bit
unusual. However, the behavior of the upstream version was identical to
the behavior of the NIC driver included in this kernel. I also tried a
newer version of the Linux Kernel from Debain testing (5.9.15). The
behavior was identical to that described above.
I also tried to turn on the debugging mode of the bonding driver. Since
I do not have access to the details of the corresponding IEEE standard,
I could not make much out of it. However, I noticed that at the
initialization of the bonding interface, the NICs were joining and
leaving different groups according to the functions
ad_port_selection_logic and ad_agg_selection_logic in bond_ad3.c. The
first aggregate was always in the individual mode ( ->is_individual was
true). That was when the warning about no 802.3ad partner was issued.
Later, the interfaces joined the LAG group where no member was in an
individual mode. That was after the no-802.3ad-partner warning was
issued. Would that (rather lengthy) output be helpful to you the
assessment of this issue, please? If so, I can provide it.
Is there anything else that would be helpful to provide you with at this
point please? If so, do not hesitate to let me know.
Thank you very much for reading this rather lengthy report and for any
reply. With wishing of all the best,
Michal Tarana
--
Mgr. Michal Tarana, PhD
Department of Theoretical Chemistry
J Heyrovský Institute of Physical Chemistry
Academy of Sciences of Czech Republic
Dolejškova 2155/3
182 82 Prague 8
Czech Republic
Skype: tarana.michal
Powered by blists - more mailing lists