Message-ID: <f0b24f711285856425a0d155243045f2@jh-inst.cas.cz>
Date:   Sat, 26 Dec 2020 00:41:41 +0100
From:   Michal Tarana <michal.tarana@...inst.cas.cz>
To:     vfalico@...il.com, andy@...yhouse.net
Cc:     netdev@...r.kernel.org
Subject: Link aggregation between Linux server and Netgear switch using
 802.3ad not working

Hi,

I am trying to get 802.3ad link aggregation working between my new 
Debian server and a Netgear ProSafe GSM7248V2 switch. I see some 
strange behavior; the Linux kernel says:

bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond

The connection itself is alive, I see packets flowing through both 
interfaces involved. However, the links do not aggregate. It means that 
when I open two simultaneous connections between the server and two 
other machines (on the same switch), the total transfer rate equals the 
speed of a single network interface. There is no other factor in these 
tests that would significantly reduce the speed (no HDD or any storage 
involved).
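
For context on the throughput test above: in 802.3ad mode each side picks the outgoing link per flow from a hash of the packet headers, so two flows can still land on the same link. The following is a minimal sketch of that idea, loosely modeled on the layer3+4 formula described in the kernel bonding documentation. The function name, the way the two ports are packed into one word, and the peer addresses and port numbers are assumptions made for illustration, not the driver's actual code.

```python
import ipaddress


def layer3_4_hash(src_ip, dst_ip, src_port, dst_port, n_links):
    """Pick a link index for one flow (simplified layer3+4 style hash)."""
    # Combine both ports into one 32-bit word (packing order is an assumption).
    h = (src_port << 16) | dst_port
    # Mix in the source and destination IPv4 addresses.
    h ^= int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    # Fold the upper bits down so they influence the result.
    h ^= h >> 16
    h ^= h >> 8
    # Reduce to the number of links in the aggregate.
    return h % n_links


if __name__ == "__main__":
    # Hypothetical flows from the server to two other machines.
    flows = [
        ("192.168.2.15", "192.168.2.20", 40000, 5001),
        ("192.168.2.15", "192.168.2.21", 40002, 5001),
    ]
    for flow in flows:
        print(flow, "-> link", layer3_4_hash(*flow, n_links=2))
```

The server's hash only decides which link it transmits on. Traffic towards the server is distributed by the switch according to its own load balance option, so both directions need to spread across the links for the total rate to exceed that of a single interface.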

I would be very thankful for any advice or help. I have used link 
aggregation in this mode many times before, even with the very same 
switch (different NICs and kernel versions, though). Until now, I was 
always able to configure it without any issues. I think I have tried 
everything I considered possible in this configuration, so my last 
resort is the developers of this kernel driver. If this is not an 
appropriate place to ask for help, would you be so kind as to forward 
my message to the right place or recommend where I should ask?


Here are further details:

On the side of the switch:
=-=-=
(GSM7248V2) #show port 0/9

                  Admin   Physical    Physical    Link   Link    LACP    Actor
  Intf     Type    Mode    Mode        Status    Status  Trap    Mode    Timeout
--------- ------ ------- ---------- ----------- ------ ------- ------- --------
0/9       PC Mbr Enable  Auto       1000 Full   Up     Enable  Enable  long

(GSM7248V2) #show port 0/10

                  Admin   Physical    Physical    Link   Link    LACP    Actor
  Intf     Type    Mode    Mode        Status    Status  Trap    Mode    Timeout
--------- ------ ------- ---------- ----------- ------ ------- ------- --------
0/10      PC Mbr Enable  Auto       1000 Full   Up     Enable  Enable  long

(GSM7248V2) #show port-channel 3/3


Local Interface................................ 3/3
Channel Name................................... gstlag
Link State..................................... Up
Admin Mode..................................... Enabled
Type........................................... Dynamic
Load Balance Option............................ 6
(Src/Dest IP and TCP/UDP Port fields)

Mbr    Device/       Port      Port
Ports  Timeout       Speed     Active
------ ------------- --------- -------
0/9    actor/long    Auto      True
        partner/long
0/10   actor/long    Auto      True
        partner/long

(GSM7248V2) #show lacp actor 0/9

          Sys    Admin   Port      Admin
  Intf  Priority  Key  Priority    State
------ -------- ----- -------- -----------
0/9    1        56    128      ACT|AGG|LTO

(GSM7248V2) #show lacp actor 0/10

          Sys    Admin   Port      Admin
  Intf  Priority  Key  Priority    State
------ -------- ----- -------- -----------
0/10   1        56    128      ACT|AGG|LTO

(GSM7248V2) #show lacp partner 0/9

        Sys      System       Admin Prt Prt     Admin
  Intf  Pri       ID          Key   Pri Id      State
------ --- ----------------- ----- --- ----- -----------
0/9    0   00:00:00:00:00:00 0     0   0     ACT|AGG|LTO

(GSM7248V2) #show lacp partner 0/10

        Sys      System       Admin Prt Prt     Admin
  Intf  Pri       ID          Key   Pri Id      State
------ --- ----------------- ----- --- ----- -----------
0/10   0   00:00:00:00:00:00 0     0   0     ACT|AGG|LTO

There are no VLANs or anything else configured. No port restrictions; 
only the spanning-tree protocol is enabled. There is one more LACP 
port-channel (involving four different ports) configured on this switch 
and connected to another device running Linux using 802.3ad. That one 
is configured identically and does not have any issues.


On the side of the Linux server:
=-=-=-=-=
This is the output of the /proc/net/bonding/bond0:

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 3000
Down Delay (ms): 3000

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: aa:aa:aa:aa:aa:88
Active Aggregator Info:
	Aggregator ID: 7
	Number of ports: 2
	Actor Key: 9
	Partner Key: 56
	Partner Mac Address: bb:bb:bb:bb:bb:6a

Slave Interface: eno1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: aa:aa:aa:aa:aa:88
Slave queue ID: 0
Aggregator ID: 7
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
     system priority: 65535
     system mac address: aa:aa:aa:aa:aa:88
     port key: 9
     port priority: 255
     port number: 1
     port state: 63
details partner lacp pdu:
     system priority: 1
     system mac address: bb:bb:bb:bb:bb:6a
     oper key: 56
     port priority: 128
     port number: 10
     port state: 61

Slave Interface: eno2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: aa:aa:aa:aa:aa:89
Slave queue ID: 0
Aggregator ID: 7
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
     system priority: 65535
     system mac address: aa:aa:aa:aa:aa:88
     port key: 9
     port priority: 255
     port number: 2
     port state: 63
details partner lacp pdu:
     system priority: 1
     system mac address: bb:bb:bb:bb:bb:6a
     oper key: 56
     port priority: 128
     port number: 9
     port state: 61
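
The "port state" values above are bit fields. As a reading aid, here is a small sketch that decodes them using the LACP actor/partner state bit layout (Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing, Defaulted, Expired, from the least significant bit upwards). The function and table names are made up for this example.

```python
# LACP state bits, least significant bit first.
LACP_STATE_BITS = [
    (0x01, "LACP_Activity"),    # set: active LACP
    (0x02, "LACP_Timeout"),     # set: short timeout, clear: long timeout
    (0x04, "Aggregation"),      # set: link may be aggregated
    (0x08, "Synchronization"),  # set: port is in sync with its aggregator
    (0x10, "Collecting"),       # set: receiving frames is enabled
    (0x20, "Distributing"),     # set: transmitting frames is enabled
    (0x40, "Defaulted"),        # set: using default partner information
    (0x80, "Expired"),          # set: receive state machine has expired
]


def decode_port_state(value):
    """Return the names of all LACP state flags set in value."""
    return [name for bit, name in LACP_STATE_BITS if value & bit]


if __name__ == "__main__":
    for label, state in (("actor", 63), ("partner", 61)):
        print(label, state, decode_port_state(state))
```

Decoded this way, the actor value 63 has the six lowest flags set, and the partner value 61 differs only in that the timeout bit is clear, i.e. long timeout, which agrees with the "long" actor timeout shown on the switch ports.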

As far as I can see, the information automatically gathered by the 
bonding driver matches the configuration of the switch. Here are the 
parameters passed to the bonding driver, along with the configuration 
of the network interfaces:

auto bond0
iface bond0 inet static
         address 192.168.2.15/24
         gateway 192.168.2.1
         dns-nameservers 8.8.8.8
         dns-search fubar-domain.info
         bond-slaves eno1 eno2
         bond-mode 4
         bond-miimon 100
         bond-updelay 3000
         bond-downdelay 3000
         bond-lacp-rate 1
         bond-xmit_hash_policy layer3+4
         hwaddress aa:aa:aa:aa:aa:90

This is the corresponding output of ip a:

2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
     link/ether aa:aa:aa:aa:aa:90 brd ff:ff:ff:ff:ff:ff
3: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
     link/ether aa:aa:aa:aa:aa:90 brd ff:ff:ff:ff:ff:ff
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
     link/ether aa:aa:aa:aa:aa:90 brd ff:ff:ff:ff:ff:ff
     inet 192.168.2.15/24 brd 192.168.2.255 scope global bond0
        valid_lft forever preferred_lft forever
     inet6 fe80::ae1f:6bff:fedc:2e90/64 scope link
        valid_lft forever preferred_lft forever

The switch shows that the max frame size is 1518.


Here is the relevant part of dmesg:

igb: loading out-of-tree module taints kernel.
igb: module verification failed: signature and/or required key missing - tainting kernel
Intel(R) Gigabit Ethernet Linux Driver - version 5.4.6
Copyright(c) 2007 - 2020 Intel Corporation.
igb 0000:04:00.0: added PHC on eth0
igb 0000:04:00.0: Intel(R) Gigabit Ethernet Linux Driver
igb 0000:04:00.0: eth0: (PCIe:2.5GT/s:Width x1)
igb 0000:04:00.0 eth0: MAC: aa:aa:aa:aa:aa:88
igb 0000:04:00.0: eth0: PBA No: 012700-000
igb 0000:04:00.0: LRO is disabled
igb 0000:04:00.0: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)
EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:64:0a.0 (INTERRUPT)
EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:64:0c.0 (INTERRUPT)
igb 0000:05:00.0: added PHC on eth1
igb 0000:05:00.0: Intel(R) Gigabit Ethernet Linux Driver
igb 0000:05:00.0: eth1: (PCIe:2.5GT/s:Width x1)
igb 0000:05:00.0 eth1: MAC: aa:aa:aa:aa:aa:89
igb 0000:05:00.0: eth1: PBA No: 012700-000
igb 0000:05:00.0: LRO is disabled
igb 0000:05:00.0: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)
igb 0000:05:00.0 eno2: renamed from eth1
igb 0000:04:00.0 eno1: renamed from eth0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
bonding: bond0 is being created...
bond0: Enslaving eno1 as a backup interface with a down link
bond0: Enslaving eno2 as a backup interface with a down link
IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
igb 0000:04:00.0 eno1: igb: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
bond0: link status up for interface eno1, enabling it in 0 ms
bond0: link status definitely up for interface eno1, 1000 Mbps full duplex
bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
bond0: first active interface up!
IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
igb 0000:05:00.0 eno2: igb: eno2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
bond0: link status up for interface eno2, enabling it in 3000 ms
bond0: invalid new link 3 on slave eno2
bond0: link status definitely up for interface eno2, 1000 Mbps full duplex

Kernel version: Linux servername 4.19.0-13-amd64 #1 SMP Debian 
4.19.160-2 (2020-11-28) x86_64 GNU/Linux
Version of the igb driver: 5.4.6

lspci of the Ethernet controllers:

04:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network 
Connection (rev 03)
	Subsystem: Super Micro Computer Inc I210 Gigabit Network Connection
	Flags: bus master, fast devsel, latency 0, IRQ 18, NUMA node 0
	Memory at aa200000 (32-bit, non-prefetchable) [size=512K]
	I/O ports at 2000 [size=32]
	Memory at aa280000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number aa-aa-aa-aa-aa-aa-aa-88
	Capabilities: [1a0] Transaction Processing Hints
	Kernel driver in use: igb
	Kernel modules: igb

05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network 
Connection (rev 03)
	Subsystem: Super Micro Computer Inc I210 Gigabit Network Connection
	Flags: bus master, fast devsel, latency 0, IRQ 19, NUMA node 0
	Memory at aa100000 (32-bit, non-prefetchable) [size=512K]
	I/O ports at 1000 [size=32]
	Memory at aa180000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number aa-aa-aa-aa-aa-aa-aa-89
	Capabilities: [1a0] Transaction Processing Hints
	Kernel driver in use: igb
	Kernel modules: igb

Note that I used an upstream version of the igb driver. I thought 
this might be a bug in that driver, because the gymnastics it performs 
with the Ethernet device upon bonding initialization seemed a bit 
unusual. However, the behavior of the upstream version was identical to 
the behavior of the NIC driver included in this kernel. I also tried a 
newer version of the Linux kernel from Debian testing (5.9.15). The 
behavior was identical to that described above.

I also tried turning on the debugging mode of the bonding driver. Since 
I do not have access to the details of the corresponding IEEE standard, 
I could not make much of it. However, I noticed that at the 
initialization of the bonding interface, the NICs were joining and 
leaving different groups according to the functions 
ad_port_selection_logic and ad_agg_selection_logic in bond_3ad.c. The 
first aggregator was always in individual mode (->is_individual was 
true). That was when the warning about no 802.3ad partner was issued. 
Later, after that warning, the interfaces joined a LAG group in which 
no member was in individual mode. Would that (rather lengthy) output 
be helpful to you for the assessment of this issue? If so, I can 
provide it.
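
To check the final state described above without reading the debug log, a short script can compare each slave's aggregator ID with the active aggregator reported in /proc/net/bonding/bond0. This is only a sketch that assumes the text layout shown earlier in this message; the function names are made up for the example.

```python
import re


def summarize_bond(text):
    """Return (active_aggregator_id, {slave_name: aggregator_id})."""
    active = None
    slaves = {}
    current = None  # name of the slave section being parsed, if any
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Slave Interface:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"Aggregator ID:\s*(\d+)", line)
        if m:
            if current is None:
                # Seen before any slave section: the active aggregator.
                active = int(m.group(1))
            else:
                slaves[current] = int(m.group(1))
    return active, slaves


def all_in_active_aggregator(text):
    """True if every slave reports the active aggregator's ID."""
    active, slaves = summarize_bond(text)
    return active is not None and bool(slaves) and all(
        agg == active for agg in slaves.values()
    )
```

Passing the contents of /proc/net/bonding/bond0 as a string to all_in_active_aggregator() gives True for the state quoted in this message, where eno1 and eno2 both report aggregator ID 7.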

Is there anything else that would be helpful for me to provide at this 
point? If so, do not hesitate to let me know.

Thank you very much for reading this rather lengthy report and for any 
reply. With best wishes,
      Michal Tarana



-- 
Mgr. Michal Tarana, PhD

Department of Theoretical Chemistry

J Heyrovský Institute of Physical Chemistry
Academy of Sciences of Czech Republic
Dolejškova 2155/3
182 82 Prague 8
Czech Republic

Skype: tarana.michal
