Message-ID: <17785.1684882898@famine>
Date: Tue, 23 May 2023 16:01:38 -0700
From: Jay Vosburgh <jay.vosburgh@...onical.com>
To: Moviuro <moviuro@...il.com>
cc: netdev@...r.kernel.org
Subject: Re: Secondary bond slave receiving packets when preferred is up
Moviuro <moviuro@...il.com> wrote:
>Hi there,
>
>On 2 similar machines, some (random?) packets are received on a wireless
>bond slave when the preferred eth interface is connected: this causes
>local packet loss and at worst, disconnects (e.g. SSH and KDEConnect).
>
>My setup looks fine, inspired by the Arch wiki[0], see
>/proc/net/bonding/bond0 below. The archlinux community has not been able
>to help so far[1].
>
>     +-----------+
>     |Router .1  |
>     +-----+-----+
>           |
>     +-----+-----+
>     |Switch .30 +---------------+
>     +--+--------+-------------+ |
>        |                      | |
>        |                      | |
> +------+--+    +-----------+  | |
> | WAP .21 +~~~~+Client .111+--+ |
> +------+--+    +-----------+    |
>        |                        |
>        |       +-----------+    |
>        +~~~~~~~+Client .149+----+
>                +-----------+
>
>Running ping(8) for a few hours, there's nothing much going on, packet
>loss is really because ICMP packets end up on the WiFi interface:
>
>* .1 -> .149: 56436 sent, 56405 replies
>* .1 -> .111: 20643 sent, 20640 replies
>* .111 -> .149: 7682/7702 packets
>* .149 -> .111: 14791/14792 packets
>
>Sure enough, there's some noise on the WiFi interface:
>
>root@149 # tcpdump -ttttnei wlp3s0 host 192.168.1.149 and not arp
>2023-05-23 09:29:46.771535 11:11:11:11:11:74 > BB:BB:BB:BB:BB:33, ethertype IPv4 (0x0800), length 98: 192.168.1.1 > 192.168.1.149: ICMP echo request, id 64306, seq 53425, length 64
>2023-05-23 09:36:04.710859 bb:bb:bb:bb:bb:32 > BB:BB:BB:BB:BB:33, ethertype IPv4 (0x0800), length 98: 192.168.1.111 > 192.168.1.149: ICMP echo request, id 1, seq 2390, length 64

Some amount of random traffic arriving on the inactive interface
of an active-backup bond is expected; switches send traffic to such
places for various reasons. My initial guess would be that the switch's
forwarding entry for BB:BB:BB:BB:BB:33 (whatever that is) had expired,
and the switch flooded traffic for that destination to all ports. As an
aside, what is that MAC address? The last octet (33) doesn't appear in
any of the bond info dumps you list later for the .149 host.
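
To make the flooding guess concrete, here is a toy model of a learning
switch. It is a sketch, not how any particular switch works: the class
and method names are hypothetical, and the 300-second aging time (the
usual IEEE 802.1D default) and port numbers are assumptions. Once the
entry for a MAC ages out, a frame for it goes out every other port,
including, in this setup, the one toward the WAP, which may then relay
it over the air to the backup interface.

```python
from dataclasses import dataclass, field

# Assumed aging time: 300 s is the usual IEEE 802.1D default.
AGING_SECONDS = 300.0


@dataclass
class LearningSwitch:
    """Toy learning switch; a hypothetical model, not any real device."""

    ports: list[int]
    # MAC address -> (port it was learned on, time it was last seen)
    fdb: dict[str, tuple[int, float]] = field(default_factory=dict)

    def receive(self, in_port: int, src: str, dst: str, now: float) -> list[int]:
        """Return the ports a frame arriving on in_port is sent out of."""
        # Learn (or refresh) the source MAC on the ingress port.
        self.fdb[src] = (in_port, now)
        entry = self.fdb.get(dst)
        if entry is not None and now - entry[1] <= AGING_SECONDS:
            port = entry[0]
            # Known, fresh destination: forward out a single port
            # (filter the frame if that is the ingress port).
            return [] if port == in_port else [port]
        # Unknown or aged-out destination: flood to every other port.
        return [p for p in self.ports if p != in_port]
```

For example, with the host learned on port 2 at t=0, a frame for it at
t=10 goes only to port 2, while the same frame at t=400 is flooded to
ports 2, 3 and 4.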

In any event, an inactive bond interface will pass incoming
traffic in two cases:

1) Its destination MAC address is in the link-local reserved
range, 01:80:c2:00:00:0? (01:80:c2:00:00:00 through 01:80:c2:00:00:0f),
which is used for things like Spanning Tree or LACP; the complete list
can be found at
https://standards.ieee.org/products-programs/regauth/grpmac/public/
These should not be ARP or IP, and this is unlikely to be your
situation.

2) Something is bound directly to the inactive interface itself via
a raw socket or the like; an example of this is LLDP, which needs to
exchange protocol frames at the interface level.
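
The two cases can be sketched as a small predicate. This is a
simplified model for illustration, not the bonding driver's code; the
function names and the raw_socket_on_slave flag are hypothetical. The
range test mirrors what the kernel's is_link_local_ether_addr() helper
checks: the first five octets are 01:80:c2:00:00 and only the low
nibble of the last octet may vary.

```python
# Simplified, hypothetical model of when an inactive active-backup slave
# passes a received frame; not the kernel implementation.

LINK_LOCAL_PREFIX = bytes.fromhex("0180c20000")  # 01:80:c2:00:00


def parse_mac(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes(int(octet, 16) for octet in mac.split(":"))


def is_link_local(mac: str) -> bool:
    """True for 01:80:c2:00:00:00 through 01:80:c2:00:00:0f."""
    addr = parse_mac(mac)
    return addr[:5] == LINK_LOCAL_PREFIX and (addr[5] & 0xF0) == 0


def inactive_slave_passes(dst_mac: str, raw_socket_on_slave: bool) -> bool:
    # Case 1: IEEE reserved link-local destination (STP, LACP, ...).
    # Case 2: something (e.g. an LLDP agent) is bound directly to this
    # interface; in that case only the bound socket sees the frame.
    return is_link_local(dst_mac) or raw_socket_on_slave
```

In this model, ordinary unicast such as the echo requests above falls
into neither case unless a socket is bound to the interface.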

Even if the bond accepted some IP traffic on the inactive
interface and sent it up the stack, any reply should go back out the
active interface. This is based on the lack of failovers in the bond
status output, and presumes that the routing table on .111 and .149 is
what I'd expect (basically, a default route and a subnet route for
192.168.1.0/24 that go through the bond only).

Some suggestions that might help:

1) Check rp_filter; if it's not enabled, then turn it on in
strict mode. This means ensuring that the sysctls for .all, the bond
and its interfaces are all set to 1, e.g.,

net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.bond0.rp_filter = 1
net.ipv4.conf.wlp5s0.rp_filter = 1
[... and so on ...]

Setting any of them to 2 will enable loose mode (the larger of
the .all value and the interface's value is what counts). Loose mode,
or rp_filter being off entirely, might be your problem if your routing
is not simple (e.g., you've got other IP networks that you didn't
describe). The docs for this can be found at
https://docs.kernel.org/networking/ip-sysctl.html
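
As a sketch of the "larger value counts" rule, the following reads the
rp_filter sysctls from /proc and reports the effective mode for each
listed interface. The function names are hypothetical, and the
interface list (bond0, wlp5s0, ...) has to be filled in for the host in
question; values are interpreted per the ip-sysctl documentation
(0 = off, 1 = strict, 2 = loose).

```python
from pathlib import Path

# rp_filter values as described in the kernel's ip-sysctl documentation.
RP_FILTER_MODES = {0: "off", 1: "strict", 2: "loose"}


def effective_rp_filter(all_value: int, iface_value: int) -> str:
    # The larger of conf/all/rp_filter and conf/<iface>/rp_filter applies.
    return RP_FILTER_MODES[max(all_value, iface_value)]


def rp_filter_report(interfaces, conf_dir="/proc/sys/net/ipv4/conf"):
    """Map each named interface to its effective rp_filter mode."""
    base = Path(conf_dir)
    all_value = int((base / "all" / "rp_filter").read_text())
    return {
        name: effective_rp_filter(
            all_value, int((base / name / "rp_filter").read_text())
        )
        for name in interfaces
    }
```

Running rp_filter_report(["bond0", "wlp5s0"]) on the affected host
would show, for instance, that all=1 with bond0=2 still leaves bond0 in
loose mode, which is why every relevant sysctl needs checking, not just
.all.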

2) Enable the bonding option fail_over_mac = follow; this will
keep the bond's interfaces from all being set to the same MAC address.
If somehow the switch is getting confused by seeing the same MAC from
multiple ports, this may help.
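
A toy model of what fail_over_mac changes in active-backup mode, based
on the bonding documentation's description of the none (default) and
follow policies. The slave names, MAC addresses, function names, and
the assumption that the first slave added is also the active one are
all hypothetical; this is not the driver's code.

```python
# Hypothetical toy model of slave MACs in an active-backup bond.
# Assumes the bond takes its MAC from the first slave, which is active.


def initial_macs(mode: str, permanent: dict[str, str], first: str) -> dict[str, str]:
    bond_mac = permanent[first]
    if mode == "none":
        # Default policy: every slave is programmed with the bond's MAC.
        return {name: bond_mac for name in permanent}
    if mode == "follow":
        # Backups keep their own MACs; only the active slave carries the
        # bond's MAC (here it already does, being the first slave).
        return dict(permanent)
    raise ValueError(f"mode not modeled: {mode!r}")


def failover_follow(macs: dict[str, str], old: str, new: str) -> dict[str, str]:
    # Under follow, the new active slave gets the bond's MAC and the old
    # active slave gets the new one's previous MAC, i.e. the two swap.
    result = dict(macs)
    result[old], result[new] = macs[new], macs[old]
    return result
```

With none, both slaves end up sharing one MAC; with follow, they stay
distinct before and after a failover. The bonding documentation notes
that fail_over_mac can only be changed via sysfs while the bond has no
slaves, so it is best set when the bond is created.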
-J
---
-Jay Vosburgh, jay.vosburgh@...onical.com