[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <6a31-5f0efa80-3d-68593a00@242352203>
Date: Wed, 15 Jul 2020 14:46:06 +0200
From: pbl@...tov.io <pbl@...tov.io>
To: netdev@...r.kernel.org
Subject: Bonding driver unexpected behaviour
I'm attempting to set up the bonding driver on two gretap interfaces, gretap15 and gretap16
but I'm observing unexpected (to me) behaviour.
The underlying interfaces for those two are respectively intra15 (ipv4: 10.88.15.100/24) and
intra16 (ipv4: 10.88.16.100/24). These two are e1000 virtual network cards, connected through
virtual cables. As such, I would exclude any hardware issues. As a peer, I have another Linux
system configured similarly (ipv4s: 10.88.15.200 on intra15, 10.88.16.200 on intra16).
The gretap tunnels work as expected. They have the following ipv4 addresses:
host peer
gretap15 10.188.15.100 10.188.15.200
gretap16 10.188.16.100 10.188.16.200
When not enslaved by the bond interface, I'm able to exchange packets in the tunnel using the
internal ip addresses.
I then set up the bonding driver as follows:
# ip link add bond-15-16 type bond
# ip link set bond-15-16 type bond mode active-backup
# ip link set gretap15 down
# ip link set gretap16 down
# ip link set gretap15 master bond-15-16
# ip link set gretap16 master bond-15-16
# ip link set bond-15-16 mtu 1462
# ip addr add 10.42.42.100/24 dev bond-15-16
# ip link set bond-15-16 type bond arp_interval 100 arp_ip_target 10.42.42.200
# ip link set bond-15-16 up
I do the same on the peer system, inverting the interface and ARP target IP addresses.
At this point, IP communication using the addresses on the bond interfaces works as expected.
E.g.
# ping 10.24.24.200
gets responses from the other peer.
Using tcpdump on the other peer shows the GRE packets coming into intra15, and identical ICMP
packets coming through gretap15 and bond-15-16.
If I then disconnect the (virtual) network cable of intra15, the bonding driver switches to
intra16, as the GRE tunnel can no longer pass packets. However, despite having primary_reselect=0,
when I reconnect the network cable of intra15, the driver doesn't switch back to gretap15. In fact,
it doesn't even attempt sending any probes through it.
Fiddling with the cables (e.g. reconnecting intra15 and then disconnecting intra16) and/or bringing
the bond interface down and up usually results in the driver ping-ponging a bit between gretap15
and gretap16, before usually settling on gretap16 (but never on gretap15, it seems). Or,
sometimes, it results in the driver marking both slaves down and not doing anything ever again
until manual intervention (e.g. manually selecting a new active_slave, or down -> up).
Trying to ping the gretap15 address of the peer (10.188.15.200) from the host while gretap16 is the
active slave results in ARP traffic being temporarily exchanged on gretap15. I'm not sure whether
it originates from the bonding driver, as it seems like the generated requests are the cartesian
product of all address couples on the network segments of gretap15 and bond-15-16 (e.g. who-has
10.188.15.100 tell 10.188.15.100, who-has 10.188.15.100 tell 10.188.15.200, ..., who-hash
10.42.42.200 tell 10.42.42.200).
uname -a:
Linux fo-gw 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux
(same on peer system)
Am I misunderstanding how the driver works? Have I made any mistakes in the configuration?
Best regards,
Riccardo P. Bestetti
Powered by blists - more mailing lists