Date:	Fri, 02 Aug 2013 13:58:29 +0200
From:	Nikolay Aleksandrov <nikolay@...hat.com>
To:	Santiago Garcia Mantinan <manty@...ty.net>
CC:	netdev@...r.kernel.org
Subject: Re: bonding + arp monitoring fails if interface is a vlan

On 08/01/2013 02:11 PM, Santiago Garcia Mantinan wrote:
> Hi!
> 
> I'm trying to set up a bond of a couple of VLANs; these VLANs are different
> paths to an upstream switch from a local switch.  I want to do ARP
> monitoring of the link so that the bonding interface knows which path
> is OK and which one is broken.  If I set it up using ARP monitoring and
> without using VLANs it works OK, and it also works if I set it up using VLANs
> but without ARP monitoring, so the broken setup seems to be bonding +
> ARP monitoring + VLANs. Here is a schema:
> 
>  -------------
> |Remote Switch|
>  -------------
>    |      |
>    P      P
>    A      A
>    T      T
>    H      H
>    1      2
>    |      |
>  ------------
> |Local switch|
>  ------------
>       |
>       | VLAN for PATH1
>       | VLAN for PATH2
>       |
>  Linux machine
> 
> The broken setup seems to work, but ARP monitoring makes it lose the logical
> link from time to time, thus changing to the other slave if available.  What I
> saw when monitoring this with tcpdump is that all the ARP requests were
> going out and all the replies were coming in, so according to the
> traffic seen in tcpdump the link should have been stable, but
> /proc/net/bonding/bond0 showed the link failure count increasing, and when
> testing with just a VLAN interface I was losing ping when the link went down.
> 
> I've tried this on Debian wheezy with its 3.2.46 kernel and also with the
> 3.10.3 version in unstable; the tests were done on a couple of machines using
> a 32-bit kernel with different NICs (r8169 and skge).
> 
> I created a small lab to replicate the problem. In this setup I avoided all
> the switching and directly connected the machine with bonding to another
> Linux machine on which I only had eth0.1002 configured with IP 192.168.1.1;
> the results were the same as in the full scenario: the link on the bonding
> slave was going down from time to time.
> 
> This is the setup on the bonding interface.
> 
> auto bond0
> iface bond0 inet static
>         address 192.168.1.2
>         netmask 255.255.255.0
>         bond-slaves eth0.1002
>         bond-mode active-backup
>         bond-arp_validate 0
>         bond-arp_interval 5000
>         bond-arp_ip_target 192.168.1.1
>         pre-up ip link set eth0 up || true
>         pre-up ip link add link eth0 name eth0.1002 type vlan id 1002 || true
>         down ip link delete eth0.1002 || true
> 
I believe it is because dev_trans_start() returns 0 for 8021q (VLAN) devices,
so the bonding driver's check of whether the slave has transmitted recently
gives the wrong answer, and the slave flip-flops between up and down.
Please try the attached patch; it should resolve your issue (basically, when
the slave is a VLAN, it uses dev_trans_start() of the VLAN's underlying device).

The patch is against Linus' tree.

Cheers,
 Nik



View attachment "bond-trans-start.patch" of type "text/x-patch" (1730 bytes)
