netdev - Re: bonding + arp monitoring fails if interface is a vlan

Message-ID: <CAJk_L2HHVNg6CBWcTAD5cj_cyau9+m1yM1=yMZa0Rg1KuJD9Ew@mail.gmail.com>
Date:	Wed, 21 Aug 2013 09:39:07 +0200
From:	Santiago Garcia Mantinan <manty@...ty.net>
To:	Nikolay Aleksandrov <nikolay@...hat.com>
Cc:	netdev <netdev@...r.kernel.org>
Subject: Re: bonding + arp monitoring fails if interface is a vlan

Hi!

I think we have to clarify the setup...

>> iface bond0 inet static
>>         address 192.168.1.2
>>         netmask 255.255.255.0
>>         bond-slaves eth0.1001 eth0.1002 eth1.1001 eth1.1002
>>         bond-mode balance-xor
>>         bond-arp_validate 0
>>         bond-arp_interval 2000
>>         bond-arp_ip_target 192.168.1.1

> This setup works for me, what might be wrong with your setup is that you connect
> all 4 ports to a "dumb" switch,

I have three ports, not four: two network cards on the bonded machine
and one on the non-bonded machine. On the non-bonded machine I
configure the two vlan interfaces over the same physical ethernet
port like this:
ifconfig eth2.1001 192.168.1.1
ifconfig eth2.1002 192.168.1.1

> and you have the same vlans over the real
> devices that are connected so they see each other's packets and the port's
> last_rx gets updated so they stay up.

I'd like to clarify this a bit. Reading the bonding.txt file (the
howto), especially the arp_all_targets option (which I haven't set in
my setup), one would think that an arp reply from at least one of the
specified targets has to be received for the link to be considered in
a good state, not just any traffic, and especially not traffic
generated by your very own bonding driver. Isn't that the case?
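
The two readings contrasted above can be sketched as a toy model. This
is only an illustration, not the bonding driver's code: the names
Frame, slave_up_any_traffic and slave_up_target_reply are invented
here, and "any received frame keeps the slave up" is an assumption
based on the last_rx explanation quoted earlier in this message.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """A received frame, reduced to the two fields this toy model needs."""
    is_arp_reply: bool
    sender_ip: str


def slave_up_any_traffic(frames):
    """Reading A (assumed): any received frame counts as proof of a good link."""
    return len(frames) > 0


def slave_up_target_reply(frames, targets):
    """Reading B: only an ARP reply sent by one of the targets counts."""
    return any(f.is_arp_reply and f.sender_ip in targets for f in frames)


targets = {"192.168.1.1"}  # bond-arp_ip_target from the config above

# The slave only hears ARP requests sent by the bond itself (192.168.1.2),
# which reach it through the switch; the target never answers.
own_requests = [Frame(is_arp_reply=False, sender_ip="192.168.1.2")]

print(slave_up_any_traffic(own_requests))            # True
print(slave_up_target_reply(own_requests, targets))  # False
```

Under reading A the slave stays up with the target silent; under
reading B it would go down.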

What I'm trying to check in the real-world scenario is whether the
gw, which is at a remote location, is available, but there can be
local traffic that increments the counters.

> I tried your setup with a "smart" switch so the ports couldn't see each other
> and only the one that saw 192.168.1.1 was up, and the moment 192.168.1.1 went
> down - the port went down in the bonding.

I think that the problem here is not the "dumb" or "smart" switch. I
believe we somehow have different setups. Please, if anything about
my setup is unclear (two machines, one with the bonding config I
explained, and the other with one card and the ifconfig commands
shown above), just let me know.

My first "dumb" switch was the switch of a soho adsl wifi router, then
I tried a soho "dumb" 8-port 10/100 switch, then I tried an old
Cabletron SSR2000 where I had to define the two vlans on the three
ports and make these ports tagged ports, then I tried an Enterasys
B5 (where I also had to specify that these ports had those vlans as
egress and tagged). On the smart switches the slaves were considered
to be down while the vlans were not configured, as the switch was
dropping all traffic, but once the vlans were set up the slaves came up.

The behaviour I get is the same on "dumb" and "smart" switches: when I
have the eth2.1001 and eth2.1002 interfaces up everything works as
expected, but then I run:
ifconfig eth2.1001 0.0.0.0 down
ifconfig eth2.1002 0.0.0.0 down
and the bonded machine still sees all the slaves up, even though the
tcpdump I run on eth2 on the target machine shows that all 4 requests
are arriving and none of them is being answered.

I have checked the counters you mentioned and indeed they are
increasing, on both "dumb" and "smart" switches (note that I haven't
defined any bond on the switch side). I believe that either switch has
to forward what comes from eth0.1001 (connected to switch port X) to
eth1.1001 (connected to switch port Y), as these are broadcast
messages and I haven't defined any bonding on the switch; not doing so
would break broadcast for a lot of setups. What doesn't make sense to
me is the assumption that increasing counters mean we have a good link
when none of the specified targets is replying.
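
The forwarding argument can be sketched as follows. It is a simplified
model of broadcast flooding on a switch with no bond defined on its
side, not a description of any particular switch; the function flood
and the port name Z for the eth2 port are made up for the example.

```python
def flood(ingress_port, vlan, port_vlans):
    """Return the ports a broadcast frame is forwarded to: every port
    that carries the frame's vlan, except the port it arrived on."""
    return sorted(
        port for port, vlans in port_vlans.items()
        if vlan in vlans and port != ingress_port
    )


# All three ports carry both vlans as tagged ports (hypothetical labels).
port_vlans = {
    "X (eth0)": {1001, 1002},
    "Y (eth1)": {1001, 1002},
    "Z (eth2)": {1001, 1002},
}

# A broadcast ARP request sent by eth0.1001 enters the switch on port X.
print(flood("X (eth0)", 1001, port_vlans))
# ['Y (eth1)', 'Z (eth2)']
```

So eth1.1001 receives the request that eth0.1001 sent, whether or not
the target on port Z ever replies.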

I don't know what else to add to clarify what is going on; if
something is not clear, please ask me.

Thanks for your replies.

Regards.
-- 
Manty/BestiaTester -> http://manty.net