Message-ID: <17860.1385068379@death.nxdomain>
Date:	Thu, 21 Nov 2013 13:12:59 -0800
From:	Jay Vosburgh <fubar@...ibm.com>
To:	rama nichanamatlu <rama.nichanamatlu@...cle.com>
cc:	Veaceslav Falico <vfalico@...hat.com>, netdev@...r.kernel.org
Subject: Re: [PATCH] bonding: If IP route look-up to send an ARP fails, mark in bonding structure as no ARP sent.

rama nichanamatlu <rama.nichanamatlu@...cle.com> wrote:

>On 11/21/2013 3:10 AM, Veaceslav Falico wrote:
>> On Wed, Nov 20, 2013 at 04:53:20PM -0800, rama nichanamatlu wrote:
>>> During the creation of VLAN's atop bonding the underlying interfaces
>>> are made part of VLAN's, and at the same bonding driver gets aware
>>> that VLAN's exists above it and hence would consult IP routing for
>>> every ARP to  be sent to determine the route which tells bonding
>>> driver the correct VLAN tag to attach to the outgoing ARP packet. But,
>>> during the VLAN creation when vlan driver puts the underlying
>>> interface into default vlan and then actual vlan, in-between this if
>>> bonding driver consults the IP for a route, IP fails to provide a
>>> correct route and upon which bonding driver drops the ARP packet. ARP
>>> monitor when it
>>> comes around next time, sees no ARP response and fails-over to the
>>> next available slave. Consulting for a IP route,
>>> ip_route_output(),happens in bond_arp_send_all().
>> 
>> bonding works as expected - nothing to fix here. And even as a
>> workaround/hack - I'm not sure we need that to suppress one failover *only*
>> when vlan is added on top.
>> 
>>>
>Thank U.
>With *out* this change our systems failed system testing, to
>consistently be on designated primary interface on *every* single
>reboot. With this change the behavior was as expected even after a few
>thousand reboots & System testing could move to next level catching an
>another bug in sr-iov :). And Without, the outcome was less predictable
>after a reboot and bonding was on a different slave each time.
>-Rama

	By "designated primary" you mean the bonding primary option,
correct?  If not, does setting primary resolve the problem?  If so,
you're saying that during the bringup, bonding would end up with a
non-primary slave as the active slave?  Or that there would be a
failover / failback cycle during the bringup due to the lack of VLAN
availability?

	There is already a mechanism in bond_ab_arp_inspect() to give
new slaves a grace period before applying link failures:

                /*
                 * Give slaves 2*delta after being enslaved or made
                 * active.  This avoids bouncing, as the last receive
                 * times need a full ARP monitor cycle to be updated.
                 */
                if (bond_time_in_interval(bond, slave->jiffies, 2))
                        continue;

	If you extend that grace period (the "2", which is in units of
the arp_interval), does the problem resolve itself, or is the time
window here longer than that?
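
	To make the experiment concrete, here is a rough userspace model of
a grace-window test of this kind. It is only a sketch, not the kernel's
code: in_grace_window(), tick_after_eq() and the exact window bounds are
assumptions for illustration, and the real bond_time_in_interval() may
compute its range differently. The point is that the multiplier (the "2")
scales the window in units of the ARP interval, so raising it keeps a
newly enslaved or newly active slave exempt from link-failure checks for
longer.

```c
#include <assert.h>
#include <stdbool.h>

/* Wrap-safe "a is at or after b" for a free-running tick counter. */
static bool tick_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/*
 * Hypothetical model of the grace-period test: returns true while
 * `now` is within `mod` ARP intervals of `last_act` (the time the
 * slave was enslaved or made active).  All times are in ticks.
 */
static bool in_grace_window(unsigned long now, unsigned long last_act,
			    unsigned long interval_ticks, int mod)
{
	unsigned long end;

	assert(mod >= 0);
	end = last_act + (unsigned long)mod * interval_ticks;

	return tick_after_eq(now, last_act) && tick_after_eq(end, now);
}
```

	With a 100-tick interval and a slave made active at tick 100, a
multiplier of 2 covers ticks 100 through 300; a multiplier of 5 would
cover ticks 100 through 600, which is the kind of change being suggested
above as a test.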

	How is the configuration of bonding and the VLANs taking place?

	I don't think this patch is suitable (because it can mask
legitimate failures), but I'm not entirely sure I understand the details
of the problem.  Is this simply that the arp_ip_target is specified as a
VLAN destination significantly before (meaning perhaps many seconds of
real time) the VLAN is configured above bonding, or is it some kind of
race condition in the VLAN code?

	-J

>>> To prevent this false fail-over, when bonding driver fails to send an
>>> ARP out it marks in its private structure, bonding{},  not to expect
>>> an ARP response, when ARP monitor comes around next time ARP sending
>>> will be tried again.
>>>
>>> Extensively tested in a VM environment; sr-iov intf->bonding
>>> intf->vlan intf. All virtual interfaces created at boot time.

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
