Message-ID: <50352BD0.3060409@genband.com>
Date: Wed, 22 Aug 2012 12:58:24 -0600
From: Chris Friesen <chris.friesen@...band.com>
To: Jay Vosburgh <fubar@...ibm.com>
CC: Jiri Bohac <jbohac@...e.cz>, Andy Gospodarek <andy@...yhouse.net>,
netdev@...r.kernel.org, Petr Tesarik <ptesarik@...e.cz>
Subject: Re: bonding: time limits too tight in bond_ab_arp_inspect
On 08/22/2012 12:42 PM, Jay Vosburgh wrote:
> Chris Friesen<chris.friesen@...band.com> wrote:
>
>> On 08/22/2012 11:45 AM, Jiri Bohac wrote:
>>
>>> This code is run from bond_activebackup_arp_mon() about
>>> delta_in_ticks jiffies after the previous ARP probe has been
>>> sent. If the delayed work gets executed exactly in delta_in_ticks
>>> jiffies, there is a chance the slave will be brought up. If the
>>> delayed work runs one jiffy later, the slave will stay down.
>
> Presumably the ARP reply is coming back in less than one jiffy,
> then, so the slave_last_rx() value is the same jiffy as when the
> _inspect was previously called?
>
>> <snip>
>>
>>> Should they perhaps all be increased by, say, delta_in_ticks/2, to make this
>>> less dependent on the current scheduling latencies?
>>
>> We have been using a patch that tracks the arpmon requested sleep time vs
>> the actual sleep time and adds any scheduling latency to the allowed
>> delta. That way if we sleep too long due to scheduling latency it doesn't
>> affect the calculation.
>
> How much scheduling latency do you see?
>
> Is that really better than just permitting a bit more slack in
> the timing window?

We hit enough latency that arpmon falsely marked multiple links as
lost. This sent our system maintenance code into an "oh no, we can't
talk to the outside world" scenario, which does fairly intrusive things
to try to bring connectivity back up. Basically a bad thing to have
happen just because of a random scheduler latency spike.

I should note that we added this some time back and are still running
older kernels, so I have no idea what latency is like on modern kernels.

Chris
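
For readers following the thread, the latency compensation described above (track requested vs. actual sleep time, then add the overshoot to the allowed delta) can be sketched roughly as below. This is a minimal user-space model, not the patch itself, which was not posted in this message. The type ticks_t and the functions ticks_after_eq(), sched_latency() and rx_within_window() are hypothetical names chosen for illustration, and the window is assumed to be "last receive time plus allowed delta".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names come from the kernel. */
typedef unsigned long ticks_t;

/* Wrap-safe "a is at or after b", the same idea as the kernel's time_after_eq(). */
static bool ticks_after_eq(ticks_t a, ticks_t b)
{
    return (long)(a - b) >= 0;
}

/*
 * Extra ticks the monitor slept beyond what it asked for.
 * scheduled_at: tick at which the delayed work was queued
 * requested:    sleep length asked for (delta_in_ticks in the thread)
 * now:          tick at which the work actually ran
 */
static ticks_t sched_latency(ticks_t scheduled_at, ticks_t requested, ticks_t now)
{
    ticks_t slept = now - scheduled_at;

    assert(requested > 0);
    return slept > requested ? slept - requested : 0;
}

/*
 * True if the last receive is still recent enough at 'now'.
 * The allowed window is widened by the measured scheduling latency,
 * so a late wakeup alone does not push last_rx out of the window.
 */
static bool rx_within_window(ticks_t last_rx, ticks_t now,
                             ticks_t allowed, ticks_t latency)
{
    return ticks_after_eq(last_rx + allowed + latency, now);
}
```

With a reply received at tick 1000 and a delta of 100, a check at tick 1101 fails with no compensation, but passes once the one tick of overshoot reported by sched_latency() is added to the window. Compared with a fixed amount of extra slack, this widens the window only when the wakeup was actually late.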