Message-ID: <49221F7A.8030706@candelatech.com>
Date: Mon, 17 Nov 2008 17:50:50 -0800
From: Ben Greear <greearb@...delatech.com>
To: Rick Jones <rick.jones2@...com>
CC: netdev@...r.kernel.org, Patrick McHardy <kaber@...sh.net>
Subject: Re: ARP table question
Rick Jones wrote:
> Ben Greear wrote:
>> Rick Jones wrote:
>>
>>>> +static unsigned long neigh_rand_retry(struct neighbour* neigh) {
>>>> + if (neigh->parms->retrans_rand_backoff) {
>>>> + return net_random() % neigh->parms->retrans_rand_backoff;
>>>> + }
>>>> + return 0;
>>>> +}
>>>> +
>>>> /* Called when a timer expires for a neighbour entry. */
>>>
>>>
>>> I thought that mod was something we tried to avoid? Could you
>>> instead use something that isn't random but perhaps varies among all
>>> the requests? Say some of the low-order bits of the IP being resolved?
>>
>>
>> This is only called when we are going to retransmit an ARP, which
>> shouldn't be in any sort of hot path, so I figured MOD was fine.
>>
>> The net_random is a very cheap method (last I checked), as well.
>>
>> So, I think that part is OK as it is, but I'm open to
>> persuasion :)
>
> Perhaps I'm confused, or simply channeling Emily Litella again, but if
> you only do this on the 1st through Nth retransmissions (ie after the
> first retransmission timer has popped) don't you still have a thundering
> herd problem on the first transmission and the first retransmission of
> ARP requests?
You'd certainly have it on the first transmission, but I think from there on
the randomness should kick in. This is a pretty rare case, and I'd rather
not slow down the initial ARP. If we *are* in the overload situation, then
the network can just purge/drop/whatever the initial flood and then the
retransmits should start doing their random thing. On my system, it still
takes maybe 30 seconds for all the ARPs to resolve since a good deal of
the requests and/or responses are being lost.

After some more testing, I can still get it into a bad state if I have
a retrans timer of 1 sec and a randomness of 5 secs and manage to cause
all 1000 ARP entries to go stale at once (by yanking a cable, for
instance).

It seems I have to bump up the base timer to 3-5 seconds (I'm leaving
the random backoff at 5 secs as well).
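
To make the timer arithmetic concrete, here is a rough user-space sketch
of the idea, not the kernel patch itself. It assumes the retransmit
delay is simply the base timer plus the random value. The names
rand_retry_ms() and next_retrans_ms(), the millisecond units, and the
use of rand() in place of net_random() are all made up for
illustration.

```c
#include <stdlib.h>

/*
 * Hypothetical user-space stand-in for the patch's neigh_rand_retry().
 * Returns a jitter in [0, backoff_ms), or 0 when the backoff is
 * disabled (backoff_ms == 0). rand() stands in for net_random().
 */
static unsigned long rand_retry_ms(unsigned long backoff_ms)
{
    if (backoff_ms)
        return (unsigned long)rand() % backoff_ms;
    return 0;
}

/*
 * Assumed composition: delay until the next ARP retransmit is the
 * fixed base timer plus the random jitter.
 */
static unsigned long next_retrans_ms(unsigned long base_ms,
                                     unsigned long backoff_ms)
{
    return base_ms + rand_retry_ms(backoff_ms);
}
```

With a 3 sec base and a 5 sec backoff, next_retrans_ms(3000, 5000)
lands somewhere in [3000, 8000) ms, so 1000 entries going stale together
would spread their retransmits over roughly a 5 second window instead
of firing in lockstep.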
Thanks,
Ben
--
Ben Greear <greearb@...delatech.com>
Candela Technologies Inc http://www.candelatech.com