netdev - Re: ARP table question

Message-ID: <49221F7A.8030706@candelatech.com>
Date:	Mon, 17 Nov 2008 17:50:50 -0800
From:	Ben Greear <greearb@...delatech.com>
To:	Rick Jones <rick.jones2@...com>
CC:	netdev@...r.kernel.org, Patrick McHardy <kaber@...sh.net>
Subject: Re: ARP table question

Rick Jones wrote:
> Ben Greear wrote:
>> Rick Jones wrote:
>>
>>>> +static unsigned long neigh_rand_retry(struct neighbour* neigh) {
>>>> +    if (neigh->parms->retrans_rand_backoff) {
>>>> +        return net_random() % neigh->parms->retrans_rand_backoff;
>>>> +    }
>>>> +    return 0;
>>>> +}
>>>> +
>>>>  /* Called when a timer expires for a neighbour entry. */
>>>
>>>
>>> I thought that mod was something we tried to avoid?  Could you 
>>> instead use something that isn't random but perhaps varies among all 
>>> the requests?  Say some of the low-order bits of the IP being resolved?
>>
>>
>> This is only called when we are going to retransmit an ARP, which
>> shouldn't be in any sort of hot path, so I figured MOD was fine.
>>
>> The net_random is a very cheap method (last I checked), as well.
>>
>> So, I think that part is OK as it is, but I'm open to
>> persuasion :)
> 
> Perhaps I'm confused, or simply channeling Emily Litella again, but if 
> you only do this on the 1st through Nth retransmissions (ie after the 
> first retransmission timer has popped) don't you still have a thundering 
> herd problem on the first transmission and the first retransmission of 
> ARP requests?

You'd certainly have it on the first transmission, but I think from there on
the randomness should kick in.  This is a pretty rare case, and I'd rather
not slow down the initial ARP.  If we *are* in the overload situation, then
the network can just purge/drop/whatever the initial flood and then the
retransmits should start doing their random thing.  On my system, it still
takes maybe 30 seconds for all the ARPs to resolve since a good deal of
the requests and/or responses are being lost.
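
For illustration, here is a minimal userspace sketch of the jittered
retransmit delay discussed above. It is not the patch itself: the struct
retrans_parms type and next_retrans_delay() are hypothetical names, rand()
stands in for net_random(), and adding the jitter to the base interval is
an assumption, since the quoted hunk does not show the call site.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-neighbour parameters used by the patch. */
struct retrans_parms {
    unsigned long retrans_time;         /* base retransmit interval, in ms */
    unsigned long retrans_rand_backoff; /* jitter window in ms; 0 disables it */
};

/* Mirrors the quoted neigh_rand_retry(): a value in [0, backoff), or 0. */
static unsigned long rand_retry(const struct retrans_parms *p)
{
    if (p->retrans_rand_backoff)
        return (unsigned long)rand() % p->retrans_rand_backoff;
    return 0;
}

/*
 * Delay before the next ARP retransmit. Assumption: the jitter is added to
 * the base interval, so only retransmits are spread out and the initial
 * request is sent without any extra delay.
 */
static unsigned long next_retrans_delay(const struct retrans_parms *p)
{
    return p->retrans_time + rand_retry(p);
}
```

With retrans_time = 1000 and retrans_rand_backoff = 5000, every call returns
a delay somewhere in [1000, 6000) ms; with a backoff of 0 it always returns
the base interval.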

After some more testing, I can still get it into a bad
state if I have a retrans timer of 1 sec and a random backoff of 5 secs
and manage to cause all 1000 ARP entries to go stale at once (by
yanking a cable, for instance).

It seems I have to bump up the base timer to 3-5 seconds (I'm
still leaving the random backoff at 5 secs).
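
As a rough way to reason about these timer values, the sketch below counts
how many first retransmits would land in each one-second bin when many
entries go stale at the same instant. The function name retrans_histogram()
and the MAX_BINS limit are hypothetical, rand() stands in for net_random(),
and the model (delay = base timer + random value below the backoff) is an
assumption rather than something taken from the patch.

```c
#include <stdlib.h>

#define MAX_BINS 64

/*
 * Fill bins[] with the number of first retransmits falling into each
 * one-second interval after `entries` neighbours go stale together.
 * Assumes delay = base_ms + random value in [0, backoff_ms).
 * Returns the number of bins in use, or 0 if MAX_BINS is too small.
 */
static size_t retrans_histogram(unsigned entries, unsigned long base_ms,
                                unsigned long backoff_ms,
                                unsigned bins[MAX_BINS])
{
    /* Largest possible delay decides how many bins are needed. */
    unsigned long max_delay = base_ms + (backoff_ms ? backoff_ms - 1 : 0);
    size_t nbins = max_delay / 1000 + 1;

    if (nbins > MAX_BINS)
        return 0;

    for (size_t i = 0; i < MAX_BINS; i++)
        bins[i] = 0;

    for (unsigned i = 0; i < entries; i++) {
        unsigned long jitter =
            backoff_ms ? (unsigned long)rand() % backoff_ms : 0;
        bins[(base_ms + jitter) / 1000]++;
    }
    return nbins;
}
```

Under that model, 1000 entries with a 1000 ms base and a 5000 ms backoff
spread over bins 1 through 5, about 200 retransmits per second on average.
A larger base timer shifts the window later without widening it, which
leaves more time for the initial burst to drain before retransmits begin.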

Thanks,
Ben

-- 
Ben Greear <greearb@...delatech.com>
Candela Technologies Inc  http://www.candelatech.com