netdev - Re: [PATCH 1/2] ipv4: Improve the scaling of the ARP cache for multicast destinations.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <50410EB8.3040603@aristanetworks.com>
Date:	Fri, 31 Aug 2012 12:21:28 -0700
From:	Bob Gilligan <gilligan@...stanetworks.com>
To:	David Miller <davem@...emloft.net>
CC:	netdev@...r.kernel.org
Subject: Re: [PATCH 1/2] ipv4: Improve the scaling of the ARP cache for multicast
 destinations.

On 8/30/12 6:06 PM, David Miller wrote:
> From: Bob Gilligan <gilligan@...stanetworks.com>
> Date: Thu, 30 Aug 2012 17:55:04 -0700
> 
>> The mapping from multicast IPv4 address to MAC address can just as
>> easily be done at the time a packet is to be sent.  With this change,
>> we maintain one ARP cache entry for each interface that has at least
>> one multicast group member.  All routes to IPv4 multicast destinations
>> via a particular interface use the same ARP cache entry.  This entry
>> does not store the MAC address to use.  Instead, packets for multicast
>> destinations go to a new output function that maps the destination
>> IPv4 multicast address into the MAC address and forms the MAC header.
> 
> Doing an ARP MC mapping on every packet is much more expensive than
> doing a copy of the hard header cache.
> 
> I do not believe the memory consumption issue you use to justify this
> change is a real issue.
> 
> If you are talking to that many multicast groups actively, you do want
> that many neighbour cache entries.  This is not different from talking
> to nearly every IP address on a local /8 subnet.  You'll have a huge
> number of neighbour table entries in that case as well.
> 
> If your the actual steady state number of active groups being spoken
> to is smaller, you can tune the neighbour cache thresholds to collect
> old less used entries more quickly.
> 
> And this today is trivial, since routes no longer hold a reference
> to neighbour entries.  Therefore any neighbour entry whatsoever can
> be immediately reclaimed at any moment.

The scaling is N-squared: the number of neighbor cache entries
required for your multicast traffic is interfaces * groups.  100
interfaces and 100 groups could generate 10,000 entries. 1,000
interfaces and 1,000 groups could generate a million entries.

But the number of groups is hard to predict: it depends on the
applications in use and the multicast traffic they generate.  So, it
is hard to come up with a "budget" for multicast entries in the
neighbor cache for a multicast router.

If you pick a gc_thresh3 that is less than your working set, you'll
end up thrashing the neighbor cache.  And calls to neigh_forced_gc()
are expensive: It performs a linear search of the entire neighbor
cache.  Also, the calls to neigh_forced_gc() due to a large number of
multicast entries will negatively impact the unicast entries sharing the
neighbor cache: it will free any unreferenced but resolved unicast
entries. Any subsequent packets for those destinations will trigger a
re-ARP.  Unnecessary re-ARPing is generally undesirable in a router.

The user who wants to avoid these problems is left with the
alternative of setting gc_thresh3 to a very large number based on a
worst case estimate of the number of unicast plus multicast entries
required.

Seems just simpler and more efficient to keep the multicast entries
out of the neighbor cache entirely.

Bob.

> 
> I'm not fond of these patches, and adding yet more special cases to
> the neighbour layer, and therefore will not apply them.
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html