[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200702201104.16200.dada1@cosmosbay.com>
Date: Tue, 20 Feb 2007 11:04:15 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Evgeniy Polyakov <johnpol@....mipt.ru>
Cc: akepner@....com, linux@...izon.com, davem@...emloft.net,
netdev@...r.kernel.org, bcrl@...ck.org
Subject: Re: Extensible hashing and RCU
On Tuesday 20 February 2007 10:25, Evgeniy Polyakov wrote:
> On Mon, Feb 19, 2007 at 07:13:07PM +0100, Eric Dumazet (dada1@...mosbay.com)
wrote:
> > On Monday 19 February 2007 16:14, Eric Dumazet wrote:
> > > Because O(1) is different of O(log(N)) ?
> > > if N = 2^20, it certainly makes a difference.
> > > Yes, 1% of chains might have a length > 10, but yet 99% of the lookups
> > > are touching less than 4 cache lines.
> > > With a binary tree, log(2^20) is 20. or maybe not ? If you tell me it's
> > > 4, I will be very pleased.
> >
> > Here is the tcp ehash chain length distribution on a real server :
> > ehash_addr=0xffff810476000000
> > ehash_size=1048576
> > 333835 used chains, 3365 used twchains
> > Distribution of sockets/chain length
> > [chain length]:number of sockets
> > [1]:221019 37.4645%
> > [2]:56590 56.6495%
> > [3]:21250 67.4556%
> > [4]:12534 75.9541%
> > [5]:8677 83.3082%
> > [6]:5862 89.2701%
> > [7]:3640 93.5892%
> > [8]:2219 96.5983%
> > [9]:1083 98.2505%
> > [10]:539 99.1642%
> > [11]:244 99.6191%
> > [12]:112 99.8469%
> > [13]:39 99.9329%
> > [14]:16 99.9708%
> > [15]:6 99.9861%
> > [16]:3 99.9942%
> > [17]:2 100%
> > total : 589942 sockets
> >
> > So even with a lazy hash function, 89 % of lookups are satisfied with
> > less than 6 compares.
>
> This is only 2^19 sockets.
>
> What you miss is the fact, that upper layers of the tree are always in the
> cache. So its access cost nothing compared to every time new entry in
> the hash table.
>
> And there is _no_ O(1) - O(1) is just a hash entry selection, then you
> need to add the whole chain path, which can be long enough.
> For example for the length 9 you have 1000 entries - it is exactly size
> of the tree - sum of all entries upto and including 2^9.
> One third of the table is accessible within 17 lookups (hash chain
> length is 1), but given into account size of the entry - let's say it
> is equal to
> 32+32+32 - size of the key
> 32+32+32 - size of the left/right/parent pointer
> so 192 bytes per entry - given into acount that 1/4 of the 1-2 MB cache is
> filled with it, we get about 1300 top-level entries in the hash, so
> about _10_ first lookups are in the cache _always_ due to nature of the
> tree lookup.
You totally miss the fact that the 1-2-4 MB cache is not available for you at
all. It is filled by User accesses. I dont care about DOS. I care about real
servers, servicing tcp clients. The TCP service/stack should not take more
than 10% of CPU (cycles and caches). The User application is certainly more
important because it hosts the real added value.
As soon softirq finished its job, CPU returns to User Space, blowing out your
top-level entries from the cache. Next ms (tg3 for example, batching XX
packets per interrupt), softirq handler must repopulate the cache again and
again.
Is linux kernel aimed to construct firewalls ? Certainly not. Linux kernel is
a generalist kernel. Of course your tree algo might be better in certain
situations, but those situations are not the norm. In these situations, we
might reserve 50% of cpu cache to hold top level of route cache and tcp
cache, but provide no power to user programs.
With ehash, pressure on cpu caches is small and User application can make
progress.
Using a jenkin's hash permits a better hash distribution for a litle cpu cost.
I will post later a distribution simulation based on the data gathered from
the same real server.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists