Message-ID: <472DAD90.4050709@cosmosbay.com>
Date: Sun, 04 Nov 2007 12:31:28 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: David Miller <davem@...emloft.net>
CC: ak@...e.de, netdev@...r.kernel.org, acme@...hat.com,
Jarek Poplawski <jarkao2@...pl>
Subject: Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table
David Miller wrote:
> From: Andi Kleen <ak@...e.de>
> Date: Sun, 4 Nov 2007 00:18:14 +0100
>
>> On Thursday 01 November 2007 11:16:20 Eric Dumazet wrote:
>>
>> Some quick comments:
>>
>>> +#if defined(CONFIG_SMP) || defined(CONFIG_PROVE_LOCKING)
>>> +/*
>>> + * Instead of using one rwlock for each inet_ehash_bucket, we use a table of locks
>>> + * The size of this table is a power of two and depends on the number of CPUS.
>>> + */
>> This shouldn't be hard-coded based on NR_CPUS, but be done at runtime
>> based on num_possible_cpus(). This is better for kernels with a large
>> NR_CPUS but which typically run on much smaller systems (like
>> distribution kernels).
>
> I think this is a good idea. Eric, could you make this change?
Yes, of course, since using a non-constant value for masking is cheap.
But I suspect distribution kernels enable CONFIG_HOTPLUG_CPU, so
num_possible_cpus() will be NR_CPUS.
>
>> Also the EHASH_LOCK_SZ == 0 special case is a little strange. Why did
>> you add that?
>
> He explained this in another reply, because ifdefs are ugly.
This will vanish if done at runtime anyway.
>
>> And as an unrelated note, have you tried converting the rwlocks
>> into normal spinlocks? Spinlocks should be somewhat cheaper
>> because they have less cache protocol overhead, and with
>> the huge thash tables in Linux the chain walks should be short
>> anyway, so not doing this in parallel is probably not a big issue.
>> At some point I also had a crazy idea of using a special locking
>> scheme that special-cases the common case that a hash chain
>> has only one member and doesn't take a lock for that at all.
>
> I agree.
>
> There was movement at one point to get rid of all rwlock's in the
> kernel, I personally think they are pointless. Any use that makes
> "sense" is a case where the code should be rewritten to decrease the
> lock hold time or convert to RCU.
>
I agree too; rwlocks are more expensive when contention is low, so let's do
this rwlock->spinlock change in a next step (separate patch), because it
also means changing lhash_lock.
Thanks to Jarek, I added lock cleanup in dccp_fini().
[PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table
As done two years ago on the IP route cache table (commit
22c047ccbc68fa8f3fa57f0e8f906479a062c426), we can avoid using one lock per
hash bucket for the huge TCP/DCCP hash tables.
On a typical x86_64 platform, this saves about 2MB or 4MB of RAM, for little
performance difference. (We hit a different cache line for the rwlock, but
then the bucket cache line has a better sharing factor among cpus, since we
dirty it less often.) For netstat or ss commands that want a full scan of the
hash table, we perform fewer memory accesses.
Using a 'small' table of hashed rwlocks should be more than enough to provide
correct SMP concurrency between different buckets, without using too much
memory. Sizing of this table depends on num_possible_cpus() and various CONFIG
settings.
This patch provides some locking abstraction that may ease a future work using
a different model for TCP/DCCP table.
Signed-off-by: Eric Dumazet <dada1@...mosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@...hat.com>
include/net/inet_hashtables.h | 71 +++++++++++++++++++++++++++++---
net/dccp/proto.c | 9 +++-
net/ipv4/inet_diag.c | 9 ++--
net/ipv4/inet_hashtables.c | 7 +--
net/ipv4/inet_timewait_sock.c | 13 +++--
net/ipv4/tcp.c | 4 -
net/ipv4/tcp_ipv4.c | 11 ++--
net/ipv6/inet6_hashtables.c | 19 ++++----
8 files changed, 106 insertions(+), 37 deletions(-)
View attachment "tcp_ehash_locks.patch" of type "text/plain" (14143 bytes)