Message-Id: <200702201209.52388.dada1@cosmosbay.com>
Date: Tue, 20 Feb 2007 12:09:51 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Evgeniy Polyakov <johnpol@....mipt.ru>
Cc: akepner@....com, linux@...izon.com, davem@...emloft.net,
netdev@...r.kernel.org, bcrl@...ck.org
Subject: Re: Extensible hashing and RCU
On Tuesday 20 February 2007 11:44, Evgeniy Polyakov wrote:
> On Tue, Feb 20, 2007 at 11:04:15AM +0100, Eric Dumazet (dada1@...mosbay.com) wrote:
> > You totally miss the fact that the 1-2-4 MB cache is not available for
> > you at all. It is filled by User accesses. I dont care about DOS. I care
> > about real servers, servicing tcp clients. The TCP service/stack should
> > not take more than 10% of CPU (cycles and caches). The User application
> > is certainly more important because it hosts the real added value.
>
> A TCP socket is 4k in size, while one tree entry can be reduced to 200 bytes?
>
> No one talks about _that_ cache miss, it is considered OK to have, but
> a tree cache miss becomes the worst thing ever.
> In softirq we process the socket's state, lock, reference counter, several
> pointers, and if we are happy - the whole set of TCP state machine fields -
> and most of it stays in cache when kernel processing is over - userspace
> issues syscalls which must populate it back. Why don't we see that it is
> moved into cache each time a syscall is invoked? Because it is in the cache,
> along with the part of the hash table associated with the most recently used
> hash entries - which should not be there; part of the tree could be there instead.
No, I see cache misses everywhere...
This is because my machines are doing real work in user land. They are not lab
machines. Even if I had CPUs with 16-32MB of cache, it would be the same,
because user land wants GBs...
For example, sock_wfree() uses 1.6612% of cpu because of false sharing of
sk_flags (dirtied each time SOCK_QUEUE_SHRUNK is set) :(
ffffffff803c2850 <sock_wfree>: /* sock_wfree total: 714241 1.6613 */
1307 0.0030 :ffffffff803c2850: push %rbp
55056 0.1281 :ffffffff803c2851: mov %rsp,%rbp
94 2.2e-04 :ffffffff803c2854: push %rbx
:ffffffff803c2855: sub $0x8,%rsp
1090 0.0025 :ffffffff803c2859: mov 0x10(%rdi),%rbx
3 7.0e-06 :ffffffff803c285d: mov 0xb8(%rdi),%eax
38 8.8e-05 :ffffffff803c2863: lock sub %eax,0x90(%rbx)
/* HOT : access to sk_flags */
81979 0.1907 :ffffffff803c286a: mov 0x100(%rbx),%eax
512119 1.1912 :ffffffff803c2870: test $0x2,%ah
262 6.1e-04 :ffffffff803c2873: jne ffffffff803c2880 <sock_wfree+0x30>
142 3.3e-04 :ffffffff803c2875: mov %rbx,%rdi
14467 0.0336 :ffffffff803c2878: callq *0x200(%rbx)
63 1.5e-04 :ffffffff803c287e: data16
:ffffffff803c287f: nop
9046 0.0210 :ffffffff803c2880: lock decl 0x28(%rbx)
29792 0.0693 :ffffffff803c2884: sete %al
56 1.3e-04 :ffffffff803c2887: test %al,%al
789 0.0018 :ffffffff803c2889: je ffffffff803c2893 <sock_wfree+0x43>
:ffffffff803c288b: mov %rbx,%rdi
144 3.3e-04 :ffffffff803c288e: callq ffffffff803c0f90 <sk_free>
1685 0.0039 :ffffffff803c2893: add $0x8,%rsp
2462 0.0057 :ffffffff803c2897: pop %rbx
684 0.0016 :ffffffff803c2898: leaveq
2963 0.0069 :ffffffff803c2899: retq
This is why tcp lookups should not take more than 1% themselves: other parts
of the stack *want* to make many cache misses too.
If we want to optimize tcp, we should reorder fields to reduce the number of
cache lines touched, not change algos. struct sock fields are currently placed
to reduce holes, while they should be grouped so that related fields share
cache lines.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html