Message-ID: <4921ECB5.6050503@cosmosbay.com>
Date:	Mon, 17 Nov 2008 23:14:13 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Ingo Molnar <mingo@...e.hu>
CC:	Linus Torvalds <torvalds@...ux-foundation.org>,
	David Miller <davem@...emloft.net>, rjw@...k.pl,
	linux-kernel@...r.kernel.org, kernel-testers@...r.kernel.org,
	cl@...ux-foundation.org, efault@....de, a.p.zijlstra@...llo.nl,
	Stephen Hemminger <shemminger@...tta.com>
Subject: Re: __inet_lookup_established(): Re: [Bug #11308] tbench regression
 on	each kernel release from 2.6.22 -> 2.6.28

Ingo Molnar wrote:
> * Ingo Molnar <mingo@...e.hu> wrote:
> 
>> 100.000000 total
>> ................
>>   1.673249 __inet_lookup_established
> 
>                       hits (total: 167324)
>                  .........
> ffffffff804b9b12:      446 <__inet_lookup_established>:
> ffffffff804b9b12:      446 	41 57                	push   %r15
> ffffffff804b9b14:     4810 	89 d0                	mov    %edx,%eax
> ffffffff804b9b16:        0 	0f b7 c9             	movzwl %cx,%ecx
> ffffffff804b9b19:        0 	41 56                	push   %r14
> ffffffff804b9b1b:      456 	41 55                	push   %r13
> ffffffff804b9b1d:        0 	41 54                	push   %r12
> ffffffff804b9b1f:        0 	55                   	push   %rbp
> ffffffff804b9b20:      427 	53                   	push   %rbx
> ffffffff804b9b21:        4 	48 89 f3             	mov    %rsi,%rbx
> ffffffff804b9b24:        2 	44 89 c6             	mov    %r8d,%esi
> ffffffff804b9b27:      504 	41 89 c8             	mov    %ecx,%r8d
> ffffffff804b9b2a:        1 	49 89 f7             	mov    %rsi,%r15
> ffffffff804b9b2d:        1 	48 83 ec 08          	sub    $0x8,%rsp
> ffffffff804b9b31:      462 	49 c1 e7 20          	shl    $0x20,%r15
> ffffffff804b9b35:        0 	48 89 3c 24          	mov    %rdi,(%rsp)
> ffffffff804b9b39:      507 	89 d7                	mov    %edx,%edi
> ffffffff804b9b3b:       38 	41 0f b7 d1          	movzwl %r9w,%edx
> ffffffff804b9b3f:        0 	41 89 d6             	mov    %edx,%r14d
> ffffffff804b9b42:      863 	49 09 c7             	or     %rax,%r15
> ffffffff804b9b45:       24 	41 c1 e6 10          	shl    $0x10,%r14d
> ffffffff804b9b49:        0 	41 09 ce             	or     %ecx,%r14d
> ffffffff804b9b4c:      479 	89 f9                	mov    %edi,%ecx
> ffffffff804b9b4e:        8 	48 8b 3c 24          	mov    (%rsp),%rdi
> ffffffff804b9b52:        0 	e8 cc f4 ff ff       	callq  ffffffff804b9023 <inet_ehashfn>
> ffffffff804b9b57:      413 	48 89 df             	mov    %rbx,%rdi
> ffffffff804b9b5a:      122 	41 89 c5             	mov    %eax,%r13d
> ffffffff804b9b5d:        0 	89 c6                	mov    %eax,%esi
> ffffffff804b9b5f:      635 	e8 3e f5 ff ff       	callq  ffffffff804b90a2 <inet_ehash_bucket>
> ffffffff804b9b64:      511 	48 89 c5             	mov    %rax,%rbp
> ffffffff804b9b67:        6 	44 89 e8             	mov    %r13d,%eax
> ffffffff804b9b6a:        0 	23 43 14             	and    0x14(%rbx),%eax
> ffffffff804b9b6d:      497 	4c 8d 24 85 00 00 00 	lea    0x0(,%rax,4),%r12
> ffffffff804b9b74:        0 	00 
> ffffffff804b9b75:        1 	4c 03 63 08          	add    0x8(%rbx),%r12
> ffffffff804b9b79:        0 	48 8b 45 00          	mov    0x0(%rbp),%rax
> ffffffff804b9b7d:      470 	0f 18 08             	prefetcht0 (%rax)
> ffffffff804b9b80:        0 	4c 89 e7             	mov    %r12,%rdi
> ffffffff804b9b83:     1089 	e8 32 cd 05 00       	callq  ffffffff805168ba <_read_lock>
> ffffffff804b9b88:     6752 	48 8b 55 00          	mov    0x0(%rbp),%rdx
> ffffffff804b9b8c:      598 	eb 2c                	jmp    ffffffff804b9bba <__inet_lookup_established+0xa8>
> ffffffff804b9b8e:      447 	48 81 3c 24 d0 15 ab 	cmpq   $0xffffffff80ab15d0,(%rsp)
> ffffffff804b9b95:        0 	80 
> ffffffff804b9b96:     1119 	75 1f                	jne    ffffffff804b9bb7 <__inet_lookup_established+0xa5>
> ffffffff804b9b98:       21 	4c 39 b8 30 02 00 00 	cmp    %r15,0x230(%rax)
> ffffffff804b9b9f:        0 	75 16                	jne    ffffffff804b9bb7 <__inet_lookup_established+0xa5>
> ffffffff804b9ba1:      492 	44 39 b0 38 02 00 00 	cmp    %r14d,0x238(%rax)
> ffffffff804b9ba8:        0 	75 0d                	jne    ffffffff804b9bb7 <__inet_lookup_established+0xa5>
> ffffffff804b9baa:        0 	8b 52 fc             	mov    -0x4(%rdx),%edx
> ffffffff804b9bad:      451 	85 d2                	test   %edx,%edx
> ffffffff804b9baf:        0 	74 67                	je     ffffffff804b9c18 <__inet_lookup_established+0x106>
> ffffffff804b9bb1:        0 	3b 54 24 40          	cmp    0x40(%rsp),%edx
> ffffffff804b9bb5:        0 	74 61                	je     ffffffff804b9c18 <__inet_lookup_established+0x106>
> ffffffff804b9bb7:        0 	48 89 ca             	mov    %rcx,%rdx
> ffffffff804b9bba:      402 	48 85 d2             	test   %rdx,%rdx
> ffffffff804b9bbd:     1006 	74 12                	je     ffffffff804b9bd1 <__inet_lookup_established+0xbf>
> ffffffff804b9bbf:        0 	48 8d 42 f8          	lea    -0x8(%rdx),%rax
> ffffffff804b9bc3:      821 	48 8b 0a             	mov    (%rdx),%rcx
> ffffffff804b9bc6:       78 	44 39 68 2c          	cmp    %r13d,0x2c(%rax)
> ffffffff804b9bca:        4 	0f 18 09             	prefetcht0 (%rcx)
> ffffffff804b9bcd:      685 	75 e8                	jne    ffffffff804b9bb7 <__inet_lookup_established+0xa5>
> ffffffff804b9bcf:   139502 	eb bd                	jmp    ffffffff804b9b8e <__inet_lookup_established+0x7c>
> ffffffff804b9bd1:        0 	48 8b 55 08          	mov    0x8(%rbp),%rdx
> ffffffff804b9bd5:        0 	eb 26                	jmp    ffffffff804b9bfd <__inet_lookup_established+0xeb>
> ffffffff804b9bd7:        0 	48 81 3c 24 d0 15 ab 	cmpq   $0xffffffff80ab15d0,(%rsp)
> ffffffff804b9bde:        0 	80 
> ffffffff804b9bdf:        0 	75 19                	jne    ffffffff804b9bfa <__inet_lookup_established+0xe8>
> ffffffff804b9be1:        0 	4c 39 78 40          	cmp    %r15,0x40(%rax)
> ffffffff804b9be5:        0 	75 13                	jne    ffffffff804b9bfa <__inet_lookup_established+0xe8>
> ffffffff804b9be7:        0 	44 39 70 48          	cmp    %r14d,0x48(%rax)
> ffffffff804b9beb:        0 	75 0d                	jne    ffffffff804b9bfa <__inet_lookup_established+0xe8>
> ffffffff804b9bed:        0 	8b 52 fc             	mov    -0x4(%rdx),%edx
> ffffffff804b9bf0:        0 	85 d2                	test   %edx,%edx
> ffffffff804b9bf2:        0 	74 24                	je     ffffffff804b9c18 <__inet_lookup_established+0x106>
> ffffffff804b9bf4:        0 	3b 54 24 40          	cmp    0x40(%rsp),%edx
> ffffffff804b9bf8:        0 	74 1e                	je     ffffffff804b9c18 <__inet_lookup_established+0x106>
> ffffffff804b9bfa:        0 	48 89 ca             	mov    %rcx,%rdx
> ffffffff804b9bfd:        0 	48 85 d2             	test   %rdx,%rdx
> ffffffff804b9c00:        0 	74 12                	je     ffffffff804b9c14 <__inet_lookup_established+0x102>
> ffffffff804b9c02:        0 	48 8d 42 f8          	lea    -0x8(%rdx),%rax
> ffffffff804b9c06:        0 	48 8b 0a             	mov    (%rdx),%rcx
> ffffffff804b9c09:        0 	44 39 68 2c          	cmp    %r13d,0x2c(%rax)
> ffffffff804b9c0d:        0 	0f 18 09             	prefetcht0 (%rcx)
> ffffffff804b9c10:        0 	75 e8                	jne    ffffffff804b9bfa <__inet_lookup_established+0xe8>
> ffffffff804b9c12:        0 	eb c3                	jmp    ffffffff804b9bd7 <__inet_lookup_established+0xc5>
> ffffffff804b9c14:        0 	31 c0                	xor    %eax,%eax
> ffffffff804b9c16:        0 	eb 04                	jmp    ffffffff804b9c1c <__inet_lookup_established+0x10a>
> ffffffff804b9c18:      441 	f0 ff 40 28          	lock incl 0x28(%rax)
> ffffffff804b9c1c:     1442 	f0 41 ff 04 24       	lock incl (%r12)
> ffffffff804b9c21:      476 	41 5b                	pop    %r11
> ffffffff804b9c23:        1 	5b                   	pop    %rbx
> ffffffff804b9c24:        0 	5d                   	pop    %rbp
> ffffffff804b9c25:      475 	41 5c                	pop    %r12
> ffffffff804b9c27:        0 	41 5d                	pop    %r13
> ffffffff804b9c29:        1 	41 5e                	pop    %r14
> ffffffff804b9c2b:      494 	41 5f                	pop    %r15
> ffffffff804b9c2d:        0 	c3                   	retq   
> ffffffff804b9c2e:        0 	90                   	nop    
> ffffffff804b9c2f:        0 	90                   	nop    
> 
> 80% of the overhead comes from cachemisses here:
> 
> ffffffff804b9bc6:       78 	44 39 68 2c          	cmp    %r13d,0x2c(%rax)
> ffffffff804b9bca:        4 	0f 18 09             	prefetcht0 (%rcx)
> ffffffff804b9bcd:      685 	75 e8                	jne    ffffffff804b9bb7 <__inet_lookup_established+0xa5>
> ffffffff804b9bcf:   139502 	eb bd                	jmp    ffffffff804b9b8e <__inet_lookup_established+0x7c>
> 
> corresponding to:
> 
> (gdb) list *0xffffffff804b9bc6
> 0xffffffff804b9bc6 is in __inet_lookup_established (net/ipv4/inet_hashtables.c:237).
> 232		rwlock_t *lock = inet_ehash_lockp(hashinfo, hash);
> 233	
> 234		prefetch(head->chain.first);
> 235		read_lock(lock);
> 236		sk_for_each(sk, node, &head->chain) {
> 237			if (INET_MATCH(sk, net, hash, acookie,
> 238						saddr, daddr, ports, dif))
> 239				goto hit; /* You sunk my battleship! */
> 240		}
> 241	
> 
> Seeing the first hard cachemiss on hash lookups is a familiar and 
> partly expected pattern - it is the first thing that touches 
> cache-cold data structures.
> 
> Seeing 1.4% of the total tbench overhead go into this single 
> cachemiss is a bit surprising to me though: tbench works via 
> long-lived connections (TCP establish costs are nowhere to be seen in 
> the profiles) so the socket hash should be relatively stable and 
> read-mostly on most CPUs in theory. The CPUs here have 2MB of L2 cache 
> per socket.
> 
> Could we be somehow dirtying these cachelines perhaps, causing 
> unnecessary cachemisses in hash lookups? Is the hash linkage portion 
> of the socket data structure frequently dirtied? Padding that to 64 
> bytes (or next to 64 bytes worth of read-mostly fields) could perhaps 
> give us a +1.7% tbench speedup.
> 

I am not seeing this on net-next-2.6, of course, thanks to RCU.

Could it be that several tbench sockets are hashed onto the same chain?

tbench uses 127.0.0.1 as both the source and the destination address for its
sockets, and the server binds to port 7003.


static inline unsigned int inet_ehashfn(struct net *net,
                                        const __be32 laddr, const __u16 lport,
                                        const __be32 faddr, const __be16 fport)
{
        return jhash_3words((__force __u32) laddr,
                            (__force __u32) faddr,
                            ((__u32) lport) << 16 | (__force __u32)fport,
                            inet_ehash_secret + net_hash_mix(net));
}

Hmm... that should be OK, thanks to jhash.
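
To make the "same chain" question concrete, here is a rough user-space sketch that hashes a batch of loopback connections to port 7003 and reports the longest chain. Everything in it is an assumption made for illustration: mix3() is a Jenkins-style mix written from memory and may not be bit-identical to the kernel's jhash_3words(), ehashfn_sketch() and longest_chain() are hypothetical names, 'secret' stands in for inet_ehash_secret + net_hash_mix(net), and client ports are assumed to be consecutive.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Jenkins-style mix, standing in for the kernel's jhash_3words().
 * Assumption: illustrative only, not guaranteed bit-identical to jhash.h. */
static uint32_t mix3(uint32_t a, uint32_t b, uint32_t c, uint32_t initval)
{
	a += 0x9e3779b9u;
	b += 0x9e3779b9u;
	c += initval;
	a -= b; a -= c; a ^= (c >> 13);
	b -= c; b -= a; b ^= (a << 8);
	c -= a; c -= b; c ^= (b >> 13);
	a -= b; a -= c; a ^= (c >> 12);
	b -= c; b -= a; b ^= (a << 16);
	c -= a; c -= b; c ^= (b >> 5);
	a -= b; a -= c; a ^= (c >> 3);
	b -= c; b -= a; b ^= (a << 10);
	c -= a; c -= b; c ^= (b >> 15);
	return c;
}

/* Hypothetical analogue of inet_ehashfn(): same argument packing,
 * with 'secret' replacing inet_ehash_secret + net_hash_mix(net). */
static uint32_t ehashfn_sketch(uint32_t laddr, uint16_t lport,
			       uint32_t faddr, uint16_t fport,
			       uint32_t secret)
{
	return mix3(laddr, faddr, ((uint32_t)lport << 16) | fport, secret);
}

/* Hash 'nsock' loopback connections to server port 7003 into 'nbuckets'
 * chains (must be a power of two) and return the longest chain length.
 * Per-chain counts are left in chain_len[]. Byte order is ignored here. */
static unsigned int longest_chain(unsigned int nsock, uint16_t first_port,
				  unsigned int *chain_len,
				  unsigned int nbuckets, uint32_t secret)
{
	const uint32_t loopback = 0x7f000001u; /* 127.0.0.1 */
	unsigned int i, longest = 0;

	assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
	memset(chain_len, 0, nbuckets * sizeof(*chain_len));
	for (i = 0; i < nsock; i++) {
		uint32_t h = ehashfn_sketch(loopback, 7003, loopback,
					    (uint16_t)(first_port + i), secret);
		unsigned int n = ++chain_len[h & (nbuckets - 1)];

		if (n > longest)
			longest = n;
	}
	return longest;
}
```

Calling longest_chain(64, 40000, chains, 1024, some_secret) with a 1024-entry array would show how evenly 64 such sockets spread out. With a reasonable mix the chains should stay short, which is why the hash function is probably not the culprit.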

Maybe it is the same problem as with eth_type_trans:

You get a cache line miss because the socket we handle in the chain was previously
handled by another CPU (sk->refcnt having been dirtied by that other CPU).


ffffffff804b9bc6:       78 	44 39 68 2c          	cmp    %r13d,0x2c(%rax)
ffffffff804b9bca:        4 	0f 18 09             	prefetcht0 (%rcx)

ffffffff804b9bcd:      685 	75 e8                	jne    ffffffff804b9bb7 <__inet_lookup_established+0xa5>
< "jne" stalls because the CPU must bring 0x2c(%rax) into its cache to perform the compare >

ffffffff804b9bcf:   139502 	eb bd                	jmp    ffffffff804b9b8e <__inet_lookup_established+0x7c>

Even if you pad or move refcnt somewhere else in sk, you still need to take a reference on it,
so it won't help very much.
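
A minimal sketch of that last point, assuming 64-byte cache lines. struct sock_sketch is a hypothetical, heavily simplified socket, not the kernel's struct sock, and lookup_sketch() is an illustrative name. The lookup fields get a cache line of their own, yet a successful lookup still has to write to the refcnt line, which is the line another CPU may have dirtied.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Hypothetical simplified socket, NOT the kernel's struct sock. */
struct sock_sketch {
	/* Read-mostly lookup fields, kept together on one cache line. */
	alignas(CACHE_LINE) struct sock_sketch *next;
	uint32_t hash;
	uint64_t addrpair;
	uint32_t portpair;
	/* Frequently written refcount, pushed onto a separate line. */
	alignas(CACHE_LINE) atomic_int refcnt;
};

static_assert(offsetof(struct sock_sketch, refcnt) >= CACHE_LINE,
	      "refcnt should not share a line with the lookup fields");

/* Walk the chain reading only the read-mostly line. On a hit the
 * reference must still be taken, so the refcnt line is written anyway. */
static struct sock_sketch *lookup_sketch(struct sock_sketch *head,
					 uint32_t hash, uint64_t addrpair,
					 uint32_t portpair)
{
	for (struct sock_sketch *sk = head; sk != NULL; sk = sk->next) {
		if (sk->hash == hash && sk->addrpair == addrpair &&
		    sk->portpair == portpair) {
			atomic_fetch_add(&sk->refcnt, 1); /* dirties refcnt line */
			return sk;
		}
	}
	return NULL;
}
```

In this layout, walking past non-matching sockets no longer touches their refcnt lines, but the socket that matches still costs one write to a possibly remote-dirty line per lookup. Padding can therefore shave off part of the misses, not the one on the hit.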

