linux-kernel - Re: [PATCH] SLUB use cmpxchg

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070822011225.GA4124@Krystal>
Date:	Tue, 21 Aug 2007 21:12:25 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To:	Christoph Lameter <clameter@....com>
Cc:	Andi Kleen <andi@...stfloor.org>, akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org, mingo@...hat.com
Subject: Re: [PATCH] SLUB use cmpxchg_local

* Christoph Lameter (clameter@....com) wrote:
> Ok. Measurements vs. simple cmpxchg on a Intel(R) Pentium(R) 4 CPU 3.20GHz 
> (hyperthreading enabled). Test run with your module show only minor 
> performance improvements and lots of regressions. So we must have 
> cmpxchg_local to see any improvements? Some kind of a recent optimization 
> of cmpxchg performance that we do not see on older cpus?
> 

I did not expect the cmpxchg with LOCK prefix to be faster than irq
save/restore. You will need to run these tests using cmpxchg_local to
see an improvement.

Mathieu

> 
> Code of kmem_cache_alloc (to show you that there are no debug options on):
> 
> Dump of assembler code for function kmem_cache_alloc:
> 0x4015cfa9 <kmem_cache_alloc+0>:        push   %ebp
> 0x4015cfaa <kmem_cache_alloc+1>:        mov    %esp,%ebp
> 0x4015cfac <kmem_cache_alloc+3>:        push   %edi
> 0x4015cfad <kmem_cache_alloc+4>:        push   %esi
> 0x4015cfae <kmem_cache_alloc+5>:        push   %ebx
> 0x4015cfaf <kmem_cache_alloc+6>:        sub    $0x10,%esp
> 0x4015cfb2 <kmem_cache_alloc+9>:        mov    %eax,%esi
> 0x4015cfb4 <kmem_cache_alloc+11>:       mov    %edx,0xffffffe8(%ebp)
> 0x4015cfb7 <kmem_cache_alloc+14>:       mov    0x4(%ebp),%eax
> 0x4015cfba <kmem_cache_alloc+17>:       mov    %eax,0xfffffff0(%ebp)
> 0x4015cfbd <kmem_cache_alloc+20>:       mov    %fs:0x404af008,%eax
> 0x4015cfc3 <kmem_cache_alloc+26>:       mov    0x90(%esi,%eax,4),%edi
> 0x4015cfca <kmem_cache_alloc+33>:       mov    (%edi),%ecx
> 0x4015cfcc <kmem_cache_alloc+35>:       test   %ecx,%ecx
> 0x4015cfce <kmem_cache_alloc+37>:       je     0x4015d00a <kmem_cache_alloc+97>
> 0x4015cfd0 <kmem_cache_alloc+39>:       mov    0xc(%edi),%eax
> 0x4015cfd3 <kmem_cache_alloc+42>:       mov    (%ecx,%eax,4),%eax
> 0x4015cfd6 <kmem_cache_alloc+45>:       mov    %eax,%edx
> 0x4015cfd8 <kmem_cache_alloc+47>:       mov    %ecx,%eax
> 0x4015cfda <kmem_cache_alloc+49>:       lock cmpxchg %edx,(%edi)
> 0x4015cfde <kmem_cache_alloc+53>:       mov    %eax,%ebx
> 0x4015cfe0 <kmem_cache_alloc+55>:       cmp    %ecx,%eax
> 0x4015cfe2 <kmem_cache_alloc+57>:       jne    0x4015cfbd <kmem_cache_alloc+20>
> 0x4015cfe4 <kmem_cache_alloc+59>:       cmpw   $0x0,0xffffffe8(%ebp)
> 0x4015cfe9 <kmem_cache_alloc+64>:       jns    0x4015d006 <kmem_cache_alloc+93>
> 0x4015cfeb <kmem_cache_alloc+66>:       mov    0x10(%edi),%edx
> 0x4015cfee <kmem_cache_alloc+69>:       xor    %eax,%eax
> 0x4015cff0 <kmem_cache_alloc+71>:       mov    %edx,%ecx
> 0x4015cff2 <kmem_cache_alloc+73>:       shr    $0x2,%ecx
> 0x4015cff5 <kmem_cache_alloc+76>:       mov    %ebx,%edi
> 
> Base
> 
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 332 cycles kfree -> 422 cycles
> 10000 times kmalloc(16) -> 218 cycles kfree -> 360 cycles
> 10000 times kmalloc(32) -> 214 cycles kfree -> 368 cycles
> 10000 times kmalloc(64) -> 244 cycles kfree -> 390 cycles
> 10000 times kmalloc(128) -> 320 cycles kfree -> 417 cycles
> 10000 times kmalloc(256) -> 438 cycles kfree -> 550 cycles
> 10000 times kmalloc(512) -> 527 cycles kfree -> 626 cycles
> 10000 times kmalloc(1024) -> 678 cycles kfree -> 775 cycles
> 10000 times kmalloc(2048) -> 748 cycles kfree -> 822 cycles
> 10000 times kmalloc(4096) -> 641 cycles kfree -> 650 cycles
> 10000 times kmalloc(8192) -> 741 cycles kfree -> 817 cycles
> 10000 times kmalloc(16384) -> 872 cycles kfree -> 927 cycles
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 332 cycles
> 10000 times kmalloc(16)/kfree -> 327 cycles
> 10000 times kmalloc(32)/kfree -> 323 cycles
> 10000 times kmalloc(64)/kfree -> 320 cycles
> 10000 times kmalloc(128)/kfree -> 320 cycles
> 10000 times kmalloc(256)/kfree -> 333 cycles
> 10000 times kmalloc(512)/kfree -> 332 cycles
> 10000 times kmalloc(1024)/kfree -> 330 cycles
> 10000 times kmalloc(2048)/kfree -> 334 cycles
> 10000 times kmalloc(4096)/kfree -> 674 cycles
> 10000 times kmalloc(8192)/kfree -> 1155 cycles
> 10000 times kmalloc(16384)/kfree -> 1226 cycles
> 
> Slub cmpxchg.
> 
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 296 cycles kfree -> 515 cycles
> 10000 times kmalloc(16) -> 193 cycles kfree -> 412 cycles
> 10000 times kmalloc(32) -> 188 cycles kfree -> 422 cycles
> 10000 times kmalloc(64) -> 222 cycles kfree -> 441 cycles
> 10000 times kmalloc(128) -> 292 cycles kfree -> 476 cycles
> 10000 times kmalloc(256) -> 414 cycles kfree -> 589 cycles
> 10000 times kmalloc(512) -> 513 cycles kfree -> 673 cycles
> 10000 times kmalloc(1024) -> 694 cycles kfree -> 825 cycles
> 10000 times kmalloc(2048) -> 739 cycles kfree -> 878 cycles
> 10000 times kmalloc(4096) -> 636 cycles kfree -> 653 cycles
> 10000 times kmalloc(8192) -> 715 cycles kfree -> 799 cycles
> 10000 times kmalloc(16384) -> 855 cycles kfree -> 927 cycles
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 354 cycles
> 10000 times kmalloc(16)/kfree -> 336 cycles
> 10000 times kmalloc(32)/kfree -> 335 cycles
> 10000 times kmalloc(64)/kfree -> 337 cycles
> 10000 times kmalloc(128)/kfree -> 337 cycles
> 10000 times kmalloc(256)/kfree -> 355 cycles
> 10000 times kmalloc(512)/kfree -> 354 cycles
> 10000 times kmalloc(1024)/kfree -> 337 cycles
> 10000 times kmalloc(2048)/kfree -> 339 cycles
> 10000 times kmalloc(4096)/kfree -> 674 cycles
> 10000 times kmalloc(8192)/kfree -> 1128 cycles
> 10000 times kmalloc(16384)/kfree -> 1240 cycles
> 
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/