Date:   Fri, 17 Jun 2022 15:55:35 +0800
From:   Rongwei Wang <rongwei.wang@...ux.alibaba.com>
To:     Christoph Lameter <cl@...two.de>
Cc:     David Rientjes <rientjes@...gle.com>, songmuchun@...edance.com,
        Hyeonggon Yoo <42.hyeyoo@...il.com>, akpm@...ux-foundation.org,
        vbabka@...e.cz, roman.gushchin@...ux.dev, iamjoonsoo.kim@....com,
        penberg@...nel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and
 slab_free



On 6/13/22 9:50 PM, Christoph Lameter wrote:
> On Sat, 11 Jun 2022, Rongwei Wang wrote:
> 
>>> Ok so the idea is to take the lock only if kmem_cache_debug. That looks
>>> ok. But it still adds a number of new branches etc to the free loop.
>>>
>>> Some performance tests would be useful.
>> Hi Christoph
>>
>> Thanks for your time!
>> Do you have some advice in benchmarks that need me to test? And I find that
>> hackbench and lkp was used frequently in mm/slub.c commits[1,2]. But I have no
>> idea how to use these two benchmarks test to cover the above changes. Can you
>> give some examples? Thanks very much!
> 
> 
> Hi Rongwei,
> 
> Well, run hackbench with and without the change.
> 
> There are also synthetic benchmarks available  at
> https://gentwo.org/christoph/slub/tests/
Christoph, I followed [1] to collect the test data below. The slub_test case is
the same as the one you provided. Here are the results (baseline is the
upstream kernel, and fix is the patched kernel).

My test environment: arm64 VM (32 cores and 128G memory)

I removed 'slub_debug=UFPZ' from the kernel cmdline before collecting the
following two groups of data.
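
For anyone reproducing this setup, a minimal sketch of how one could check that no slub_debug parameter is left on the boot command line. The helper name slub_debug_enabled is hypothetical and not part of the test suite. It only inspects a command-line string and says nothing about debugging enabled by other means.

```python
def slub_debug_enabled(cmdline: str) -> bool:
    """Return True if a slub_debug parameter appears in a kernel command line.

    Hypothetical helper: matches both the bare 'slub_debug' form and the
    'slub_debug=<flags>' form, e.g. 'slub_debug=UFPZ'.
    """
    for token in cmdline.split():
        if token == "slub_debug" or token.startswith("slub_debug="):
            return True
    return False


# Example strings (made up for illustration, not taken from the test machine).
print(slub_debug_enabled("root=/dev/vda1 ro slub_debug=UFPZ"))  # True
print(slub_debug_enabled("root=/dev/vda1 ro"))                  # False
```

On a running Linux system the string to pass in can be read from /proc/cmdline.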

[1]https://lore.kernel.org/linux-mm/20200527103545.4348ac10@carbon/

Single thread testing

1. Kmalloc: Repeatedly allocate then free test

                    before (baseline)        fix
                    kmalloc      kfree       kmalloc      kfree
10000 times 8      7 cycles     8 cycles    5 cycles     7 cycles
10000 times 16     4 cycles     8 cycles    3 cycles     6 cycles
10000 times 32     4 cycles     8 cycles    3 cycles     6 cycles
10000 times 64     3 cycles     8 cycles    3 cycles     6 cycles
10000 times 128    3 cycles     8 cycles    3 cycles     6 cycles
10000 times 256    12 cycles    8 cycles    11 cycles    7 cycles
10000 times 512    27 cycles    10 cycles   23 cycles    11 cycles
10000 times 1024   18 cycles    9 cycles    20 cycles    10 cycles
10000 times 2048   54 cycles    12 cycles   54 cycles    12 cycles
10000 times 4096   105 cycles   20 cycles   105 cycles   25 cycles
10000 times 8192   210 cycles   35 cycles   212 cycles   39 cycles
10000 times 16384  133 cycles   45 cycles   119 cycles   46 cycles
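
Since the patch touches the free path, the kfree columns are the interesting ones here. A small sketch, using only the numbers from the table above, that computes the per-size kfree difference (fix minus baseline). The function names are illustrative.

```python
# (object size, baseline kfree cycles, fix kfree cycles), copied from table 1.
KFREE_CYCLES = [
    (8, 8, 7), (16, 8, 6), (32, 8, 6), (64, 8, 6),
    (128, 8, 6), (256, 8, 7), (512, 10, 11), (1024, 9, 10),
    (2048, 12, 12), (4096, 20, 25), (8192, 35, 39), (16384, 45, 46),
]


def kfree_deltas(rows):
    """Return (size, fix - baseline) pairs; a positive delta means fix is slower."""
    return [(size, fix - base) for size, base, fix in rows]


def worst_regression(rows):
    """Return the (size, delta) pair with the largest slowdown in cycles."""
    return max(kfree_deltas(rows), key=lambda pair: pair[1])


for size, delta in kfree_deltas(KFREE_CYCLES):
    print(f"size {size:>5}: kfree delta {delta:+d} cycles")

print("worst regression:", worst_regression(KFREE_CYCLES))
```

With these numbers, kfree is faster for sizes 8 to 256, unchanged for 2048, and slower for 512, 1024, 4096, 8192 and 16384, with the largest difference at 4096 (20 to 25 cycles). This is a single run, so small differences may be noise.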


2. Kmalloc: alloc/free test

                                   before (base)   fix
10000 times kmalloc(8)/kfree      3 cycles        3 cycles
10000 times kmalloc(16)/kfree     3 cycles        3 cycles
10000 times kmalloc(32)/kfree     3 cycles        3 cycles
10000 times kmalloc(64)/kfree     3 cycles        3 cycles
10000 times kmalloc(128)/kfree    3 cycles        3 cycles
10000 times kmalloc(256)/kfree    3 cycles        3 cycles
10000 times kmalloc(512)/kfree    3 cycles        3 cycles
10000 times kmalloc(1024)/kfree   3 cycles        3 cycles
10000 times kmalloc(2048)/kfree   3 cycles        3 cycles
10000 times kmalloc(4096)/kfree   3 cycles        3 cycles
10000 times kmalloc(8192)/kfree   3 cycles        3 cycles
10000 times kmalloc(16384)/kfree  33 cycles       33 cycles


Concurrent allocs

                                 before (baseline)   fix
Kmalloc N*alloc N*free(8)       Average=17/18       Average=11/11
Kmalloc N*alloc N*free(16)      Average=15/49       Average=9/11
Kmalloc N*alloc N*free(32)      Average=15/40       Average=9/11
Kmalloc N*alloc N*free(64)      Average=15/44       Average=9/10
Kmalloc N*alloc N*free(128)     Average=15/42       Average=10/10
Kmalloc N*alloc N*free(256)     Average=128/28      Average=71/22
Kmalloc N*alloc N*free(512)     Average=206/34      Average=178/26
Kmalloc N*alloc N*free(1024)    Average=762/37      Average=369/27
Kmalloc N*alloc N*free(2048)    Average=327/58      Average=339/33
Kmalloc N*alloc N*free(4096)    Average=2255/128    Average=1813/64
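
To compare the two columns above mechanically, here is a rough sketch that parses the "Average=a/b" fields and lists every case where the fix value is higher than the baseline. It assumes the first number is the alloc cost and the second the free cost, and that lower is better; both are assumptions about the test output format.

```python
import re

# size -> (baseline field, fix field), copied from the N*alloc N*free table.
ROWS = {
    8: ("Average=17/18", "Average=11/11"),
    16: ("Average=15/49", "Average=9/11"),
    32: ("Average=15/40", "Average=9/11"),
    64: ("Average=15/44", "Average=9/10"),
    128: ("Average=15/42", "Average=10/10"),
    256: ("Average=128/28", "Average=71/22"),
    512: ("Average=206/34", "Average=178/26"),
    1024: ("Average=762/37", "Average=369/27"),
    2048: ("Average=327/58", "Average=339/33"),
    4096: ("Average=2255/128", "Average=1813/64"),
}


def parse_average(field):
    """Split 'Average=a/b' into (a, b); assumed to mean (alloc, free)."""
    match = re.fullmatch(r"Average=(\d+)/(\d+)", field)
    if match is None:
        raise ValueError(f"unexpected field: {field!r}")
    return int(match.group(1)), int(match.group(2))


def regressions(rows):
    """Return (size, kind, baseline, fix) for every value that got larger."""
    found = []
    for size, (base_field, fix_field) in rows.items():
        base = parse_average(base_field)
        fix = parse_average(fix_field)
        for kind, b, f in zip(("alloc", "free"), base, fix):
            if f > b:
                found.append((size, kind, b, f))
    return found


print(regressions(ROWS))
```

Under those assumptions the only value that increases is the first number for size 2048 (327 to 339). Every second number is lower in the fix column.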

                                 before (baseline)   fix
Kmalloc N*(alloc free)(8)       Average=3           Average=3
Kmalloc N*(alloc free)(16)      Average=3           Average=3
Kmalloc N*(alloc free)(32)      Average=3           Average=3
Kmalloc N*(alloc free)(64)      Average=3           Average=3
Kmalloc N*(alloc free)(128)     Average=3           Average=3
Kmalloc N*(alloc free)(256)     Average=3           Average=3
Kmalloc N*(alloc free)(512)     Average=3           Average=3
Kmalloc N*(alloc free)(1024)    Average=3           Average=3
Kmalloc N*(alloc free)(2048)    Average=3           Average=3
Kmalloc N*(alloc free)(4096)    Average=3           Average=3

According to the above data, there seems to be no significant performance
degradation in the patched kernel. In addition, in the concurrent allocs test,
for cases such as Kmalloc N*alloc N*free(1024), the 'fix' column is better than
the baseline (I assume lower is better; if I am wrong, please let me know).
If you have other suggestions, I can run more tests.

Thanks for your time!
-wrw
> 
> These measure the cycles that slab operations take. However, they are a
> bit old and I think Pekka may have a newer version of these
> patches.
> 
> Greetings,
> 	Christoph
