lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOGi=dNFEPD2cbAqqpttRN_bi90gjJP8NH6HJnnHt4vMor0sJQ@mail.gmail.com>
Date:	Thu, 4 Feb 2016 15:07:11 +0800
From:	Ling Ma <ling.ma.program@...il.com>
To:	Waiman Long <waiman.long@....com>
Cc:	One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>,
	Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com,
	linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
	Ling <ling.ml@...baba-inc.com>
Subject: Re: [RFC PATCH] alispinlock: acceleration from lock integration on
 multi-core platform

> I have 2 major comments here. First of all, you should break up your patch
> into smaller ones. Large patch like the one in the tar ball is hard to
> review.

Ok, we will do it.

>Secondly, you are modifying over 1000 lines of code in mm/slab.c
> with some modest increase in performance. That can be hard to justify. Maybe
> you should find other use cases that involve less changes, but still have
> noticeable performance improvement. That will make it easier to be accepted.

In order to be justified the attachment in this letter include 3 files:

1. user space code (thread.c), which  can cause lots of hot kernel spinlock from
__kmalloc and kfree on multi-core platform

2. ali_work_queue.patch , the kernel patch for 4.3.0-rc4,
when we  run user space code (thread.c) based on the patch,
the synchronous operation consumption from __kmalloc and kfree is
about 15% on Intel E5-2699V3

3. org_spin_lock.patch, which is based on above ali_work_queue.patch,
when we  run user space code thread.c based on the patch,
the synchronous operation consumption from __kmalloc and kfree is
about 25% on Intel E5-2699V3


the main difference between ali_work_queue.patch and
org_spin_lock.patch as below:

diff --git a/mm/slab.h b/mm/slab.h
...
-   ali_spinlock_t list_lock;
+   spinlock_t list_lock;
...

diff --git a/mm/slab.c b/mm/slab.c
...
-   alispinlock(lock, &info);
+   spin_lock((spinlock_t *)lock);
+   fn(para);
+   spin_unlock((spinlock_t *)lock);
...

The above operations remove all performance noise from program modification.

We run  user space code thread.c with ali_work_queue.patch, and
org_spin_lock.patch respectively
 the output from thread.c as below:

ORG         NEW
38923684 43380604
38100464 44163011
37769241 43354266
37908638 43554022
37900994 43457066
38495073 43421394
37340217 43146352
38083979 43506951
37713263 43775215
37749871 43487289
37843224 43366055
38173823 43270225
38303612 43214675
37886717 44083950
37736455 43060728
37529307 44607597
38862690 43541484
37992824 44749925
38013454 43572225
37783135 45240502
37745372 44712540
38721413 43584658
38097842 43235392

            ORG            NEW
TOTAL 874675292 1005486126

So the data tell us the new mechanism can improve performance 14% (
1005486126/874675292) ,
and the operation can be justified fairly.

Thanks
Ling

2016-02-04 5:42 GMT+08:00 Waiman Long <waiman.long@....com>:
> On 02/02/2016 11:40 PM, Ling Ma wrote:
>>
>> Longman,
>>
>> The attachment include user space code(thread.c), and kernel
>> patch(ali_work_queue.patch) based on 4.3.0-rc4,
>> we replaced all original spinlock (list_lock) in slab.h/c  with the
>> new mechanism.
>>
>> The thread.c in user space caused lots of hot kernel spinlock from
>> __kmalloc and kfree,
>> perf top -d1 shows ~25%  before ali_work_queue.patch,after appending
>> this patch ,
>> the synchronous operation consumption from __kmalloc and kfree is
>> reduced from 25% to ~15% on Intel E5-2699V3
>> (we also observed the output from user space code (thread.c) is
>> improved clearly)
>
>
> I have 2 major comments here. First of all, you should break up your patch
> into smaller ones. Large patch like the one in the tar ball is hard to
> review. Secondly, you are modifying over 1000 lines of code in mm/slab.c
> with some modest increase in performance. That can be hard to justify. Maybe
> you should find other use cases that involve less changes, but still have
> noticeable performance improvement. That will make it easier to be accepted.
>
> Cheers,
> Longman
>
>

Download attachment "ali_work_queue.tar.bz2" of type "application/x-bzip2" (10827 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ