Date:	Mon, 16 Apr 2012 08:33:55 +0000
From:	"Chen, Dennis (SRDC SW)" <Dennis1.Chen@....com>
To:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:	Ingo Molnar <mingo@...nel.org>,
	"paulmck@...ux.vnet.ibm.com" <paulmck@...ux.vnet.ibm.com>,
	"peterz@...radead.org" <peterz@...radead.org>,
	Paul Mackerras <paulus@...ba.org>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>
Subject: [PATCH 0/2] tools perf: Add a new benchmark tool for semaphore/mutex

<PATCH PREFACE>
-------------------
This patch series adds a new performance benchmark tool for semaphores/mutexes:
the new tool forks NR tasks, as specified on the command line, and binds them
evenly across all the CPUs in the system. The command to launch the tool looks like:
'# perf bench locking mutex -p 8 -t 400 -c'

The above command creates 400 tasks on an 8-CPU system, so each CPU gets 50 tasks.
After each task is created, it reads all the files and directories under '/sys/module'.
sysfs is RAM-based, and its read path for both directories and files is very sensitive
to mutex contention; '/sys/module' also has almost no dependencies on external devices.
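
As a rough illustration of the scheme described above, each forked task does something
like the following user-space sketch. This is just an illustration with made-up names
(walk_dir(), the hardcoded -p/-t values), not the code from the patch itself:

#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Recursively read every file and directory under 'path'. */
static void walk_dir(const char *path)
{
	char buf[4096], sub[PATH_MAX];
	struct dirent *d;
	DIR *dir = opendir(path);

	if (!dir)
		return;
	while ((d = readdir(dir))) {
		if (!strcmp(d->d_name, ".") || !strcmp(d->d_name, ".."))
			continue;
		snprintf(sub, sizeof(sub), "%s/%s", path, d->d_name);
		if (d->d_type == DT_DIR) {
			walk_dir(sub);
		} else {
			int fd = open(sub, O_RDONLY);

			if (fd >= 0) {
				while (read(fd, buf, sizeof(buf)) > 0)
					;	/* drain the file */
				close(fd);
			}
		}
	}
	closedir(dir);
}

int main(void)
{
	int ntasks = 400, ncpus = 8;	/* from -t and -p in the real tool */
	int i;

	for (i = 0; i < ntasks; i++) {
		if (fork() == 0) {
			cpu_set_t set;

			CPU_ZERO(&set);
			CPU_SET(i % ncpus, &set);	/* bind task i to CPU i % ncpus */
			sched_setaffinity(0, sizeof(set), &set);
			walk_dir("/sys/module");
			_exit(0);
		}
	}
	while (wait(NULL) > 0)
		;	/* reap all children */
	return 0;
}

Every task hammers the sysfs lookup/read path, which takes mutexes on each directory
and file access, so the contention grows with the task count.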

We can use this tool with the 'perf record' command to find the hot spots in the code,
or with 'perf top -g' to get live info. For example, below is a test run on an Intel
i7-2600 box (the -c option reports CPU cycles; I don't use it in this test case):

# perf record -a perf bench locking mutex -p 8 -t 4000
# Running locking/mutex benchmark... 
 ...
 [13894 ]/6  duration        23 s   609392 us
 [13996 ]/4  duration        23 s   599418 us
 [14056 ]/0  duration        23 s   595710 us
 [13715 ]/3  duration        23 s   621719 us
 [13390 ]/6  duration        23 s   644020 us
 [13696 ]/0  duration        23 s   623101 us
 [14334 ]/6  duration        23 s   580262 us
 [14343 ]/7  duration        23 s   578702 us
 [14283 ]/3  duration        23 s   583007 us
 -----------------------------------
 Total duration     79353 s   943945 us

 real: 23.84   s
 user: 0.00   
 sys:  0.45   

# perf report
===================================================================================
...
# perf version : 3.3.2
# arch : x86_64
# nrcpus online : 8
# nrcpus avail : 8
# cpudesc : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
# total memory : 3966460 kB
# cmdline : /usr/bin/perf record -a perf bench locking mutex -p 8 -t 4000

# Events: 131K cycles
#
# Overhead          Command                      Shared Object                                 Symbol
# ........  ...............  .................................  .....................................
#
    22.12%           perf  [kernel.kallsyms]                  [k] __mutex_lock_slowpath
     8.27%           perf  [kernel.kallsyms]                  [k] _raw_spin_lock
     6.16%           perf  [kernel.kallsyms]                  [k] mutex_unlock
     5.22%           perf  [kernel.kallsyms]                  [k] mutex_spin_on_owner
     4.94%           perf  [kernel.kallsyms]                  [k] sysfs_refresh_inode
     4.82%           perf  [kernel.kallsyms]                  [k] mutex_lock
     2.67%           perf  [kernel.kallsyms]                  [k] __mutex_unlock_slowpath
     2.61%           perf  [kernel.kallsyms]                  [k] link_path_walk
     2.42%           perf  [kernel.kallsyms]                  [k] _raw_spin_lock_irqsave
     1.61%           perf  [kernel.kallsyms]                  [k] __d_lookup
     1.18%           perf  [kernel.kallsyms]                  [k] clear_page_c
     1.16%           perf  [kernel.kallsyms]                  [k] dput
     0.97%           perf  [kernel.kallsyms]                  [k] do_lookup
     0.93%        swapper  [kernel.kallsyms]                  [k] intel_idle
     0.87%           perf  [kernel.kallsyms]                  [k] get_page_from_freelist
     0.85%           perf  [kernel.kallsyms]                  [k] __strncpy_from_user
     0.81%           perf  [kernel.kallsyms]                  [k] system_call
     0.78%           perf  libc-2.13.so                       [.] 0x84ef0         
     0.71%           perf  [kernel.kallsyms]                  [k] vfsmount_lock_local_lock
     0.68%           perf  [kernel.kallsyms]                  [k] sysfs_dentry_revalidate
     0.62%           perf  [kernel.kallsyms]                  [k] try_to_wake_up
     0.62%           perf  [kernel.kallsyms]                  [k] kfree
     0.60%           perf  [kernel.kallsyms]                  [k] kmem_cache_alloc   
............................................................................................

We can see that with 4000 tasks running on 8 CPUs simultaneously, there is very heavy
contention on the mutex lock, so lots of tasks fall into the slow path of the mutex lock...
I am very curious how things would go if we switched the mutex to a semaphore in this
case; that is my next plan.
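
For reference, the comparison is essentially between the two kernel locking styles
below (just an illustration with made-up names, using the single-argument 3.x-era
DEFINE_SEMAPHORE(), which gives a count-1 semaphore):

#include <linux/mutex.h>
#include <linux/semaphore.h>

static DEFINE_MUTEX(example_mutex);	/* sleeping lock with an owner; waiters
					 * can adaptively spin on the owner */
static DEFINE_SEMAPHORE(example_sem);	/* count-1 semaphore; no owner, so no
					 * spin-on-owner optimization */

static void with_mutex(void)
{
	mutex_lock(&example_mutex);
	/* ... critical section ... */
	mutex_unlock(&example_mutex);
}

static void with_semaphore(void)
{
	down(&example_sem);
	/* ... critical section ... */
	up(&example_sem);
}

A semaphore has no owner, so the mutex_spin_on_owner() optimistic spinning seen in
the profile above is not available; the benchmark should show whether that matters
under this workload.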




