Date:	Mon, 15 Apr 2013 10:37:56 -0400
From:	Waiman Long <Waiman.Long@...com>
To:	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	David Howells <dhowells@...hat.com>,
	Dave Jones <davej@...hat.com>,
	Clark Williams <williams@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>
Cc:	Waiman Long <Waiman.Long@...com>, linux-kernel@...r.kernel.org,
	x86@...nel.org, linux-arch@...r.kernel.org,
	"Chandramouleeswaran, Aswin" <aswin@...com>,
	Davidlohr Bueso <davidlohr.bueso@...com>,
	"Norton, Scott J" <scott.norton@...com>,
	Rik van Riel <riel@...hat.com>
Subject: [PATCH 0/3 v2] mutex: Improve mutex performance by doing less atomic-ops & better spinning

v1->v2:
 - Remove the 2 mutex spinner patches and replace them with another one
   to improve the mutex spinning process.
 - Remove changes made to kernel/mutex.h & localize changes in
   kernel/mutex.c.
 - Add an optional patch to remove the architecture-specific check in
   patch 1.

This patch set is a collection of 3 different mutex-related patches
aimed at improving mutex performance, especially for systems with a
large number of CPUs. This is achieved by doing fewer atomic operations
and better mutex spinning (when CONFIG_MUTEX_SPIN_ON_OWNER is on).

The first patch reduces the number of atomic operations executed. It
can produce a dramatic performance improvement in the AIM7 benchmark
on systems with a large number of CPUs. For example, there was a more
than 3X improvement in the high_systime workload with a 3.7.10 kernel
on an 8-socket x86-64 system with 80 cores. The 3.8 kernels, on the
other hand, are no longer mutex limited for that workload, so the
performance improvement there is only about 1% for the high_systime
workload.
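
As a rough illustration of the idea behind patch 1 (this is a user-space
sketch, not the code in the patch), one common way to do fewer atomic
operations is to check the lock word with a plain load and only issue the
atomic read-modify-write when the lock looks free. The names toy_lock,
toy_trylock and toy_unlock are hypothetical, and the 1 = unlocked,
0 = locked convention is a simplification that ignores waiters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock: count == 1 means unlocked, 0 means locked. */
struct toy_lock {
    atomic_int count;
};

static void toy_lock_init(struct toy_lock *l)
{
    atomic_init(&l->count, 1);
}

/*
 * Try to take the lock. A cheap load filters out the busy case so that
 * the compare-and-swap, which needs exclusive ownership of the cacheline,
 * is only attempted when it has a chance of succeeding.
 */
static bool toy_trylock(struct toy_lock *l)
{
    int expected = 1;

    if (atomic_load_explicit(&l->count, memory_order_relaxed) != 1)
        return false;   /* lock is busy: skip the atomic operation */

    return atomic_compare_exchange_strong_explicit(
        &l->count, &expected, 0,
        memory_order_acquire, memory_order_relaxed);
}

static void toy_unlock(struct toy_lock *l)
{
    /* Only the holder may unlock, so the count must be 0 here. */
    assert(atomic_load_explicit(&l->count, memory_order_relaxed) == 0);
    atomic_store_explicit(&l->count, 1, memory_order_release);
}
```

With many CPUs contending, the failed attempts become read-only accesses
that can be served from a shared cacheline, which is presumably where the
scalability gain comes from.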

Patch 2 improves the mutex spinning process by reducing contention
among the spinners when competing for the mutex. This is done by
using an MCS lock to put the spinners in a queue so that only the
first spinner will try to acquire the mutex when it is available. This
patch showed a significant performance improvement of +30% on the AIM7
fserver and new_fserver workloads.
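
For readers unfamiliar with MCS locks, the queueing behaviour can be
sketched in user-space C11 as follows. This is a generic textbook-style
MCS lock, not the implementation in patch 2; the names mcs_node, mcs_lock,
mcs_lock_init, mcs_lock_acquire and mcs_lock_release are hypothetical.
Each waiter spins on a flag in its own node, so only the waiter at the
head of the queue is woken when the lock is passed on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mcs_node {
    _Atomic(struct mcs_node *) next;  /* successor in the queue, if any */
    atomic_bool locked;               /* set to true when the lock is handed to us */
};

struct mcs_lock {
    _Atomic(struct mcs_node *) tail;  /* last waiter, NULL when the lock is free */
};

static void mcs_lock_init(struct mcs_lock *lock)
{
    atomic_init(&lock->tail, NULL);
}

static void mcs_lock_acquire(struct mcs_lock *lock, struct mcs_node *node)
{
    atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&node->locked, false, memory_order_relaxed);

    /* Append ourselves to the queue with a single atomic exchange. */
    struct mcs_node *prev =
        atomic_exchange_explicit(&lock->tail, node, memory_order_acq_rel);
    if (prev == NULL)
        return;  /* queue was empty: we hold the lock */

    /* Link behind the predecessor, then spin on our own node only. */
    atomic_store_explicit(&prev->next, node, memory_order_release);
    while (!atomic_load_explicit(&node->locked, memory_order_acquire))
        ;
}

static void mcs_lock_release(struct mcs_lock *lock, struct mcs_node *node)
{
    struct mcs_node *next =
        atomic_load_explicit(&node->next, memory_order_acquire);

    /* The lock is held, so the queue cannot be empty. */
    assert(atomic_load_explicit(&lock->tail, memory_order_relaxed) != NULL);

    if (next == NULL) {
        /* No visible successor: try to mark the lock free. */
        struct mcs_node *expected = node;
        if (atomic_compare_exchange_strong_explicit(
                &lock->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;
        /* A successor is enqueueing; wait until it has linked itself. */
        while ((next = atomic_load_explicit(&node->next,
                                            memory_order_acquire)) == NULL)
            ;
    }
    /* Hand the lock to the next waiter in the queue. */
    atomic_store_explicit(&next->locked, true, memory_order_release);
}
```

Because every waiter spins on a different cacheline, a release touches
only the successor's node instead of invalidating a line that all
spinners are polling.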

The last patch is an optional one for backing out the
architecture-specific check in patch 1, if so desired.

Waiman Long (3):
  mutex: Make more scalable by doing less atomic operations
  mutex: Queue mutex spinners with MCS lock to reduce cacheline
    contention
  mutex: back out architecture specific check for negative mutex count

 include/linux/mutex.h |    3 ++
 include/linux/sched.h |    3 ++
 kernel/mutex.c        |   92 ++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/core.c   |   24 +++++++++++--
 4 files changed, 115 insertions(+), 7 deletions(-)
