Date:	Thu,  4 Apr 2013 10:54:17 -0400
From:	Waiman Long <Waiman.Long@...com>
To:	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	David Howells <dhowells@...hat.com>,
	Dave Jones <davej@...hat.com>,
	Clark Williams <williams@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>
Cc:	Davidlohr Bueso <davidlohr.bueso@...com>,
	Waiman Long <Waiman.Long@...com>, linux-kernel@...r.kernel.org,
	"Chandramouleeswaran, Aswin" <aswin@...com>
Subject: [PATCH RFC 2/3] mutex: restrict mutex spinning to only one task per mutex

The current mutex spinning code allows multiple tasks to spin on a
single mutex concurrently. There are two major problems with this
approach:

 1. It is not very energy efficient, as the spinning tasks are not
    doing useful work. Because preemption is disabled while spinning,
    the spinners may also block other, more important or useful tasks
    from running. Only one of the spinners can get the mutex at any
    time; the others end up waiting much longer for it.

 2. The mutex data structure on x86-64 should be 32 bytes. The
    spinning code spins on lock->owner which, in most cases, sits in
    the same 64-byte cache line as the lock->wait_lock spinlock. As a
    result, the mutex spinners contend on the same cache line with
    the CPUs trying to take the spinlock, increasing the time spent
    both on the spinlock and on the mutex spinning.

These problems are worse on systems with a large number of CPUs. One
way to reduce their impact is to allow only one task to spin on a
mutex at any time.

This patch adds a new spinner field to the mutex structure in mutex.h
to limit the number of spinners to one task. That increases the size
of the mutex by 8 bytes in a 64-bit environment (4 bytes in a 32-bit
environment).

The AIM7 benchmarks were run on 3.7.10-derived kernels to show the
performance changes with this patch on an 8-socket 80-core system
with hyperthreading off.  The table below shows the mean % change
in performance over a range of users for some AIM7 workloads with
just patch 1 (the reduced-atomic-operations patch) vs patches 1+2
(patch 1 plus this one).

+--------------+-----------------+-----------------+-----------------+
|   Workload   | mean % change   | mean % change   | mean % change   |
|              | 10-100 users    | 200-1000 users  | 1100-2000 users |
+--------------+-----------------+-----------------+-----------------+
| alltests     |     -0.2%       |     -3.8%       |    -4.2%        |
| five_sec     |     -0.6%       |     -2.0%       |    -2.4%        |
| fserver      |     +2.2%       |    +16.2%       |    +2.2%        |
| high_systime |     -0.3%       |     -4.3%       |    -3.0%        |
| new_fserver  |     +3.9%       |    +16.0%       |    +9.5%        |
| shared       |     -1.7%       |     -5.0%       |    -4.0%        |
| short        |     -7.7%       |     +0.2%       |    +1.3%        |
+--------------+-----------------+-----------------+-----------------+

It can be seen that this patch improves performance for the fserver
and new_fserver workloads, at the cost of a slight drop in
performance for the other workloads.

Signed-off-by: Waiman Long <Waiman.Long@...com>
Reviewed-by: Davidlohr Bueso <davidlohr.bueso@...com>
---
 include/linux/mutex.h |    3 +++
 kernel/mutex.c        |   12 ++++++++++++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 9121595..dd8fdb8 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -50,6 +50,9 @@ struct mutex {
 	atomic_t		count;
 	spinlock_t		wait_lock;
 	struct list_head	wait_list;
+#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
+	struct task_struct	*spinner;
+#endif
 #if defined(CONFIG_DEBUG_MUTEXES) || defined(CONFIG_SMP)
 	struct task_struct	*owner;
 #endif
diff --git a/kernel/mutex.c b/kernel/mutex.c
index 5e5ea03..965f59f 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -158,7 +158,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	 *
 	 * We can't do this for DEBUG_MUTEXES because that relies on wait_lock
 	 * to serialize everything.
+	 *
+	 * Only the first task is allowed to spin on a given mutex; that
+	 * task puts its task_struct pointer into the spinner field.
 	 */
+	if (lock->spinner || (cmpxchg(&lock->spinner, NULL, current) != NULL))
+		goto slowpath;
 
 	for (;;) {
 		struct task_struct *owner;
@@ -175,6 +180,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		    (atomic_cmpxchg(&lock->count, 1, 0) == 1)) {
 			lock_acquired(&lock->dep_map, ip);
 			mutex_set_owner(lock);
+			lock->spinner = NULL;
 			preempt_enable();
 			return 0;
 		}
@@ -196,6 +202,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 */
 		arch_mutex_cpu_relax();
 	}
+
+	/*
+	 * Done with spinning
+	 */
+	lock->spinner = NULL;
+slowpath:
 #endif
 	spin_lock_mutex(&lock->wait_lock, flags);
 
-- 
1.7.1
