Date:	Tue, 26 Nov 2013 00:12:31 -0800
From:	Davidlohr Bueso <davidlohr@...com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	LKML <linux-kernel@...r.kernel.org>, Jason Low <jason.low2@...com>,
	Ingo Molnar <mingo@...nel.org>,
	Darren Hart <dvhart@...ux.intel.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Mike Galbraith <efault@....de>, Jeff Mahoney <jeffm@...e.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Scott Norton <scott.norton@...com>,
	Tom Vaden <tom.vaden@...com>,
	Aswin Chandramouleeswaran <aswin@...com>,
	Waiman Long <Waiman.Long@...com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [RFC patch 0/5] futex: Allow lockless empty check of hashbucket
 plist in futex_wake()

On Mon, 2013-11-25 at 20:58 +0000, Thomas Gleixner wrote:
> The patch set from Davidlohr [1] tried to attempt the same via an
> atomic counter of waiters in a hash bucket. The atomic counter access
> provided enough serialization for x86 so that a failure is not
> observable in testing, but does not provide any guarantees.
> 
> The same can be achieved with a smp_mb() pair including proper
> guarantees for all architectures.

I am becoming hesitant about this approach. Below are some results from
my quad-core laptop, measuring the latency of nthread wakeups (one at a
time). Failed wait calls never occur in this test, so the (otherwise
minimal) overhead of the list queue+dequeue is not included; we only
measure the smp_mb() usage, for the case where the !empty list condition
never occurs.

+---------+--------------------+--------+-------------------+--------+----------+
| threads | baseline time (ms) | stddev | patched time (ms) | stddev | overhead |
+---------+--------------------+--------+-------------------+--------+----------+
|     512 | 4.2410             | 0.9762 | 12.3660           | 5.1020 | +191.58% |
|     256 | 2.7750             | 0.3997 | 7.0220            | 2.9436 | +153.04% |
|     128 | 1.4910             | 0.4188 | 3.7430            | 0.8223 | +151.03% |
|      64 | 0.8970             | 0.3455 | 2.5570            | 0.3710 | +185.06% |
|      32 | 0.3620             | 0.2242 | 1.1300            | 0.4716 | +212.15% |
+---------+--------------------+--------+-------------------+--------+----------+
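
For reference, the overhead column can be rechecked from the two time
columns. The sketch below is only an illustration: it assumes overhead is
defined as (patched - baseline) / baseline, and the names ROWS and
overhead_pct are hypothetical helpers, not part of the benchmark. The
times are the rounded values from the table, so the recomputed figures
may differ from the reported ones in the last digit.

```python
# Times in ms copied from the table above:
# (threads, baseline, patched, reported overhead in percent).
ROWS = [
    (512, 4.2410, 12.3660, 191.58),
    (256, 2.7750, 7.0220, 153.04),
    (128, 1.4910, 3.7430, 151.03),
    (64, 0.8970, 2.5570, 185.06),
    (32, 0.3620, 1.1300, 212.15),
]


def overhead_pct(baseline_ms, patched_ms):
    """Relative slowdown of patched vs. baseline, in percent (assumed definition)."""
    return (patched_ms - baseline_ms) / baseline_ms * 100.0


if __name__ == "__main__":
    for threads, base, patched, reported in ROWS:
        computed = overhead_pct(base, patched)
        print(f"{threads:4d} threads: computed +{computed:.2f}%, "
              f"reported +{reported:.2f}%")
```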

While the variation in the patched version is considerable at higher
thread counts, the overhead is significant in all cases. Admittedly, this
is a very specific program and far from what occurs in the real world,
but I believe it is useful data to have when deciding on this kind of
approach.

Thanks,
Davidlohr
