Message-ID: <1385499085.23083.7.camel@buesod1.americas.hpqcorp.net>
Date:	Tue, 26 Nov 2013 12:51:25 -0800
From:	Davidlohr Bueso <davidlohr@...com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>,
	Jason Low <jason.low2@...com>, Ingo Molnar <mingo@...nel.org>,
	Darren Hart <dvhart@...ux.intel.com>,
	Mike Galbraith <efault@....de>, Jeff Mahoney <jeffm@...e.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Scott Norton <scott.norton@...com>,
	Tom Vaden <tom.vaden@...com>,
	Aswin Chandramouleeswaran <aswin@...com>,
	Waiman Long <Waiman.Long@...com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [RFC patch 0/5] futex: Allow lockless empty check of hashbucket
 plist in futex_wake()

On Tue, 2013-11-26 at 11:25 -0800, Davidlohr Bueso wrote:
> On Tue, 2013-11-26 at 09:52 +0100, Peter Zijlstra wrote:
> > On Tue, Nov 26, 2013 at 12:12:31AM -0800, Davidlohr Bueso wrote:
> > 
> > > I am becoming hesitant about this approach. The following are some
> > > results, from my quad-core laptop, measuring the latency of nthread
> > > wakeups (1 at a time). In addition, failed wait calls never occur -- so
> > > we don't end up including the (otherwise minimal) overhead of the list
> > > queue+dequeue, only measuring the smp_mb() usage when !empty list never
> > > occurs.
> > > 
> > > +---------+--------------------+--------+-------------------+--------+----------+
> > > | threads | baseline time (ms) | stddev | patched time (ms) | stddev | overhead |
> > > +---------+--------------------+--------+-------------------+--------+----------+
> > > |     512 | 4.2410             | 0.9762 | 12.3660           | 5.1020 | +191.58% |
> > > |     256 | 2.7750             | 0.3997 | 7.0220            | 2.9436 | +153.04% |
> > > |     128 | 1.4910             | 0.4188 | 3.7430            | 0.8223 | +151.03% |
> > > |      64 | 0.8970             | 0.3455 | 2.5570            | 0.3710 | +185.06% |
> > > |      32 | 0.3620             | 0.2242 | 1.1300            | 0.4716 | +212.15% |
> > > +---------+--------------------+--------+-------------------+--------+----------+
> > > 
> > 
> > Whee, this is far more overhead than I would have expected... pretty
> > impressive really for a simple mfence ;-)
> 
> *sigh* I just realized I had some extra debugging options in the .config
> I ran for the patched kernel. This probably explains the huge
> overhead. I'll rerun and report shortly.

I'm very sorry about the false alarm -- after midnight my brain starts
to melt. After re-running everything on my laptop (yes, with the
correct .config file), I can see that the differences are rather minimal
and the variation also goes down, as expected. I've also included the
results for the original atomic ops approach, which mostly measures the
atomic_dec when we dequeue the woken task. Results are in the noise
range and virtually the same for both approaches (at least on a smaller
x86_64 system).

+---------+-----------------------------+----------------------------+------------------------------+
| threads | baseline time (ms) [stddev] | barrier time (ms) [stddev] | atomicops time (ms) [stddev] |
+---------+-----------------------------+----------------------------+------------------------------+
|     512 | 2.8360 [0.5168]             | 4.4100 [1.1150]            | 3.8150 [1.3293]              |
|     256 | 2.5080 [0.6375]             | 2.3070 [0.5112]            | 2.5980 [0.9079]              |
|     128 | 1.0200 [0.4264]             | 1.3980 [0.3391]            | 1.5180 [0.4902]              |
|      64 | 0.7890 [0.2667]             | 0.6970 [0.3374]            | 0.4020 [0.2447]              |
|      32 | 0.1150 [0.0184]             | 0.1870 [0.1428]            | 0.1490 [0.1156]              |
+---------+-----------------------------+----------------------------+------------------------------+
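For reference, the two checks being compared boil down to roughly the
following. This is only a userspace model, not the actual kernel code:
the struct, the field names (waiters, plist_nonempty) and the function
names are illustrative stand-ins for the hash bucket and its plist.

/* Minimal userspace model of the two approaches; illustrative only. */
#include <stdatomic.h>
#include <stdbool.h>

struct hash_bucket {
	/* atomic-ops approach: explicit waiter count, bumped before a
	 * waiter queues itself and dropped when it is dequeued */
	atomic_int waiters;
	/* stand-in for the plist head; non-zero means "waiters queued" */
	int plist_nonempty;
};

/* Barrier approach: the waker issues a full barrier (smp_mb() in the
 * patch) so its check of the list cannot pass the waiter's stores,
 * then peeks at the list without taking the bucket lock. */
static bool waker_should_take_lock_barrier(struct hash_bucket *hb)
{
	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */
	return hb->plist_nonempty != 0;
}

/* Atomic-ops approach: the waker only looks at the counter; ordering
 * is carried by the atomic inc/dec on the waiter side (the atomic_dec
 * on dequeue is what the third column above mostly measures). */
static bool waker_should_take_lock_atomic(struct hash_bucket *hb)
{
	return atomic_load(&hb->waiters) > 0;
}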

FYI I've uploaded the test program:
https://github.com/davidlohr/futex-stress/blob/master/futex_wake.c
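For readers without the link handy, the rough shape of such a
one-at-a-time wake latency test looks like the sketch below. This is
not the contents of the linked futex_wake.c; the thread count, the
sleep-based synchronization and the sys_futex wrapper are illustrative
only. Build with something like gcc -O2 -pthread.

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 32

static int futex_word;			/* all waiters block on this word */

static long sys_futex(int *uaddr, int op, int val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void *waiter(void *arg)
{
	(void)arg;
	/* Block while the word still holds 0; waits never fail in this
	 * setup, matching the "failed wait calls never occur" condition. */
	sys_futex(&futex_word, FUTEX_WAIT, 0);
	return NULL;
}

int main(void)
{
	pthread_t t[NTHREADS];
	struct timespec a, b;

	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&t[i], NULL, waiter, NULL);
	sleep(1);			/* crude: let the waiters queue up */

	clock_gettime(CLOCK_MONOTONIC, &a);
	for (int i = 0; i < NTHREADS; i++)
		sys_futex(&futex_word, FUTEX_WAKE, 1);	/* wake 1 at a time */
	clock_gettime(CLOCK_MONOTONIC, &b);

	for (int i = 0; i < NTHREADS; i++)
		pthread_join(t[i], NULL);

	printf("%d wakeups in %.4f ms\n", NTHREADS,
	       (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6);
	return 0;
}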

I will now start running bigger, more realistic workloads like the ones
described in the original patchset to get the big picture.

Thanks,
Davidlohr

