Message-ID: <1385499085.23083.7.camel@buesod1.americas.hpqcorp.net>
Date:	Tue, 26 Nov 2013 12:51:25 -0800
From:	Davidlohr Bueso <davidlohr@...com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>,
	Jason Low <jason.low2@...com>, Ingo Molnar <mingo@...nel.org>,
	Darren Hart <dvhart@...ux.intel.com>,
	Mike Galbraith <efault@....de>, Jeff Mahoney <jeffm@...e.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Scott Norton <scott.norton@...com>,
	Tom Vaden <tom.vaden@...com>,
	Aswin Chandramouleeswaran <aswin@...com>,
	Waiman Long <Waiman.Long@...com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [RFC patch 0/5] futex: Allow lockless empty check of hashbucket
 plist in futex_wake()

On Tue, 2013-11-26 at 11:25 -0800, Davidlohr Bueso wrote:
> On Tue, 2013-11-26 at 09:52 +0100, Peter Zijlstra wrote:
> > On Tue, Nov 26, 2013 at 12:12:31AM -0800, Davidlohr Bueso wrote:
> > 
> > > I am becoming hesitant about this approach. The following are some
> > > results, from my quad-core laptop, measuring the latency of nthread
> > > wakeups (1 at a time). In addition, failed wait calls never occur -- so
> > > we don't end up including the (otherwise minimal) overhead of the list
> > > queue+dequeue, only measuring the smp_mb() usage when !empty list never
> > > occurs.
> > > 
> > > +---------+--------------------+--------+-------------------+--------+----------+
> > > | threads | baseline time (ms) | stddev | patched time (ms) | stddev | overhead |
> > > +---------+--------------------+--------+-------------------+--------+----------+
> > > |     512 | 4.2410             | 0.9762 | 12.3660           | 5.1020 | +191.58% |
> > > |     256 | 2.7750             | 0.3997 | 7.0220            | 2.9436 | +153.04% |
> > > |     128 | 1.4910             | 0.4188 | 3.7430            | 0.8223 | +151.03% |
> > > |      64 | 0.8970             | 0.3455 | 2.5570            | 0.3710 | +185.06% |
> > > |      32 | 0.3620             | 0.2242 | 1.1300            | 0.4716 | +212.15% |
> > > +---------+--------------------+--------+-------------------+--------+----------+
> > > 
> > 
> > Whee, this is far more overhead than I would have expected... pretty
> > impressive really for a simple mfence ;-)
> 
> *sigh* I just realized I had some extra debugging options in the .config
> I ran for the patched kernel. This probably explains the huge
> overhead. I'll rerun and report shortly.

I'm very sorry about the false alarm -- after midnight my brain starts
to melt. After re-running everything on my laptop (yes, with the
correct .config file), I can see that the differences are rather minimal
and the variation also goes down, as expected. I've also included the
results for the original atomic ops approach, which mostly measures the
atomic_dec when we dequeue the woken task. Results are in the noise
range and virtually the same for both approaches (at least on a smaller
x86_64 system).

+---------+-----------------------------+----------------------------+------------------------------+
| threads | baseline time (ms) [stddev] | barrier time (ms) [stddev] | atomicops time (ms) [stddev] |
+---------+-----------------------------+----------------------------+------------------------------+
|     512 | 2.8360 [0.5168]             | 4.4100 [1.1150]            | 3.8150 [1.3293]              |
|     256 | 2.5080 [0.6375]             | 2.3070 [0.5112]            | 2.5980 [0.9079]              |
|     128 | 1.0200 [0.4264]             | 1.3980 [0.3391]            | 1.5180 [0.4902]              |
|      64 | 0.7890 [0.2667]             | 0.6970 [0.3374]            | 0.4020 [0.2447]              |
|      32 | 0.1150 [0.0184]             | 0.1870 [0.1428]            | 0.1490 [0.1156]              |
+---------+-----------------------------+----------------------------+------------------------------+
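For reference, the two checks being compared boil down to roughly the
following. This is only a userspace model, not the actual kernel code:
the struct, the field names (waiters, plist_nonempty) and the function
names are illustrative stand-ins for the hash bucket and its plist.

/* Minimal userspace model of the two approaches; illustrative only. */
#include <stdatomic.h>
#include <stdbool.h>

struct hash_bucket {
	/* atomic-ops approach: explicit waiter count, bumped before a
	 * waiter queues itself and dropped when it is dequeued */
	atomic_int waiters;
	/* stand-in for the plist head; non-zero means "waiters queued" */
	int plist_nonempty;
};

/* Barrier approach: the waker issues a full barrier (smp_mb() in the
 * patch) so its check of the list cannot pass the waiter's stores,
 * then peeks at the list without taking the bucket lock. */
static bool waker_should_take_lock_barrier(struct hash_bucket *hb)
{
	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */
	return hb->plist_nonempty != 0;
}

/* Atomic-ops approach: the waker only looks at the counter; ordering
 * is carried by the atomic inc/dec on the waiter side (the atomic_dec
 * on dequeue is what the third column above mostly measures). */
static bool waker_should_take_lock_atomic(struct hash_bucket *hb)
{
	return atomic_load(&hb->waiters) > 0;
}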

FYI I've uploaded the test program:
https://github.com/davidlohr/futex-stress/blob/master/futex_wake.c
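For readers without the link handy, the rough shape of such a
one-at-a-time wake latency test looks like the sketch below. This is
not the contents of the linked futex_wake.c; the thread count, the
sleep-based synchronization and the sys_futex wrapper are illustrative
only. Build with something like gcc -O2 -pthread.

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 32

static int futex_word;			/* all waiters block on this word */

static long sys_futex(int *uaddr, int op, int val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void *waiter(void *arg)
{
	(void)arg;
	/* Block while the word still holds 0; waits never fail in this
	 * setup, matching the "failed wait calls never occur" condition. */
	sys_futex(&futex_word, FUTEX_WAIT, 0);
	return NULL;
}

int main(void)
{
	pthread_t t[NTHREADS];
	struct timespec a, b;

	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&t[i], NULL, waiter, NULL);
	sleep(1);			/* crude: let the waiters queue up */

	clock_gettime(CLOCK_MONOTONIC, &a);
	for (int i = 0; i < NTHREADS; i++)
		sys_futex(&futex_word, FUTEX_WAKE, 1);	/* wake 1 at a time */
	clock_gettime(CLOCK_MONOTONIC, &b);

	for (int i = 0; i < NTHREADS; i++)
		pthread_join(t[i], NULL);

	printf("%d wakeups in %.4f ms\n", NTHREADS,
	       (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6);
	return 0;
}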

I will now start running bigger, more realistic workloads like the ones
described in the original patchset to get the big picture.

Thanks,
Davidlohr

