linux-kernel - Re: [RFC PATCH v7 7/7] Restartable sequences: self-tests

Date:	Mon, 25 Jul 2016 16:43:29 +0000 (UTC)
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Dave Watson <davejwatson@...com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Russell King <linux@....linux.org.uk>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org,
	linux-api <linux-api@...r.kernel.org>,
	Paul Turner <pjt@...gle.com>, Andrew Hunter <ahh@...gle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Andy Lutomirski <luto@...capital.net>,
	Andi Kleen <andi@...stfloor.org>, Chris Lameter <cl@...ux.com>,
	Ben Maurer <bmaurer@...com>, rostedt <rostedt@...dmis.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Josh Triplett <josh@...htriplett.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will.deacon@....com>,
	Michael Kerrisk <mtk.manpages@...il.com>,
	Boqun Feng <boqun.feng@...il.com>
Subject: Re: [RFC PATCH v7 7/7] Restartable sequences: self-tests

----- On Jul 24, 2016, at 2:01 PM, Dave Watson davejwatson@...com wrote:

>>> +static inline __attribute__((always_inline))
>>> +bool rseq_finish(struct rseq_lock *rlock,
>>> +                 intptr_t *p, intptr_t to_write,
>>> +                 struct rseq_state start_value)
> 
>>> This ABI looks like it will work fine for our use case. I don't think it
>>> has been mentioned yet, but we may still need multiple asm blocks
>>> for differing numbers of writes. For example, an array-based freelist push:
> 
>>> void push(void *obj) {
>>>     if (index < maxlen) {
>>>         freelist[index++] = obj;
>>>     }
>>> }
> 
>>> would be more efficiently implemented with a two-write rseq_finish:
> 
>>> rseq_finish2(&freelist[index], obj, // first write
>>>              &index, index + 1,     // second write
>>>              ...);
> 
>> Would pairing one rseq_start with two rseq_finish do the trick
>> there ?
> 
> Yes, two rseq_finish works, as long as the extra rseq management overhead
> is not substantial.
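
For illustration, the "one rseq_start, two rseq_finish" pairing discussed
above can be sketched as a single-threaded model in plain C. This is not
the code from the patch: model_event_counter, model_start(),
model_finish(), fl_index and MAXLEN are hypothetical names, and the event
counter merely stands in for whatever the kernel would update on
preemption or signal delivery. The sketch assumes it is acceptable to
restart the whole push if either commit fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model only: stands in for a counter the kernel would bump on
 * preemption, migration or signal delivery. Not the real rseq ABI. */
static uint32_t model_event_counter;

struct model_state {
    uint32_t event_counter;     /* snapshot taken by model_start() */
};

static struct model_state model_start(void)
{
    struct model_state s = { model_event_counter };
    return s;
}

/* Single-store commit: store only if no event occurred since start. */
static bool model_finish(intptr_t *p, intptr_t to_write,
                         struct model_state start)
{
    if (start.event_counter != model_event_counter)
        return false;           /* interrupted: caller must restart */
    *p = to_write;
    return true;
}

#define MAXLEN 4
static intptr_t freelist[MAXLEN];
static intptr_t fl_index;       /* plays the role of "index" above */

/* Array freelist push built from one start and two single-store commits.
 * Returns false when the freelist is full. */
static bool push_two_finish(void *obj)
{
    for (;;) {
        struct model_state s = model_start();
        intptr_t i = fl_index;

        if (i >= MAXLEN)
            return false;
        if (!model_finish(&freelist[i], (intptr_t)obj, s))
            continue;           /* first store rejected: restart */
        if (model_finish(&fl_index, i + 1, s))
            return true;        /* index bump publishes the slot */
        /* Second store rejected: slot i holds obj but is not yet
         * published, so the retry simply overwrites it. */
    }
}
```

In this model each commit repeats the same check, which is one way to
picture the extra management work a two-commit push pays compared with a
single two-store commit.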

The difference is actually not negligible. On an x86-64
Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, with a single-threaded
counter increment benchmark:

* Single store per increment:                                  3.6 ns
* Two rseq_finish() per increment:                             5.2 ns
* rseq_finish2() with two mov instructions per rseq_finish2(): 4.0 ns

And I expect the difference to be even larger on non-x86 architectures.

I'll try to figure out a way to do rseq_finish() and rseq_finish2()
without duplicating the code. Perhaps macros will be helpful there.
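
As a rough sketch of the macro idea, one possibility is a single macro
that holds the shared check-then-commit logic while each variant supplies
only its parameter list and its stores. The version below is a plain C,
single-threaded model with no inline assembly, so it says nothing about
how the real asm blocks would be shared. DEFINE_MODEL_FINISH,
model_finish(), model_finish2(), model_start() and model_event_counter
are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model only: stands in for a counter the kernel would bump on
 * preemption or signal delivery. Not the real rseq ABI. */
static uint32_t model_event_counter;

struct model_state {
    uint32_t event_counter;     /* snapshot taken by model_start() */
};

static struct model_state model_start(void)
{
    struct model_state s = { model_event_counter };
    return s;
}

/* Shared body. "params" is a parenthesized parameter list that must
 * name its last parameter "start"; "stores" is the store expression. */
#define DEFINE_MODEL_FINISH(name, params, stores)                \
    static bool name params                                      \
    {                                                            \
        if (start.event_counter != model_event_counter)          \
            return false;       /* interrupted: caller restarts */ \
        stores;                                                  \
        return true;                                             \
    }

/* One-store variant. */
DEFINE_MODEL_FINISH(model_finish,
    (intptr_t *p, intptr_t to_write, struct model_state start),
    *p = to_write)

/* Two-store variant: one check covers both stores. */
DEFINE_MODEL_FINISH(model_finish2,
    (intptr_t *p1, intptr_t to_write1,
     intptr_t *p2, intptr_t to_write2, struct model_state start),
    (*p1 = to_write1, *p2 = to_write2))

static intptr_t counter, shadow;    /* demo data */

/* Counter increment using the one-store variant. */
static void increment_one(intptr_t *c)
{
    struct model_state s;

    do {
        s = model_start();
    } while (!model_finish(c, *c + 1, s));
}

/* Increment two counters with a single two-store commit. */
static void increment_both(void)
{
    struct model_state s;

    do {
        s = model_start();
    } while (!model_finish2(&counter, counter + 1,
                            &shadow, shadow + 1, s));
}
```

The parameter lists and the two-store expression are wrapped in
parentheses so that their commas are not treated as macro argument
separators.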

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com