linux-kernel - Re: [RFC PATCH 03/15] Provide atomic_t functions implemented with ISO-C++11 atomics


Message-ID: <20160519142252.GR3528@linux.vnet.ibm.com>
Date:	Thu, 19 May 2016 07:22:52 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	David Howells <dhowells@...hat.com>, linux-arch@...r.kernel.org,
	x86@...nel.org, will.deacon@....com, linux-kernel@...r.kernel.org,
	ramana.radhakrishnan@....com, dwmw2@...radead.org
Subject: Re: [RFC PATCH 03/15] Provide atomic_t functions implemented with
 ISO-C++11 atomics

On Thu, May 19, 2016 at 12:50:00PM +0200, Peter Zijlstra wrote:
> On Thu, May 19, 2016 at 10:52:19AM +0100, David Howells wrote:
> > Peter Zijlstra <peterz@...radead.org> wrote:
> > 
> > > Does this generate 'sane' code for LL/SC archs? That is, a single LL/SC
> > > loop and not a loop around an LL/SC cmpxchg.
> 
> > I think the code it generates should look something like:
> > 
> > 	test_atomic_add_unless:
> > 	.L7:
> > 		ldaxr	w1, [x0]		# __atomic_load_n()
> > 		cmp	w1, 35			# } if (cur == unless)
> > 		beq	.L4			# }     break
> > 		add	w2, w1, 86		# new = cur + addend
> > 		stlxr	w4, w2, [x0]
> > 		cbnz	w4, .L7
> > 	.L4:
> > 		mov	w0, w1
> > 		ret
> > 
> > but that requires the compiler to split up the LDAXR and STLXR instructions
> > and render arbitrary code between.
> 
> Exactly.
> 
> > I suspect that might be quite a stretch.
> > 
> > I've opened:
> > 
> > 	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71191
> > 
> > to cover this.

Thank you!

> Thanks; until such time as this stretch has been made I don't see this
> intrinsic stuff being much use on any of the LL/SC archs.

Agreed, these sorts of instruction sequences make a lot of sense.
Of course, if you stuff too many instructions and cache misses between
the LL and the SC, the SC success probability starts dropping, but short
sequences of non-memory-reference instructions like the above should be
just fine.

							Thanx, Paul
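
For reference, here is a minimal sketch of the kind of C source the thread
is talking about: an add-unless operation written as a compare-exchange
loop on top of the GCC/Clang __atomic builtins. The atomic_t typedef below
is a hypothetical stand-in, and the function body is an assumption about
the general shape of such code, not the code from the patch. The memory
orders are likewise illustrative. Whether a compiler turns this into the
single LDAXR/STLXR loop quoted above, or into a loop around a separate
LL/SC cmpxchg sequence, is exactly the open question raised in the thread.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's atomic_t (illustration only). */
typedef struct { int counter; } atomic_t;

/*
 * Add 'addend' to v->counter unless the counter equals 'unless'.
 * Returns the value observed before any addition took place.
 */
static inline int test_atomic_add_unless(atomic_t *v, int addend, int unless)
{
	int cur = __atomic_load_n(&v->counter, __ATOMIC_RELAXED);

	do {
		if (cur == unless)
			break;
		/*
		 * The weak form may fail spuriously, which is acceptable
		 * inside a retry loop. On failure the builtin stores the
		 * value it found back into 'cur'.
		 */
	} while (!__atomic_compare_exchange_n(&v->counter, &cur, cur + addend,
					      true, __ATOMIC_SEQ_CST,
					      __ATOMIC_RELAXED));
	return cur;
}
```

Calling test_atomic_add_unless(&v, 86, 35) corresponds to the constants
that appear in the quoted assembly listing (addend 86, unless 35).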