Date:	Mon, 25 May 2009 16:22:10 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To:	Russell King - ARM Linux <linux@....linux.org.uk>
Cc:	Jamie Lokier <jamie@...reable.org>,
	Catalin Marinas <catalin.marinas@....com>,
	linux-arm-kernel@...ts.arm.linux.org.uk,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	"David S. Miller" <davem@...emloft.net>
Subject: Re: Broken ARM atomic ops wrt memory barriers (was : [PATCH] Add
	cmpxchg support for ARMv6+ systems)

* Russell King - ARM Linux (linux@....linux.org.uk) wrote:
> This reply is based upon what's in your email rather than atomic_ops.
> 
> On Mon, May 25, 2009 at 01:29:55PM -0400, Mathieu Desnoyers wrote:
> > This is a very good start, but I think a few are still missing:
> > 
> > in atomic.h:
> > 
> > /* Atomic operations are already serializing on ARM */
> > #define smp_mb__before_atomic_dec()     barrier()
> > #define smp_mb__after_atomic_dec()      barrier()
> > #define smp_mb__before_atomic_inc()     barrier()
> > #define smp_mb__after_atomic_inc()      barrier()
> > 
> > should probably map to smp_mb() for ARMv6+.
> 
> BTW, I think you're wrong here.  atomic_dec() and atomic_inc() are
> implemented using atomic_sub_return() and atomic_add_return().  Both
> of these functions are serializing as a result of the patch you
> replied to.
> 

Hrm, then atomic inc/dec should not be implemented on top of the
add/sub-return variants, which carry memory barriers, given this would
kill performance for no reason. It would be better to re-implement
barrier-less inc/dec and map those smp_mb__*() primitives to smp_mb().
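Something along these lines, perhaps (a sketch only, untested, modeled
on the pre-barrier ARMv6 ldrex/strex code; the exact asm constraints
would need checking against the tree):

	/* Barrier-less atomic add: plain ldrex/strex loop, no dmb. */
	static inline void atomic_add(int i, atomic_t *v)
	{
		unsigned long tmp;
		int result;

		__asm__ __volatile__("@ atomic_add\n"
	"1:	ldrex	%0, [%2]\n"
	"	add	%0, %0, %3\n"
	"	strex	%1, %0, [%2]\n"
	"	teq	%1, #0\n"
	"	bne	1b"
		: "=&r" (result), "=&r" (tmp)
		: "r" (&v->counter), "Ir" (i)
		: "cc");
	}

	/* Ordering then comes only from the explicit primitives: */
	#define smp_mb__before_atomic_dec()	smp_mb()
	#define smp_mb__after_atomic_dec()	smp_mb()
	#define smp_mb__before_atomic_inc()	smp_mb()
	#define smp_mb__after_atomic_inc()	smp_mb()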

But you are right: it's not a bug, just... very slow.

> > Also, bitops.h should have (taken from powerpc):
> > 
> > /*
> >  * clear_bit doesn't imply a memory barrier
> >  */
> > #define smp_mb__before_clear_bit()      smp_mb()
> > #define smp_mb__after_clear_bit()       smp_mb()
> 
> Again, disagree.  With the current definition being mb(), they become
> either:
> 
> - a compiler barrier on UP architectures (which don't have weak ordering
>   models)
> - a data memory barrier on UP coherent xscale (don't know if this has
>   weak ordering)
> - a data memory barrier on SMP
> 
> So, I think no change is required; mb() is doing at least the right thing.
> (Whether it's heavier than it actually needs to be is another question,
>   and that only affects the coherent xscale stuff.  That is beyond my
>   knowledge to answer.)

Right, no bug here, just probably much slower than it needs to be.

To give you an order of magnitude: a cmpxchg_local primitive without
the memory barriers takes 11 cycles on my ARMv7 OMAP3, while a cmpxchg
with mb() before and after takes 71 cycles. That's 6.45 times slower.
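The barrier-full variant measured above is just the local one bracketed
by barriers. A sketch, assuming a hypothetical barrier-less __cmpxchg()
helper with the obvious meaning:

	/* Sketch: the ~6x cost comes purely from the two barriers. */
	static inline unsigned long cmpxchg_mb(volatile void *ptr,
					       unsigned long old,
					       unsigned long new, int size)
	{
		unsigned long ret;

		mb();					/* order prior accesses */
		ret = __cmpxchg(ptr, old, new, size);	/* barrier-less ldrex/strex loop */
		mb();					/* order later accesses */
		return ret;
	}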

> 
> > According to atomic_ops.txt, 3 other bitwise atomic ops imply memory
> > barriers:
> > 
> > "There are two special bitops with lock barrier semantics (acquire/release,
> > same as spinlocks). These operate in the same way as their non-_lock/unlock
> > postfixed variants, except that they are to provide acquire/release semantics,
> > respectively. This means they can be used for bit_spin_trylock and
> > bit_spin_unlock type operations without specifying any more barriers.
> > 
> >         int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
> >         void clear_bit_unlock(unsigned long nr, unsigned long *addr);
> >         void __clear_bit_unlock(unsigned long nr, unsigned long *addr);
> > 
> > The __clear_bit_unlock version is non-atomic, however it still implements
> > unlock barrier semantics. This can be useful if the lock itself is protecting
> > the other bits in the word."
> 
> It looks to me that if we make arch/arm/lib/bitops.h fully ordered then
> these get sorted out for free.
> 

Yes, that was my first thought too. I think all these implementations
go through bitops.h.
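For reference, the generic wrappers in include/asm-generic/bitops/lock.h
build the _lock/_unlock variants out of the plain bitops plus the
smp_mb__*() hooks. From memory, roughly:

	#define test_and_set_bit_lock(nr, addr)	test_and_set_bit(nr, addr)

	#define clear_bit_unlock(nr, addr)	\
	do {					\
		smp_mb__before_clear_bit();	\
		clear_bit(nr, addr);		\
	} while (0)

	#define __clear_bit_unlock(nr, addr)	\
	do {					\
		smp_mb();			\
		__clear_bit(nr, addr);		\
	} while (0)

So once test_and_set_bit() is fully ordered and the smp_mb__*() hooks
are real barriers, the lock-semantics variants do come along for free.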

> > arch/arm/include/asm/mutex.h should also have smp_mb() to provide
> > acquire/release semantic to mutex fastpath (like spinlock does),
> > otherwise subtle deadlocks and various problems could occur.
> 
> Hmm, the mutex is undocumented in the atomic ops document.  Does it
> require ordering both before and after, or do some of those ops just
> need it before acquire and after release?
> 

I guess the mutex fastpath should probably be added to atomic_ops.txt.
Looking at PowerPC's mutex.h, mutex lock provides acquire semantics
(like spin lock) and mutex unlock provides release semantics (like spin
unlock):

acquire:

	take lock
	smp_mb()
	(critical section memory accesses)

release:

	(critical section memory accesses)
	smp_mb()
	release lock
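On ARM that would translate to something like the following (a sketch
only; __mutex_dec_relaxed() and __mutex_inc_relaxed() are hypothetical
barrier-less helpers standing in for whatever the fastpath uses):

	/* Acquire: barrier after taking the lock. */
	static inline void
	__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
	{
		if (unlikely(__mutex_dec_relaxed(count) < 0))
			fail_fn(count);		/* slowpath does its own ordering */
		smp_mb();	/* keep the critical section after the lock */
	}

	/* Release: barrier before dropping the lock. */
	static inline void
	__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
	{
		smp_mb();	/* keep the critical section before the unlock */
		if (unlikely(__mutex_inc_relaxed(count) <= 0))
			fail_fn(count);
	}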

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
