linux-kernel - Re: [patch][rfc] x86, mutex: non-atomic unlock (and a rant)

Date:	Mon, 2 Nov 2009 17:00:31 +0100
From:	Nick Piggin <npiggin@...e.de>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Ingo Molnar <mingo@...e.hu>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [patch][rfc] x86, mutex: non-atomic unlock (and a rant)

On Mon, Nov 02, 2009 at 07:20:08AM -0800, Linus Torvalds wrote:
> 
> 
> On Mon, 2 Nov 2009, Nick Piggin wrote:
> > 
> > Non-atomic unlock for mutexes maybe? I do this by relying on cache
> > coherence on a cacheline basis for ordering rather than the memory
> > consistency of the x86. Linus I know you've told me this is an incorrect
> > assumption in the past, but I'm not so sure.
> 
> I'm sure.
> 
> This is simply buggy:
> 
> > +	atomic_set(&lock->count, 1);
> > +	barrier();
> > +	if (unlikely(lock->waiters))
> > +		fail_fn(lock);
> 
> because it doesn't matter one whit whether 'lock->count' and 
> 'lock->waiters' are in the same cacheline or not.
> 
> The cache coherency deals in cachelines, but the instruction re-ordering 
> logic does not. It's entirely possible that the CPU will turn this into
> 
> 	tmp = lock->waiters;
> 	...
> 	atomic_set(&lock->count, 1);
> 	if (tmp)
> 		fail_fn(lock);
> 
> and your "barrier()" did absolutely nothing.
> 
> The fact that it may _work_ in almost all circumstances (and perhaps even 
> "always" on some microarchitectures) is irrelevant. It's simply not 
> guaranteed to work. Yes, you need just the right timings, and yes, it's 
> probably hard to hit. And yes, I can well imagine that some micro-
> architecture will even guarantee the write->read ordering, and that it 
> would _always_ work on that micro-architecture.
> 
> But I can see your thing failing even on an in-order CPU. It literally 
> doesn't even need OoO to fail, all it needs is a sufficiently deep write 
> buffer on an in-order core. And to fail in practice, maybe there needs to 
> be lots of writes in that buffer, and some bad luck, but the thing is, 
> write buffers are not coherent between cores - so the write may have 
> happened as far as the core that does it is concerned, but other cores 
> (or even HT) may not see the new value until after the read has taken 
> effect.
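The write-buffer scenario Linus describes is the classic "store buffering" (SB) litmus test: each CPU stores to one location and then loads the other, and both loads may return the old value. The following is a minimal sketch in C11 with POSIX threads, not code from this thread. It assumes a platform that provides `pthread_barrier_t` (e.g. Linux; build with `-pthread`). The variables `x` and `y` are illustrative stand-ins for `lock->count` and `lock->waiters`. With a full `atomic_thread_fence(memory_order_seq_cst)` between each store and the following load, the outcome `r0 == 0 && r1 == 0` is forbidden. If the fences are removed, or replaced with the compiler-only `atomic_signal_fence` (roughly the C11 analogue of the kernel's `barrier()`), that outcome becomes permitted, and x86 can produce it because loads may be reordered with older stores to different locations.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* x stands in for lock->count, y for lock->waiters (illustrative only). */
static atomic_int x, y;
static int r0, r1;
static pthread_barrier_t start;

static void *thread0(void *arg)
{
    (void)arg;
    pthread_barrier_wait(&start);              /* start both threads together */
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    /* Full CPU fence: orders the store above before the load below.
       A compiler-only barrier here would NOT forbid r0 == r1 == 0. */
    atomic_thread_fence(memory_order_seq_cst);
    r0 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread1(void *arg)
{
    (void)arg;
    pthread_barrier_wait(&start);
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    r1 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Run the SB test `iters` times; return how often BOTH loads saw 0.
   With the seq_cst fences above, the C11 model guarantees this is 0. */
static int count_both_zero(int iters)
{
    int both_zero = 0;
    for (int i = 0; i < iters; i++) {
        pthread_t t0, t1;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        pthread_barrier_init(&start, NULL, 2);
        pthread_create(&t0, NULL, thread0, NULL);
        pthread_create(&t1, NULL, thread1, NULL);
        pthread_join(t0, NULL);                /* join makes r0/r1 visible */
        pthread_join(t1, NULL);
        pthread_barrier_destroy(&start);
        if (r0 == 0 && r1 == 0)
            both_zero++;
    }
    return both_zero;
}
```

A single run proves nothing in the unfenced case, since the reordering window is small. That is exactly Linus's point that "it may _work_ in almost all circumstances".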

Hm, OK, I see you must be right there. The trick is only guaranteed
to work if you operate on exactly the same memory location, I guess (or
for store/store and load/load sequences). In which case, atomic ops
can't be avoided for the unlock case :(
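The "same memory location" case is what a count-based lock word exploits. If the "are there waiters?" information lives in the lock word itself, a single locked read-modify-write both releases the lock and observes the waiter state, so there is no store-then-load window to reorder. The following is a minimal sketch of a simplified encoding similar in spirit to the kernel mutex fast path of this era (1 = unlocked, 0 = locked, negative = locked with possible waiters). The `toy_*` names are hypothetical, and the slow path (resetting the count and waking a sleeper) is deliberately not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified model (assumption, not the kernel's actual code):
 *    1 = unlocked
 *    0 = locked, no waiters
 *   <0 = locked, waiters may be sleeping                        */
struct toy_mutex {
    atomic_int count;
};

/* Fast-path lock: succeeds only on the 1 -> 0 transition. */
static bool toy_trylock(struct toy_mutex *m)
{
    int expected = 1;
    return atomic_compare_exchange_strong(&m->count, &expected, 0);
}

/* Contended path: mark "locked, with waiters" before sleeping.
   Returns true if the lock was actually free, i.e. we now own it
   (the count is left at -1, which merely causes a spurious slow
   path on the next unlock). */
static bool toy_mark_waiter(struct toy_mutex *m)
{
    return atomic_exchange(&m->count, -1) == 1;
}

/* Unlock: ONE atomic RMW (a locked XADD on x86) releases the lock and
   reports the waiter state at the same instant, because both facts
   live in the same word. Returns true if a wakeup slow path is needed. */
static bool toy_unlock_needs_wakeup(struct toy_mutex *m)
{
    int old = atomic_fetch_add(&m->count, 1);
    return old < 0;
}
```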

Well, the unlock could use a memory barrier instead of an atomic op,
which might help on some architectures, but on x86 I don't think it does much.
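One way to read "a barrier instead of atomic" is the original two-field patch with a full fence (in kernel terms presumably `smp_mb()`) where it had the compiler-only `barrier()`. The following is a minimal C11 sketch of that reading, not the actual patch. `struct two_field_lock` and both helpers are hypothetical names. The waiter side needs a matching full fence between publishing "I am waiting" and re-checking the lock. Otherwise the store-buffering outcome can still lose a wakeup: the unlocker sees no waiters, and the waiter sees the lock still held. On x86, `atomic_thread_fence(memory_order_seq_cst)` typically compiles to `MFENCE` or a locked instruction, so its cost is comparable to the atomic op it replaces, which is consistent with the remark above about x86.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock modeled on the patch under discussion:
   count (1 = unlocked, 0 = locked) plus a separate waiters flag. */
struct two_field_lock {
    atomic_int count;
    atomic_int waiters;
};

/* Unlock with a full fence where the patch had barrier().
   Returns true if the caller must take the wakeup slow path. */
static bool unlock_needs_wakeup(struct two_field_lock *l)
{
    atomic_store_explicit(&l->count, 1, memory_order_release);
    /* store->load ordering: x86 does not provide this without a fence */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&l->waiters, memory_order_relaxed) != 0;
}

/* Waiter side, pairing with the fence above: publish the waiter flag,
   fence, then re-check the lock before sleeping. Returns true if the
   lock was seen free, so the caller should retry instead of sleeping. */
static bool waiter_should_retry(struct two_field_lock *l)
{
    atomic_store_explicit(&l->waiters, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&l->count, memory_order_relaxed) == 1;
}
```

With both fences in place, the C11 model forbids the case where `unlock_needs_wakeup()` returns false and `waiter_should_retry()` also returns false, which is the lost-wakeup interleaving.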

