linux-kernel - Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write

Message-ID: <20140118122548.GX10038@linux.vnet.ibm.com>
Date:	Sat, 18 Jan 2014 04:25:48 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Matt Turner <mattst88@...il.com>,
	Waiman Long <waiman.long@...com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Ivan Kokshaysky <ink@...assic.park.msu.ru>,
	Daniel J Blueman <daniel@...ascale.com>,
	Richard Henderson <rth@...ddle.net>
Subject: Re: [PATCH v8 4/4] qrwlock: Use smp_store_release() in write_unlock()

On Sat, Jan 18, 2014 at 12:34:06PM +0100, Peter Zijlstra wrote:
> On Sat, Jan 18, 2014 at 02:01:05AM -0800, Paul E. McKenney wrote:
> > OK, I will bite...  Aside from fine-grained code timing, what code could
> > you write to tell the difference between a real one-byte store and an
> > RMW emulating that store?
> 
> Why isn't fine-grained code timing an issue? I'm sure Alpha people will
> love it when their machine magically keels over every so often.
> 
> Suppose we have two bytes in a word that get concurrent updates:
> 
> union {
> 	struct {
> 		u8 a;
> 		u8 b;
> 	};
> 	int word;
> } ponies = { .word = 0, };
> 
> then two threads concurrently do:
> 
> CPU0:		CPU1:
> 
> ponies.a = 5	ponies.b = 10
> 
> 
> At which point you'd expect: a == 5 && b == 10
> 
> However, with a rmw you could end up like:
> 
> 
> 			load r, ponies.word
> load r, ponies.word
> and  r, ~0xFF
> or   r, 5
> store ponies.word, r
> 			and r, ~0xFF00
> 			or r, 10 << 8
> 			store ponies.word, r
> 
> which gives: a == 0 && b == 10
> 
> The same can be had on a single CPU if you make the second RMW an
> interrupt.
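
To make the interleaving above concrete, here is a small userspace C sketch
that replays it by hand on a single thread, so the outcome is deterministic
rather than a matter of timing.  The names (word, replay_lost_update(), get_a(),
get_b()) are made up for illustration, and it uses a plain 32-bit word with
shifts instead of the union, so byte order does not matter:

```c
#include <stdint.h>

/* Shared 32-bit word: byte "a" is bits 0-7, byte "b" is bits 8-15. */
static uint32_t word;

static inline uint8_t get_a(void) { return word & 0xFF; }
static inline uint8_t get_b(void) { return (word >> 8) & 0xFF; }

/*
 * Replay the quoted interleaving step by step.  r0 and r1 stand in for
 * the private registers of CPU0 and CPU1.  Returns the final word.
 */
uint32_t replay_lost_update(void)
{
	uint32_t r0, r1;

	word = 0;

	r1 = word;		/* CPU1: load r, ponies.word */
	r0 = word;		/* CPU0: load r, ponies.word */

	r0 &= ~0xFFu;		/* CPU0: and r, ~0xFF */
	r0 |= 5;		/* CPU0: or  r, 5 */
	word = r0;		/* CPU0: store, a is now 5 */

	r1 &= ~0xFF00u;		/* CPU1: and r, ~0xFF00 */
	r1 |= 10u << 8;		/* CPU1: or  r, 10 << 8 */
	word = r1;		/* CPU1: store, writes back its stale a == 0 */

	return word;
}
```

After calling replay_lost_update(), get_a() returns 0 and get_b() returns 10:
CPU0's store to a is lost because CPU1 wrote back the copy it loaded earlier.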
> 
> 
> In fact, we recently had such a RMW issue on PPC64 although from a
> slightly different angle, but we managed to hit it quite consistently.
> See commit ba1f14fbe7096.
> 
> The thing is, if we allow the above RMW 'atomic' store, we have to be
> _very_ careful that there cannot be such overlapping stores, otherwise
> things will go BOOM!
> 
> However, if we already have to make sure there's no overlapping stores,
> we might as well write a wide store and not allow the narrow stores to
> begin with, to force people to think about the issue.

Ah, I was assuming an atomic RMW, which for Alpha would be implemented using
the load-locked and store-conditional instructions (LDx_L and STx_C).  Yes,
lots of overhead, but if the CPU designers chose not to provide byte
load/store instructions...
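
For illustration only, a rough userspace sketch of what I mean by an atomic
RMW emulating a byte store, written with C11 atomics rather than kernel
primitives.  The helper name emulated_store_byte() and its "lane" parameter
are hypothetical, not anything in the tree.  On a load-locked/store-conditional
machine I would expect the compare-exchange loop to compile down to such a
sequence, but that is up to the compiler:

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: store "val" into byte lane "lane" (0..3, lane 0
 * being bits 0-7) of *word without disturbing the other three bytes.
 * The compare-exchange retries if another CPU changed the word between
 * our load and our store, so concurrent stores to neighboring bytes are
 * not lost.
 */
void emulated_store_byte(_Atomic uint32_t *word, unsigned int lane,
			 uint8_t val)
{
	unsigned int shift = lane * 8;
	uint32_t mask = (uint32_t)0xFF << shift;
	uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
	uint32_t newval;

	do {
		/* Rebuild the word from the latest observed value. */
		newval = (old & ~mask) | ((uint32_t)val << shift);
		/* On failure, "old" is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak_explicit(word, &old, newval,
							memory_order_relaxed,
							memory_order_relaxed));
}
```

With this, one CPU doing emulated_store_byte(&w, 0, 5) and another doing
emulated_store_byte(&w, 1, 10) on a zeroed word end up with 0x0A05 whatever
the interleaving, at the cost of a retry loop for every narrow store.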

							Thanx, Paul
