linux-kernel - Re: [RFC][PATCH v3]: documentation,atomic: Add new documents

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20170801161412.GW3730@linux.vnet.ibm.com>
Date:   Tue, 1 Aug 2017 09:14:12 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Will Deacon <will.deacon@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Boqun Feng <boqun.feng@...il.com>,
        linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Randy Dunlap <rdunlap@...radead.org>
Subject: Re: [RFC][PATCH v3]: documentation,atomic: Add new documents

On Tue, Aug 01, 2017 at 01:17:13PM +0100, Will Deacon wrote:
> On Tue, Aug 01, 2017 at 01:47:44PM +0200, Peter Zijlstra wrote:
> > On Tue, Aug 01, 2017 at 11:19:00AM +0100, Will Deacon wrote:
> > > On Tue, Aug 01, 2017 at 11:01:21AM +0200, Peter Zijlstra wrote:
> > > > On Mon, Jul 31, 2017 at 10:43:45AM -0700, Paul E. McKenney wrote:
> > > > 
> > > > > Why wouldn't the following have ACQUIRE semantics?
> > > > > 
> > > > > 	atomic_inc(&var);
> > > > > 	smp_mb__after_atomic();
> > > > > 
> > > > > Is the issue that there is no actual value returned or some such?
> > > > 
> > > > Yes, so that the inc is a load-store, and thus there is a load, we loose
> > > > the value.
> > > > 
> > > > But I see your point I think. Irrespective of still having the value,
> > > > the ordering is preserved and nothing should pass across that.
> > > > 
> > > > > So if I have something like this, the assertion really can trigger?
> > > > > 
> > > > > 	WRITE_ONCE(x, 1);		atomic_inc(&y);
> > > > > 	r0 = xchg_release(&y, 5);	smp_mb__after_atomic();
> > > > > 					r1 = READ_ONCE(x);
> > > > > 
> > > > > 
> > > > > 	WARN_ON(r0 == 0 && r1 == 0);
> > > > > 
> > > > > I must confess that I am not seeing why we would want to allow this
> > > > > outcome.
> > > > 
> > > > No you are indeed quite right. I just wasn't creative enough. Thanks for
> > > > the inspiration.
> > > 
> > > Just to close this out, we agree that an smp_rmb() instead of
> > > smp_mb__after_atomic() would *not* forbid this outcome, right?
> > 
> > So that really hurts my brain. Per the normal rules that smp_rmb() would
> > order the read of @x against the last ll of @y and per ll/sc ordering
> > you then still don't get to make the WARN happen.
> > 
> > On IRC you explained that your 8.1 LSE instructions are not in fact
> > ordered by a smp_rmb, only by smp_wmb, which is 'surprising' since you
> > really need to load the old value to compute the new value.
> 
> To be clear, it's only the ST* variants of the LSE instructions that are
> treated as a write for the purposes of memory ordering, so these are the
> non-*_return variants. It's not unlikely that other architectures will
> exhibit the same behaviour (e.g. Power, RISC-V), because the CPU can
> treat non-return atomics as "fire-and-forget" and have them handled
> elsewhere in the memory subsystem, causing them to be treated similarly
> to posted writes.
> 
> For the code snippet above, the second thread has no idea about the value
> of y and so smp_rmb() is the wrong thing to be using imo. It really cares
> about ordering the store to y before the read of x, so needs a full mb (i.e.
> the test is more like 'R' than 'MP').
> 
> Also, wouldn't this problem also arise if your atomics were built using a
> spinlock where unlock had release semantics?

The current Linux kernel memory model forbids this outcome with smp_rmb(),
though I did have to work around the current lack of atomic_inc() using
xchg_relaxed(), so please review my litmus tests carefully.

	C C-WillDeacon-MP+o-r+ai-rmb-o.litmus

	(*
	 * Expected result: Never.
	 *
	 * Desired litmus test, with atomic_inc() emulated by xchg_relaxed():
	 *
	 *     WRITE_ONCE(x, 1);               atomic_inc(&y);
	 *     r0 = xchg_release(&y, 5);       smp_rmb();
	 *                                     r1 = READ_ONCE(x);
	 *
	 *
	 *     WARN_ON(r0 == 0 && r1 == 0);
	 *)

	{
	}

	P0(int *x, int *y)
	{
		WRITE_ONCE(*x, 1);
		r0 = xchg_release(y, 5);
	}

	P1(int *x, int *y)
	{
		r2 = xchg_relaxed(y, 1);
		smp_rmb();
		r1 = READ_ONCE(*x);
	}

	exists
	(0:r0=0 /\ 1:r1=0)

Here is what herd thinks:

	$ herd7 -bell strong-kernel.bell -cat weak-kernel.cat -macros linux.def ../litmus/manual/kernel/C-WillDeacon-MP+o-r+ai-rmb-o.litmus
	Test C-WillDeacon-MP+o-r+ai-rmb-o Allowed
	States 3
	0:r0=0; 1:r1=1;
	0:r0=1; 1:r1=0;
	0:r0=1; 1:r1=1;
	No
	Witnesses
	Positive: 0 Negative: 3
	Condition exists (0:r0=0 /\ 1:r1=0)
	Observation C-WillDeacon-MP+o-r+ai-rmb-o Never 0 3
	Hash=0c3e25a94b38708a2c5ea11ff52c8077

I get the same answer from strong-kernel.cat (which is our best-guess
envelope over hardware guarantees), weak-kernel.cat (which is simplified
based on what people actually use), and proposal.cat (which is a candidate
model with further simplifications).

I converted this (possibly incorrectly) to PowerPC assembly:

	PPC w-RMWl-r+w-RMWl-r.litmus
	""
	(*
	 * Does 3.0 Linux-kernel Power atomic_add_return() provide local 
	 * barrier that orders prior stores against subsequent loads?
	 * Use the atomic_add_return() in both threads, but to different variables.
	 * And use the trailing-lwsync variant of atomic_add_return().
	 *)
	(* 24-Aug-2011: ppcmem says "Sometimes" *)
	{
	0:r1=1; 0:r2=x; 0:r3=5; 0:r4=y;   0:r10=0 ; 0:r11=0;
	1:r1=1; 1:r2=x; 1:r3=5; 1:r4=y;   1:r10=0 ; 1:r11=0;
	}
	 P0                | P1                ;
	 stw r1,0(r2)      | lwarx  r11,r10,r4 ;
	 lwsync            | stwcx. r1,r10,r4  ;
	 lwarx  r11,r10,r4 | bne Fail1         ;
	 stwcx. r3,r10,r4  | lwsync            ;
	 bne Fail0         | lwz r3,0(r2)      ;
	 li r3,42          | Fail1:            ;
	 Fail0:            |                   ;


	exists
	(0:r11=0 /\ 0:r3=42 /\ 1:r3=0)

And ppcmem agrees with the linux-kernel memory model:

	[ . . . ]

	Found     82 : Prune count= 13946  seen_succs=  7453   7454 states 
	Found     83 : Prune count= 13997  seen_succs=  7490   7491 states 
	Found     84 : Prune count= 14007  seen_succs=  7506   7507 states 
	Found     85 : Prune count= 17229  seen_succs=  8889   8890 states 
	Found     86 : Prune count= 17235  seen_succs=  8897   8898 states 
	Test w-RMWl-r+w-RMWl-r Allowed
	States 9
	0:r3=5; 0:r11=0; 1:r3=0;
	0:r3=5; 0:r11=0; 1:r3=1;
	0:r3=5; 0:r11=0; 1:r3=5;
	0:r3=5; 0:r11=1; 1:r3=0;
	0:r3=5; 0:r11=1; 1:r3=1;
	0:r3=42; 0:r11=0; 1:r3=1;
	0:r3=42; 0:r11=0; 1:r3=5;
	0:r3=42; 0:r11=1; 1:r3=0;
	0:r3=42; 0:r11=1; 1:r3=1;
	No (allowed not found)
	Condition exists (0:r11=0 /\ 0:r3=42 /\ 1:r3=0)
	Hash=58fb07516ac5697580c33e06a354f667
	Observation w-RMWl-r+w-RMWl-r Never 0 9 

So if ARM really needs the litmus test with smp_rmb() to be allowed,
we need to adjust the Linux-kernel memory model appropriately.  Which
means that one of us needs to reach out to the usual suspects.  Would
you like to do that, or would you like me to?

							Thanx, Paul