linux-kernel - Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20190420085440.GK14111@linux.ibm.com>
Date:   Sat, 20 Apr 2019 01:54:40 -0700
From:   "Paul E. McKenney" <paulmck@...ux.ibm.com>
To:     Nicholas Piggin <npiggin@...il.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        LKMM Maintainers -- Akira Yokosawa <akiyks@...il.com>,
        Andrea Parri <andrea.parri@...rulasolutions.com>,
        Boqun Feng <boqun.feng@...il.com>,
        David Howells <dhowells@...hat.com>,
        Daniel Lustig <dlustig@...dia.com>,
        Jade Alglave <j.alglave@....ac.uk>,
        Kernel development list <linux-kernel@...r.kernel.org>,
        Luc Maranget <luc.maranget@...ia.fr>,
        Alan Stern <stern@...land.harvard.edu>,
        Will Deacon <will.deacon@....com>
Subject: Re: [PATCH] Documentation: atomic_t.txt: Explain ordering provided
 by smp_mb__{before,after}_atomic()

On Sat, Apr 20, 2019 at 10:26:15AM +1000, Nicholas Piggin wrote:
> Paul E. McKenney's on April 20, 2019 4:26 am:
> > On Fri, Apr 19, 2019 at 08:00:17PM +0200, Peter Zijlstra wrote:
> >> On Fri, Apr 19, 2019 at 01:21:45PM -0400, Alan Stern wrote:
> >> > Index: usb-devel/Documentation/atomic_t.txt
> >> > ===================================================================
> >> > --- usb-devel.orig/Documentation/atomic_t.txt
> >> > +++ usb-devel/Documentation/atomic_t.txt
> >> > @@ -171,7 +171,10 @@ The barriers:
> >> >    smp_mb__{before,after}_atomic()
> >> >  
> >> >  only apply to the RMW ops and can be used to augment/upgrade the ordering
> >> > -inherent to the used atomic op. These barriers provide a full smp_mb().
> >> > +inherent to the used atomic op. Unlike normal smp_mb() barriers, they order
> >> > +only the RMW op itself against the instructions preceding the
> >> > +smp_mb__before_atomic() or following the smp_mb__after_atomic(); they do
> >> > +not order instructions on the other side of the RMW op at all.
> >> 
> >> Now it is I who is confused; what?
> >> 
> >> 	x = 1;
> >> 	smp_mb__before_atomic();
> >> 	atomic_add(1, &a);
> >> 	y = 1;
> >> 
> >> the stores to both x and y will be ordered as if an smp_mb() where
> >> there. There is no order between a and y otoh.
> > 
> > Let's look at x86.  And a slightly different example:
> > 
> > 	x = 1;
> > 	smp_mb__before_atomic();
> > 	atomic_add(1, &a);
> > 	r1 = y;
> > 
> > The atomic_add() asm does not have the "memory" constraint, which is
> > completely legitimate because atomic_add() does not return a value,
> > and thus guarantees no ordering.  The compiler is therefore within
> > its rights to transform the code into the following:
> > 
> > 	x = 1;
> > 	smp_mb__before_atomic();
> > 	r1 = y;
> > 	atomic_add(1, &a);
> > 
> > But x86's smp_mb__before_atomic() is just a compiler barrier, and
> > x86 is further allowed to reorder prior stores with later loads.
> > The CPU can therefore execute this code as follows:
> > 
> > 	r1 = y;
> > 	x = 1;
> > 	smp_mb__before_atomic();
> > 	atomic_add(1, &a);
> > 
> > So in general, the ordering is guaranteed only to the atomic itself,
> > not to accesses on the other side of the atomic.
> 
> That's interesting. I don't think that's what all our code expects.
> I had the same idea as Peter.
> 
> IIRC the primitive was originally introduced exactly so x86 would not
> need to have the unnecessary hardware barrier with sequences like
> 
>   smp_mb();
>   ...
>   atomic_inc(&v);
> 
> The "new" semantics are a bit subtle. One option might be just to
> replace it entirely with atomic_xxx_mb() ?

Hmmm...  There are more than 2,000 uses of atomic_inc() in the kernel.
There are about 300-400 total between smp_mb__before_atomic() and
smp_mb__after_atomic().

So what are our options?

1.	atomic_xxx_mb() as you say.

	From a quick scan of smp_mb__before_atomic() uses, we need this
	for atomic_inc(), atomic_dec(), atomic_add(), atomic_sub(),
	clear_bit(), set_bit(), test_bit(), atomic_long_dec(),
	atomic_long_add(), refcount_dec(), cmpxchg_relaxed(),
	set_tsk_thread_flag(), clear_bit_unlock().

	Another random look identifies atomic_andnot().

	And atomic_set(): set_preempt_state().	This fails
	on x86, s390, and TSO friends, does it not?  Or is
	this ARM-only?	Still, why not just smp_mb() before and
	after?	Same issue in __kernfs_new_node(), bio_cnt_set(),
	sbitmap_queue_update_wake_batch(), 

	Ditto for atomic64_set() in __ceph_dir_set_complete().

	Ditto for atomic_read() in rvt_qp_is_avail().  This function
	has a couple of other oddly placed smp_mb__before_atomic().

	And atomic_cmpxchg(): msc_buffer_alloc().  This instance
	of smp_mb__before_atomic() can be removed unless I am missing
	something subtle.  Ditto for kvm_vcpu_exiting_guest_mode(),
	pv_kick_node(), __sbq_wake_up(), 

	And lock acquisition??? acm_read_bulk_callback().

	In nfnl_acct_fill_info(), a smp_mb__before_atomic() after
	a atomic64_xchg()???  Also before a clear_bit(), but the
	clear_bit() is inside an "if".

	There are a few cases that would see added overhead.  For example,
	svc_get_next_xprt() has the following:

		smp_mb__before_atomic();
		clear_bit(SP_CONGESTED, &pool->sp_flags);
		clear_bit(RQ_BUSY, &rqstp->rq_flags);
		smp_mb__after_atomic();

	And xs_sock_reset_connection_flags() has this:

		smp_mb__before_atomic();
		clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
		clear_bit(XPRT_CLOSING, &xprt->state);
		xs_sock_reset_state_flags(xprt);  /* Also a clear_bit(). */
		smp_mb__after_atomic();

	Yeah, there are more than a few misuses, aren't there?  :-/
	A coccinelle script seems in order.  In 0day test robot.

	But there are a number of helper functions whose purpose
	seems to be to wrap an atomic in smp_mb__before_atomic() and
	smp_mb__after_atomic(), so some of the atomic_xxx_mb() functions
	might be a good idea just for improved readability.

2.	Add something to checkpatch.pl warning about non-total ordering,
	with the error message explaining that the ordering is partial.

3.	Make non-value-returning atomics provide full ordering.
	This would of course need some benchmarking, but would be a
	simple change to make and would eliminate a large class of
	potential bugs.  My guess is that the loss in performance
	would be non-negligible, but who knows?

4.	Your idea here!

							Thanx, Paul