Date:   Thu, 2 Nov 2017 14:57:42 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     "Reshetova, Elena" <elena.reshetova@...el.com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "keescook@...omium.org" <keescook@...omium.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "ishkamiel@...il.com" <ishkamiel@...il.com>,
        Will Deacon <will.deacon@....com>,
        Paul McKenney <paulmck@...ux.vnet.ibm.com>,
        stern@...land.harvard.edu, parri.andrea@...il.com,
        boqun.feng@...il.com, dhowells@...hat.com, david@...morbit.com
Subject: Re: [PATCH] refcount: provide same memory ordering guarantees as in
 atomic_t

On Thu, Nov 02, 2017 at 11:04:53AM +0000, Reshetova, Elena wrote:

> Well refcount_dec_and_test() is not the only function that has different
> memory ordering specifics. So, the full answer then for any arbitrary case
> according to your points above would be something like:
> 
> for each substituted function (atomic_* --> refcount_*) that actually 
>  has changes in memory ordering *** perform the following:
>   - mention the difference
>   - mention the actual code place where the change would affect
>    (various put() and similar functions)
>   - potentially try to understand if it would make a difference for 
>   this code (again here is where I am not sure how to do it properly)
> 
> 
> *** the actual list of these functions to me looks like:

>  1) atomic_inc_not_zero -> refcount_inc_not_zero. Change is from
>  underlying atomic_cmpxchg() to atomic_try_cmpxchg_relaxed(). The first
>  one implies an SMP-conditional general memory barrier (smp_mb()) on each
>  side of the actual operation (at least according to the documentation; in
>  reality it goes through so many changes, ifdefs and conditionals that
>  one gets lost easily in the process). 

That matches _inc(), which doesn't imply anything.

The reasoning being that you must already have obtained a pointer to the
object; and that will necessarily include enough ordering to observe the
object. Therefore increasing the refcount doesn't require further
constraints.

> The second one, according to the comment, implies no memory barriers
> whatsoever, BUT "provides a control dependency which will order future
> stores against the inc". So, every store (not load) operation (I guess
> we are talking here only about store operations that touch the same
> object, but I wonder how it is defined in terms of memory location? 

Memory location is irrelevant.

> (overlapping?)) that comes inside "if (refcount_inc_not_zero()) {}"
> would only be executed if the function returns true.

The point is that a CPU must NOT speculate on stores. So while it can
speculate a branch, any store inside the branch must not become visible
until it can commit to that branch.

The whole point being that possible modifications to the object to which
we've _conditionally_ acquired a reference, will only happen after we're
sure to indeed have acquired this reference.

Otherwise it's similar to _inc().

> So, practically what we might "lose" here is any updates on the
> object protected by this refcounter done by another CPU. But for this
> purpose we expect the developer to take some other lock/memory barrier
> into use, right?

Correct, object modification had better be serialized. Refcounts cannot
(even with atomic_t) help with that.

> We only care about incrementing the refcount atomically and making sure
> we don't do anything with the object unless it is ready for us to use.

Just so..

> If yes, then  I guess it might be a big change for the code that
> previously relied on atomic-provided smp_mb() barriers and now instead
> needs to take an explicit locks/barriers by itself. 

Right, however such memory ordering should be explicitly documented.
Unknown and hidden memory ordering is a straight up bug, because
modifications to the code (be they introducing refcount_t or anything
else) can easily break things.

> 2) atomic_** -> refcount_add_not_zero. Fortunately these are super
> rare and need to be looked at case by case, depending on the actual
> atomic function substituted. 

See inc_not_zero.

> 3) atomic_add() --> refcount_add(). This should not make any change
> since both do not provide memory ordering at all, but there is a
> comment in refcount.c that says that refcount_add "provides a
> control dependency and thereby orders future stores". How is that done
> given that refcount_add() is a void-returning function?? I am fully
> confused by this one. 

Weird, mostly comes from being implemented using add_not_zero I suppose.

> 4) atomic_dec () --> refcount_dec (). This one we have discussed
> extensively before. Again first one implies SMP-conditional general
> memory barrier (smp_mb()) on each side of the actual operation. 

No, atomic_dec() does not in fact imply anything..

> Second one only provides "release" ordering meaning that prior
> both loads and stores must be completed before the operation. 
> So, is it correct to express the difference in this way:
> 
> atomic_dec ():			refcount_dec ():
> 
> smp_mb();			smp_mb();
> do_atomic_dec_operation;		do_atomic_dec_operation;
> smp_mb();			/*note no any barrier here! */


No, on two points: atomic_dec() does not imply _anything_ and while
refcount_dec() does the release, that is distinctly different from
smp_mb() before.

  C C-peterz-release-vs-mb

  {
  }

  P0(int *a, int *b, int *c)
  {
	  WRITE_ONCE(*a, 1);
  //	smp_mb();
	  smp_store_release(b, 1);
	  WRITE_ONCE(*c, 1);
  }

  P1(int *a, int *b, int *c)
  {
	  r3 = READ_ONCE(*c);
	  smp_rmb();
	  r2 = smp_load_acquire(b);
	  r1 = READ_ONCE(*a);
  }

  exists (1:r1=0 /\ 1:r2=0 /\ 1:r3=1)

That happens without the smp_mb(); once you put it back it's no longer
allowed.

The reason is that smp_store_release() does not constrain the store to
@c and that is allowed to happen before, whereas with smp_mb() that
store can not happen before the store to @a.

smp_store_release() only affects the prior load/stores not anything
later, whereas smp_mb() imposes full order.

> 5) atomic_dec_and_test () --> refcount_dec_and_test (). Same as 4)
> but in addition we have a control flow barrier.

No, this one was in fact a full barrier and is now a release.

> So, is it correct to express the difference in this way:
> 
> atomic_dec_and_test():               refcount_dec_and_test():
> 
> smp_mb();                            smp_mb();
> if (do_atomic_dec_and_test) {        if (do_atomic_dec_and_test) {
>     /* 1 --> 0 transition happened */    /* 1 --> 0 transition happened */
>     smp_mb();                            /* in refcounter case we assume
>     do_anything including free();           we are the last user of the
>                                             object, so no concurrent
>                                             access can happen */
>                                          /* control_flow_barrier here */ <------ which one btw????
>                                          /* only need to guarantee that we
>                                             free after reaching zero, so
>                                             we are good here */
> }                                    }
  smp_mb();

or better:

  if ( ({ smp_mb(); ret = do_atomic_dec_and_test(); smp_mb(); ret; }) )

The difference being that the smp_mb happens on both branches of the
condition (or in fact before the branch).

Same note as the above; the smp_mb() on the refcount_dec_and_test() side
is strictly stronger than the release.

> 6) atomic_sub_and_test () --> refcount_sub_and_test (). Identical to 5)

Yes, with the same note as for add*, these really should not be used.

> 7) atomic_add_unless(&var, -1, 1) --> refcount_dec_not_one(). 
> Should be identical to 4)

Correct; was fully ordered, is now a release.

> Lock functions such as refcount_dec_and_lock() &
> refcount_dec_and_mutex_lock() provide exactly the same guarantees as
> their atomic counterparts. 

Nope. The atomic_dec_and_lock() provides smp_mb() while
refcount_dec_and_lock() merely orders all prior load/store's against all
later load/store's.

The difference is subtle and involves at least 3 CPUs. I can't seem to
write up anything simple, keeps turning into monsters :/ Will, Paul,
have you got anything simple around?

> Is the above a correct view on this? I think that instead of putting
> this whole info in the individual commits, how about making some
> documentation subsection (in memory-barriers or atomics or on its own,
> don't know what is better) with examples like above? 

> Then in each individual commit I can point to the respective line in
> the doc and say that here is a particular memory guarantee change and
> in the case of this particular code it would apply to your
> put_pi_state() or whatever else function. Otherwise keeping this info
> only in commits does not benefit any future users of refcount. 

Sure you can write a refcount-vs-atomic.txt file to call out these
differences.

> > So while memory-barriers.txt is a longish document, it is readable with
> > a bit of time and effort. There are also tools being developed that can
> > help people validate their code.
> 
> Could you please point to the tools? Might help my understanding also. 

https://github.com/aparri/memory-model

> So, I actually went and tried to __slowly__ read
> memory-barriers.txt, which indeed makes a lot of sense when you read it
> in full (not partially like I did before). But the issue with this
> doc is not so much the length but the not always consistent way of
> giving information to an unprepared reader. 

Fair enough; that document has grown over the years as has our
understanding on the subject. I would imagine it is indeed somewhat
inconsistent.

Note that there's work done on better documents and updates to this one.
One document that might be good to read (I have not in fact had time to
read it myself yet :-():

  https://github.com/aparri/memory-model/blob/master/Documentation/explanation.txt

> I have many examples where I wondered for a while what was meant by a 
> certain term or notion somewhere at the beginning of the doc, 
> only to discover a fully specified answer somewhere after line 1800. 

> I do have many suggestions that might help to improve the readability
> of the document but it would take me at least a day to write them down
> (and it is going to be a long read for anyone), so before I go do it,
> I want to know if anyone is interested to hear :) Anyway this would be
> a separate exercise to do if people want. 

Yes, in general feedback on that document is appreciated. Cc'ed a number
of more memory ordering people.

That said; we have to rely on maintainers to be aware of 'some' memory
ordering requirements, otherwise it will be exceedingly hard to find
them.

One tell-tale sign is if there are otherwise unmatched memory barriers
in the code. In that case they could end up being matched by atomic or
lock ops. And at that point we need to be careful and reverse engineer
what ordering is required.
