linux-kernel - RE: [PATCH] refcount: provide same memory ordering guarantees as in atomic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2236FBA76BA1254E88B949DDB74E612B802B92EC@IRSMSX102.ger.corp.intel.com>
Date:   Fri, 3 Nov 2017 11:55:17 +0000
From:   "Reshetova, Elena" <elena.reshetova@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "keescook@...omium.org" <keescook@...omium.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "ishkamiel@...il.com" <ishkamiel@...il.com>,
        Will Deacon <will.deacon@....com>,
        Paul McKenney <paulmck@...ux.vnet.ibm.com>,
        "stern@...land.harvard.edu" <stern@...land.harvard.edu>,
        "parri.andrea@...il.com" <parri.andrea@...il.com>,
        "boqun.feng@...il.com" <boqun.feng@...il.com>,
        "dhowells@...hat.com" <dhowells@...hat.com>,
        "david@...morbit.com" <david@...morbit.com>
Subject: RE: [PATCH] refcount: provide same memory ordering guarantees as in
 atomic_t


> On Thu, Nov 02, 2017 at 11:04:53AM +0000, Reshetova, Elena wrote:
> 
> > Well refcount_dec_and_test() is not the only function that has different
> > memory ordering specifics. So, the full answer then for any arbitrary case
> > according to your points above would be smth like:
> >
> > for each substituted function (atomic_* --> refcount_*) that actually
> >  has changes in memory ordering *** perform the following:
> >   - mention the difference
> >   - mention the actual code place where the change would affect (
> >    various put and etc. functions)
> >   - potentially try to understand if it would make a difference for
> >   this code (again here is where I am not sure how to do it properly)
> >
> >
> > *** the actual list of these functions to me looks like:
> 
> >  1) atomic_inc_not_zero -> refcount_inc_not_zero.  Change is from
> >  underlying atomic_cmpxchg() to atomic_try_cmpxchg_relaxed () First
> >  one implies SMP-conditional general memory barrier (smp_mb()) on each
> >  side of the actual operation (at least according to documentation, In
> >  reality it goes through so many changes, ifdefs and conditionals that
> >  one gets lost Easily in the process).
> 
> That matches _inc(), which doesn't imply anything.

So you are saying that atomic_inc_not_zero has the same guarantees (meaning
no guarantees) as atomic_inc?  If yes, then I am now confused here because
atomic_inc_not_zero based on atomic_cmpxchg() which according to this
https://elixir.free-electrons.com/linux/latest/source/Documentation/memory-barriers.txt#L2527
does imply the smp_mb()....

> 
> The reasoning being that you must already have obtained a pointer to the
> object; and that will necessarily include enough ordering to observe the
> object. Therefore increasing the refcount doesn't require further
> constraints.
> 
> > Second one according to the comment implies no memory barriers
> > whatsoever, BUT "provides a control dependency which will order future
> > stores against the inc" So, every store (not load) operation (I guess
> > we are talking here only about store operations that touch the same
> > object, but I wonder how it is defined in terms of memory location?
> 
> Memory location is irrelevant.

I was just trying to understand to what "stores" does control dependency
barrier applies here? You mean it applies to absolutely all stores on all 
objects? I guess we are talking about the same object here, just trying to 
understand how object is defined in terms of memory location. 
> 
> > (overlapping?)  that comes inside "if refcount_inc_not_zero(){}" cause
> > would only be executed if functions returns true.
> 
> The point is that a CPU must NOT speculate on stores. So while it can
> speculate a branch, any store inside the branch must not become visible
> until it can commit to that branch.
> 
> The whole point being that possible modifications to the object to which
> we've _conditionally_ acquired a reference, will only happen after we're
> sure to indeed have acquired this reference.
> 
> Otherwise its similar to _inc().

Yes, now I understand this part. 

> 
> > So, practically what we might "loose" here is any updates on the
> > object protected by this refcounter done by another CPU. But for this
> > purpose we expect the developer to take some other lock/memory barrier
> > into use, right?
> 
> Correct, object modification had better be serialized. Refcounts cannot
> (even with atomic_t) help with that.
> 
> > We only care of incrementing the refcount atomically and make sure we
> > don't do anything with object unless it is ready for us to be used.
> 
> Just so..
> 
> > If yes, then  I guess it might be a big change for the code that
> > previously relied on atomic-provided smp_mb() barriers and now instead
> > needs to take an explicit locks/barriers by itself.
> 
> Right, however such memory ordering should be explicitly documented.
> Unknown and hidden memory ordering is a straight up bug, because
> modifications to the code (be they introducing refcount_t or anything
> else) can easily break things.

Yes, this is what has been mentioned before many times, but again reality might
be different, so better be prepared here also. 
> 
> > 2) atomic_** -> refcount_add_not_zero. Fortunately these are super
> > rare and need to see per each case dependent on actual atomic function
> > substituted.
> 
> See inc_not_zero.
> 
> > 3) atomic_add() --> refcount_add(). This should not make any change
> > since both do not provide memory ordering at all, but there is a
> > comment in the refcount.c that says that refcount_add " provide a
> > control dependency and thereby orders future stores".  How is it done
> > given that refcount_add is void returning function??  I am fully
> > confused with this one.
> 
> Weird, mostly comes from being implemented using add_not_zero I suppose.

Yes, underlying is add_not_zero, but is it still correct to talk about any control 
dependencies here? How would it possibly look in the code? What is the surrounding
if statement? 

> 
> > 4) atomic_dec () --> refcount_dec (). This one we have discussed
> > extensively before. Again first one implies SMP-conditional general
> > memory barrier (smp_mb()) on each side of the actual operation.
> 
> No, atomic_dec() does not in fact imply anything..
> 
> > Second one only provides "release" ordering meaning that prior
> > both loads and stores must be completed before the operation.
> > So, is it correct to express the difference in this way:
> >
> > atomic_dec ():			refcount_dec ():
> >
> > smp_mb();			smp_mb();
> > do_atomic_dec_operation;		do_atomic_dec_operation;
> > smp_mb();			/*note no any barrier here! */
> 
> 
> No, on two points: atomic_dec() does not imply _anything_ and while
> refcount_dec() does the release, that is distinctly different from
> smp_mb() before.

Oh, yes, sorry this got confused. So, here we actually have the opposite situation:
refcount variant kind of provides a bit more order here than atomic variant.  
> 
>   C C-peterz-release-vs-mb
> 
>   {
>   }
> 
>   P0(int *a, int *b, int *c)
>   {
> 	  WRITE_ONCE(*a, 1);
>   //	smp_mb();
> 	  smp_store_release(b, 1);
> 	  WRITE_ONCE(*c, 1);
>   }
> 
>   P1(int *a, int *b, int *c)
>   {
> 	  r3 = READ_ONCE(*c);
> 	  smp_rmb();
> 	  r2 = smp_load_acquire(b);
> 	  r1 = READ_ONCE(*a);
>   }
> 
>   exists (1:r1=0 /\ 1:r2=0 /\ 1:r3=1)
> 
> That happens without the smp_mb(), once you put that back its no longer
> allowed.
> 
> The reason is that smp_store_release() does not constrain the store to
> @c and that is allowed to happen before, whereas with smp_mb() that
> store can not happen before the store to @a.
> 
> smp_store_release() only affects the prior load/stores not anything
> later, whereas smp_mb() imposes full order.

This is a good example, thank you, helps understanding greatly. 

> 
> > 5) atomic_dec_and_test () --> refcount_dec_and_test (). Same as 4)
> > but in addition we have a control flow barrier.
> 
> No, this one was in fact a full barrier and is now a release.
> 
> > So, is it correct to express the difference in this way:
> >
> > atomic_dec_and_test ():	                            refcount_dec_and_test ():
> >
> > smp_mb();                                                               smp_mb();
> > if (do_atomic_dec_and_test;) {                          if (do_atomic_dec_and_test;){
> >     /* 1 --> 0 transition happened */                        /* 1 --> 0 transition happened */
> >      smp_mb();                                                               /* in refcounter case we assume
> >      do_anything including free();                                  that we are the last user of
> object
> >                                                                                            so that no concurrent access can
> happen*/
> >                                                                                        /* control_flow_barrier here */ <--
> ---- which one btw????
> >                                                                                        /* only need to guarantee that we
> free after
> >                                                                                             reaching zero so we are good
> here */
> > }                                                                                }
>   smp_mb();
> 
> or better:
> 
>   if ( ({ smp_mb(); ret = do_atomic_dec_and_test(); smp_mb(); ret }) )
> 
> The difference being that the smp_mb happens on both branches of the
> condition (or in fact before the branch).
> 
> Same note as the above; the smp_mb() on the refcount_dec_and_test() side
> is strictly stronger than the release.

What is the correct way to show the control flow barriers in the example? 
The memory-barriers.txt only uses READ/WRITE_ONCE() notation. 

> 
> > 6) atomic_sub_and_test () --> refcount_sub_and_test (). Identical to 5)
> 
> Yes, with the same note as for add*, these really should not be used.
> 
> > 7) atomic_add_unless(&var, -1, 1) --> refcount_dec_not_one().
> > Should be identical to 4)
> 
> Correct; was fully ordered, is now a release.
> 
> > Lock functions such as refcount_dec_and_lock() &
> > refcount_dec_and_mutex_lock() Provide exactly the same guarantees as
> > they atomic counterparts.
> 
> Nope. The atomic_dec_and_lock() provides smp_mb() while
> refcount_dec_and_lock() merely orders all prior load/store's against all
> later load/store's.

Yes, I confused the whole thing again. Yes, atomic_dec_and_lock()
is based on atomic_add_unless() which impies smp_mb(). 


> 
> The difference is subtle and involves at least 3 CPUs. I can't seem to
> write up anything simple, keeps turning into monsters :/ Will, Paul,
> have you got anything simple around?
> 
> > Is the above a correct view on this? I think that instead of putting
> > this whole info in the individual commits, how about making some
> > documentation subsection (in memory-barriers or atomics or on its own,
> > don't know what is better) with examples like above?
> 
> > Then in each individual commit I can point to the respective line in
> > the doc and say that here is a particular memory guarantee change and
> > in the case of this particular code it would apply to your
> > put_pi_state() or whatever else function.  Otherwise keeping this info
> > only in commits do not benefit any future users of refcount.
> 
> Sure you can write a refcount-vs-atomic.txt file to call out these
> differences.

Ok, I will try collecting the relevant examples from this thread as it goes 
and will try to send some first version on Monday. 

> 
> > > So while memory-barriers.txt is a longish document, it is readable with
> > > a bit of time and effort. There are also tools being developed that can
> > > help people validate their code.
> >
> > Could you please point to the tools? Might help my understanding also.
> 
> https://github.com/aparri/memory-model

Thank you, will try to check!
> 
> > So, I actually went and tried to __slowly__ read the
> > memory-barriers.txt which indeed makes a lot of sense when you read it
> > in full (not partially like it did it before). But the issue with this
> > doc is not so much the length but not always consequent way of giving
> > the information to unprepared reader.
> 
> Fair enough; that document has grown over the years as has our
> understanding on the subject. I would imagine it is indeed somewhat
> inconsistent.
> 
> Note that there's work done on better documents and updates to this one.
> One document that might be good to read (I have not in fact had time to
> read it myself yet :-():
> 
>   https://github.com/aparri/memory-
> model/blob/master/Documentation/explanation.txt

Ok, let me try this one. Anyhow it would take a while for all this info to 
settle properly in my head, there are a lot of details to keep in mind all
the time....

> 
> > I have many examples that I have wondered for a while what is meant with
> > certain term or notion somewhere at the beginning of the doc
> > only to discover a fully specified answer somewhere after line 1800.
> 
> > I do have many suggestions that might help to improve the readability
> > of the document but it would take me at least a day to write them down
> > (and it is going to be a long read for anyone), so before I go do it,
> > I want to know if anyone is interested to hear :) Anyway this would be
> > a separate exercise to do if people want.
> 
> Yes, in general feedback on that document is appreciated. Cc'ed a number
> of more memory ordering people.
> 
> That said; we have to rely on maintainers to be aware of 'some' memory
> ordering requirements, otherwise it will be exceedingly hard to find
> them.

You mean hard to find requirements or maintainers that keep all these details
in their heads? :) 
 
> 
> One tell-tale sign is if there are otherwise unmatched memory barriers
> in the code. In that case they could end up being matched by atomic or
> lock ops. And at that point we need to be careful and reverse engineer
> what ordering is required.

So, you mean that if I see somewhere that there is let's say smp_rmb(), 
but then can't find any matching smp_wmb() or smp_mb() then I should check if it has 
been done "underneath" using one of atomic functions. I think this is 
a good tip for checking actually, I will start looking into these.
Might be good tip to put into doc also as suggestion for people to double 
check their code when/if they convert.  


Best Regards,
Elena.