linux-kernel - Re: [PATCH] refcount_t: documentation for memory ordering differences

Message-ID: <20171203062734.GA3204@andrea>
Date:   Sun, 3 Dec 2017 07:27:34 +0100
From:   Andrea Parri <parri.andrea@...il.com>
To:     Randy Dunlap <rdunlap@...radead.org>
Cc:     Elena Reshetova <elena.reshetova@...el.com>, peterz@...radead.org,
        linux-kernel@...r.kernel.org, keescook@...omium.org,
        david@...morbit.com, Alan Stern <stern@...land.harvard.edu>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH] refcount_t: documentation for memory ordering differences

On Sun, Dec 03, 2017 at 07:20:03AM +0100, Andrea Parri wrote:
> On Fri, Dec 01, 2017 at 12:34:23PM -0800, Randy Dunlap wrote:
> > On 11/29/2017 04:36 AM, Elena Reshetova wrote:
> > > Some functions from refcount_t API provide different
> > > > memory ordering guarantees than their atomic counterparts.
> > > This adds a document outlining these differences.
> > > 
> > > Signed-off-by: Elena Reshetova <elena.reshetova@...el.com>
> > > ---
> > >  Documentation/core-api/index.rst              |   1 +
> > >  Documentation/core-api/refcount-vs-atomic.rst | 129 ++++++++++++++++++++++++++
> > >  2 files changed, 130 insertions(+)
> > >  create mode 100644 Documentation/core-api/refcount-vs-atomic.rst
> > 
> > > diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
> > > new file mode 100644
> > > index 0000000..5619d48
> > > --- /dev/null
> > > +++ b/Documentation/core-api/refcount-vs-atomic.rst
> > > @@ -0,0 +1,129 @@
> > > +===================================
> > > +refcount_t API compared to atomic_t
> > > +===================================
> > > +
> > > +The goal of refcount_t API is to provide a minimal API for implementing
> > > +an object's reference counters. While a generic architecture-independent
> > > +implementation from lib/refcount.c uses atomic operations underneath,
> > > +there are a number of differences between some of the refcount_*() and
> > > +atomic_*() functions with regards to the memory ordering guarantees.
> > > +This document outlines the differences and provides respective examples
> > > +in order to help maintainers validate their code against the change in
> > > +these memory ordering guarantees.
> > > +
> > > +memory-barriers.txt and atomic_t.txt provide more background to the
> > > +memory ordering in general and for atomic operations specifically.
> > > +
> > > +Relevant types of memory ordering
> > > +=================================
> > > +
> > > +**Note**: the following section only covers some of the memory
> > > +ordering types that are relevant for the atomics and reference
> > > > +counters and used throughout this document. For a much broader picture
> > > > +please consult the memory-barriers.txt document.
> > > +
> > > +In the absence of any memory ordering guarantees (i.e. fully unordered)
> > > +atomics & refcounters only provide atomicity and
> > > +program order (po) relation (on the same CPU). It guarantees that
> > > +each atomic_*() and refcount_*() operation is atomic and instructions
> > > +are executed in program order on a single CPU.
> > > +This is implemented using READ_ONCE()/WRITE_ONCE() and
> > > +compare-and-swap primitives.
> > > +
> > > +A strong (full) memory ordering guarantees that all prior loads and
> > > +stores (all po-earlier instructions) on the same CPU are completed
> > > +before any po-later instruction is executed on the same CPU.
> > > +It also guarantees that all po-earlier stores on the same CPU
> > > +and all propagated stores from other CPUs must propagate to all
> > > +other CPUs before any po-later instruction is executed on the original
> > > +CPU (A-cumulative property). This is implemented using smp_mb().
> > 
> > I don't know what "A-cumulative property" means, and a Google search
> > didn't turn up anything either.
> 
> The description above seems to follow the (informal) definition given in:
> 
>   https://github.com/aparri/memory-model/blob/master/Documentation/explanation.txt
>   (cf., in particular, Sect. 13-14)
> 
> and formalized by the LKMM. (The notion of A-cumulativity also appears, in
> different contexts, in some memory consistency literature, e.g.,
> 
>   http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/index.html
>   http://www.cl.cam.ac.uk/~pes20/armv8-mca/
>   https://arxiv.org/abs/1308.6810 )
> 
> A typical illustration of A-cumulativity (for smp_store_release(), say) is
> given with the following program:
> 
> int x = 0;
> int y = 0;
> 
> void thread0()
> {
> 	WRITE_ONCE(x, 1);
> }
> 
> void thread1()
> {
> 	int r0;
> 
> 	r0 = READ_ONCE(x);
> 	smp_store_release(&y, 1);
> }
> 
> void thread2()
> {
> 	int r1;
> 	int r2;
> 
> 	r1 = READ_ONCE(y);
> 	smp_rmb();
> 	r2 = READ_ONCE(x);
> }
> 
> (This is a variation of the so-called "message-passing" pattern, where the
>  stores are "distributed" over two threads; see also
> 
>   https://github.com/aparri/memory-model/blob/master/litmus-tests/WRC%2Bpooncerelease%2Brmbonceonce%2BOnce.litmus )
> 
> The question we want to address is whether the final state
> 
>   (r0 == 1 && r1 == 1 && r2 == 0)
> 
> can be reached/is allowed, and the answer is no (due to the A-cumulativity
> of the store-release).
> 
> By contrast, dependencies provide no (A-)cumulativity; for example, if we
> modify the previous program by replacing the store-release with a data
> dependency as follows:
> 
> int x = 0;
> int y = 0;
> 
> void thread0()
> {
> 	WRITE_ONCE(x, 1);
> }
> 
> void thread1()
> {
> 	int r0;
> 
> 	r0 = READ_ONCE(x);
> 	WRITE_ONCE(x, r0);

The line above should have been "WRITE_ONCE(y, r0);"

  Andrea


> }
> 
> void thread2()
> {
> 	int r1;
> 	int r2;
> 
> 	r1 = READ_ONCE(y);
> 	smp_rmb();
> 	r2 = READ_ONCE(x);
> }
> 
> then that same final state is allowed (and observed on some PPC machines).
> 
>   Andrea
> 
> 
> > 
> > Is it non-cumulative, similar to typical vs. atypical, where atypical
> > roughly means non-typical.  Or is it accumulative (something being
> > accumulated, summed up, gathered up)?
> > 
> > Or is it something else.. TBD?
> > 
> > > +A RELEASE memory ordering guarantees that all prior loads and
> > > +stores (all po-earlier instructions) on the same CPU are completed
> > > +before the operation. It also guarantees that all po-earlier
> > > +stores on the same CPU and all propagated stores from other CPUs
> > > +must propagate to all other CPUs before the release operation
> > > +(A-cumulative property). This is implemented using smp_store_release().
> > 
> > thanks.
> > -- 
> > ~Randy
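
For readers who want to poke at the two litmus programs quoted above from
user space, here is a minimal sketch using C11 atomics and pthreads. It is
an approximation, not kernel code: relaxed atomic accesses stand in for
READ_ONCE()/WRITE_ONCE(), a release store stands in for
smp_store_release(), and an acquire fence stands in for smp_rmb(). The
names count_weak_outcomes() and use_release are made up for this
illustration. The dependency variant uses the corrected store,
WRITE_ONCE(y, r0).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Shared locations and per-run results, as in the quoted programs. */
static atomic_int x, y;
static int r0, r1, r2;
static int use_release;	/* hypothetical switch: 1 = store-release, 0 = data dependency */

static void *thread0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);	/* ~ WRITE_ONCE(x, 1) */
	return NULL;
}

static void *thread1(void *arg)
{
	(void)arg;
	r0 = atomic_load_explicit(&x, memory_order_relaxed);	/* ~ READ_ONCE(x) */
	if (use_release)	/* ~ smp_store_release(&y, 1) */
		atomic_store_explicit(&y, 1, memory_order_release);
	else			/* ~ WRITE_ONCE(y, r0): data dependency only */
		atomic_store_explicit(&y, r0, memory_order_relaxed);
	return NULL;
}

static void *thread2(void *arg)
{
	(void)arg;
	r1 = atomic_load_explicit(&y, memory_order_relaxed);	/* ~ READ_ONCE(y) */
	atomic_thread_fence(memory_order_acquire);		/* stands in for smp_rmb() */
	r2 = atomic_load_explicit(&x, memory_order_relaxed);	/* ~ READ_ONCE(x) */
	return NULL;
}

/*
 * Run the three threads 'iterations' times and count how often the
 * final state (r0 == 1 && r1 == 1 && r2 == 0) shows up.
 */
int count_weak_outcomes(int use_release_store, int iterations)
{
	void *(*fn[3])(void *) = { thread0, thread1, thread2 };
	int hits = 0;

	use_release = use_release_store;
	for (int i = 0; i < iterations; i++) {
		pthread_t t[3];

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		for (int j = 0; j < 3; j++) {
			int err = pthread_create(&t[j], NULL, fn[j], NULL);
			assert(err == 0);
		}
		for (int j = 0; j < 3; j++)
			pthread_join(t[j], NULL);
		if (r0 == 1 && r1 == 1 && r2 == 0)
			hits++;
	}
	return hits;
}
```

Calling count_weak_outcomes(1, n) should always return 0, since the C11
model also forbids that state for the release variant. A return of 0 from
count_weak_outcomes(0, n) proves nothing: the state is allowed there, but
spawning fresh threads per iteration is a crude harness and many machines
(x86 in particular) will never exhibit it. A dedicated tool such as
litmus7 or herd7 is the better way to explore the dependency variant.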