Message-ID: <Pine.LNX.4.44L0.1711021123210.1277-100000@iolanthe.rowland.org>
Date: Thu, 2 Nov 2017 11:40:35 -0400 (EDT)
From: Alan Stern <stern@...land.harvard.edu>
To: Peter Zijlstra <peterz@...radead.org>
cc: "Reshetova, Elena" <elena.reshetova@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"keescook@...omium.org" <keescook@...omium.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"ishkamiel@...il.com" <ishkamiel@...il.com>,
Will Deacon <will.deacon@....com>,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
<parri.andrea@...il.com>, <boqun.feng@...il.com>,
<dhowells@...hat.com>, <david@...morbit.com>
Subject: Re: [PATCH] refcount: provide same memory ordering guarantees as in
atomic_t

On Thu, 2 Nov 2017, Peter Zijlstra wrote:

> > Lock functions such as refcount_dec_and_lock() &
> > refcount_dec_and_mutex_lock() provide exactly the same guarantees as
> > their atomic counterparts.
>
> Nope. The atomic_dec_and_lock() provides smp_mb() while
> refcount_dec_and_lock() merely orders all prior loads/stores against
> all later loads/stores.

In fact there is no guaranteed ordering when refcount_dec_and_lock()
returns false; it provides ordering only if the return value is true.
In that case it provides acquire ordering (thanks to the spin_lock),
plus release ordering and a control dependency (thanks to the
refcount_dec_and_test).
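
To see where those guarantees come from, here is a sketch of
refcount_dec_and_lock(), paraphrased from lib/refcount.c (not the
exact source):

	bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock)
	{
		/* Fast path: decrement if the count is not 1; no lock needed. */
		if (refcount_dec_not_one(r))
			return false;

		/* Slow path: take the lock, then do the final decrement. */
		spin_lock(lock);			/* acquire ordering */
		if (!refcount_dec_and_test(r)) {	/* release ordering on success */
			spin_unlock(lock);
			return false;
		}

		return true;	/* count hit 0; caller holds the lock */
	}

A true return means the caller has gone through both the spin_lock()
(acquire) and a successful refcount_dec_and_test() (release plus a
control dependency), but at no point is there a full memory barrier.
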
> The difference is subtle and involves at least 3 CPUs. I can't seem to
> write up anything simple, keeps turning into monsters :/ Will, Paul,
> have you got anything simple around?

The combination of acquire + release is not the same as smp_mb(),
because each of them allows accesses to be reordered past it in one
direction. Example:

C C-refcount-vs-atomic-dec-and-lock

{
}

P0(int *x, int *y, refcount_t *r)
{
	refcount_set(r, 1);
	WRITE_ONCE(*x, 1);
	smp_wmb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y, refcount_t *r, spinlock_t *s)
{
	int rx, ry;
	bool r1;

	ry = READ_ONCE(*y);
	r1 = refcount_dec_and_lock(r, s);
	if (r1)
		rx = READ_ONCE(*x);
}

exists (1:ry=1 /\ 1:r1=1 /\ 1:rx=0)

This is allowed.  The idea is that the CPU can take P1's sequence of
operations (the Acquire coming from the spin_lock and the Release from
the successful refcount_dec_and_test inside refcount_dec_and_lock()):

	Read y
	Acquire
	Release
	Read x

and execute the first read after the Acquire and the second read before
the Release:

	Acquire
	Read y
	Read x
	Release

and then the CPU can reorder the reads:

	Acquire
	Read x
	Read y
	Release

If the program had used atomic_dec_and_lock() instead, which provides a
full smp_mb() barrier, this outcome would not be possible.
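
For comparison, here is a minimal sketch showing the effect of that
full barrier; the explicit smp_mb() stands in for the ordering that
atomic_dec_and_lock() would provide (the refcount/lock machinery is
stripped out, and the test name is made up):

C C-dec-and-lock-full-barrier

{
}

P0(int *x, int *y)
{
	WRITE_ONCE(*x, 1);
	smp_wmb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y)
{
	int rx, ry;

	ry = READ_ONCE(*y);
	smp_mb();	/* models the full barrier in atomic_dec_and_lock() */
	rx = READ_ONCE(*x);
}

exists (1:ry=1 /\ 1:rx=0)

This is the classic message-passing pattern: with smp_wmb() on the
writer side and a full barrier between the two reads, the outcome
ry=1 /\ rx=0 is forbidden.
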
Alan Stern