Message-ID: <507F66F6.20704@cn.fujitsu.com>
Date: Thu, 18 Oct 2012 10:18:30 +0800
From: Lai Jiangshan <laijs@...fujitsu.com>
To: Mikulas Patocka <mpatocka@...hat.com>
CC: Steven Rostedt <rostedt@...dmis.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Jens Axboe <axboe@...nel.dk>, linux-kernel@...r.kernel.org,
linux-arch@...r.kernel.org,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH] percpu-rwsem: use barrier in unlock path
On 10/18/2012 04:28 AM, Steven Rostedt wrote:
> On Wed, Oct 17, 2012 at 11:07:21AM -0400, Mikulas Patocka wrote:
>>>
>>> Even if the previous patch is applied, percpu_down_read() still
>>> needs mb() to pair with it.
>>
>> percpu_down_read uses rcu_read_lock which should guarantee that memory
>> accesses don't escape in front of a rcu-protected section.
>
> You do realize that rcu_read_lock() does nothing more than a barrier(),
> right?
>
> Paul worked really hard to get rcu_read_lock() to not call HW barriers.
>
>>
>> If rcu_read_unlock has only an unlock barrier and not a full barrier,
>> memory accesses could be moved in front of rcu_read_unlock and reordered
>> with this_cpu_inc(*p->counters), but it doesn't matter because
>> percpu_down_write does synchronize_rcu(), so it never sees these accesses
>> halfway through.
>
> Looking at the patch, you are correct. The read side doesn't need the
> memory barrier as the worst thing that will happen is that it sees the
> locked = false, and will just grab the mutex unnecessarily.
---------------------
A memory barrier should be added only if these two things are known:
1) which two accesses it keeps from being reordered, and
2) which corresponding mb() it pairs with.
You tried to add an mb() in percpu_up_write(). OK, I can see that it prevents
reordering between the writes to the protected data and the statement
"p->locked = false", but I can't find the corresponding mb() that it pairs with.
reader CPU                                      writer CPU
----------                                      ----------
percpu_down_read()                              writes to the data
the CPU caches/prefetches the data,             writes to the data
which is still inconsistent                     writes to the data
                                                percpu_up_write()
                                                    mb()
                                                    p->locked = false;
    unlikely(p->locked)
        the CPU sees p->locked == false and
        doesn't discard the cached/prefetched data
    this_cpu_inc(*p->counters);
the code that reads the data
**** and we use the inconsistent data ****
So you need to add an mb() after the "unlikely(p->locked)" check.
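
To make the pairing concrete, here is a minimal userspace sketch of the same pattern. It is not kernel code and not part of any patch: it assumes C11 atomics and pthreads, uses atomic_thread_fence(memory_order_seq_cst) as a stand-in for smp_mb(), and the names shared_data, locked_flag and run_ad_pair() are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: shared_data is the protected data,
 * locked_flag models p->locked. */
static int shared_data;
static atomic_bool locked_flag = true;

/* Writer side, modelled on percpu_up_write(). */
static void *writer_unlock(void *arg)
{
	(void)arg;
	shared_data = 42;                           /* write to the data */
	atomic_thread_fence(memory_order_seq_cst);  /* data write before flag write */
	atomic_store_explicit(&locked_flag, false, memory_order_relaxed);
	return NULL;
}

/* Reader side, modelled on the percpu_down_read() fast path. */
static void *reader_lock(void *arg)
{
	int *seen = arg;

	while (atomic_load_explicit(&locked_flag, memory_order_relaxed))
		;                                       /* spin until unlocked */
	atomic_thread_fence(memory_order_seq_cst);  /* flag read before data read */
	*seen = shared_data;                        /* read of the data */
	return NULL;
}

/* Runs one writer and one reader; returns what the reader saw. */
static int run_ad_pair(void)
{
	pthread_t r, w;
	int seen = -1;

	shared_data = 0;
	atomic_store(&locked_flag, true);
	pthread_create(&r, NULL, reader_lock, &seen);
	pthread_create(&w, NULL, writer_unlock, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```

With both fences in place the reader can only observe the completed write. Remove the reader-side fence and the C11 model no longer orders the flag read before the data read, which is the missing pairing described above.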
-------------------------
The RCU you use doesn't protect any data. It protects the code of the fast path:
	unlikely(p->locked);
	this_cpu_inc(*p->counters);
and synchronize_rcu() ensures that every previous fast path has fully finished
"this_cpu_inc(*p->counters);".
It doesn't protect any other code or data. If you want to protect other code or
other data, please add more synchronization or mb()s.
---------------
I really hate synchronization that protects code instead of data,
but sometimes I have to do it too.
---------------
A very rough draft example of paired mb()s is here:
diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index cf80f7e..84a93c0 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -12,6 +12,14 @@ struct percpu_rw_semaphore {
 	struct mutex mtx;
 };
 
+#if 1
+#define light_mb() barrier()
+#define heavy_mb() synchronize_sched()
+#else
+#define light_mb() smp_mb()
+#define heavy_mb() smp_mb()
+#endif
+
 static inline void percpu_down_read(struct percpu_rw_semaphore *p)
 {
 	rcu_read_lock();
@@ -24,22 +32,12 @@ static inline void percpu_down_read(struct percpu_rw_semaphore *p)
 	}
 	this_cpu_inc(*p->counters);
 	rcu_read_unlock();
+	light_mb(); /* A, between read of p->locked and read of data, paired with D */
 }
 
 static inline void percpu_up_read(struct percpu_rw_semaphore *p)
 {
-	/*
-	 * On X86, write operation in this_cpu_dec serves as a memory unlock
-	 * barrier (i.e. memory accesses may be moved before the write, but
-	 * no memory accesses are moved past the write).
-	 * On other architectures this may not be the case, so we need smp_mb()
-	 * there.
-	 */
-#if defined(CONFIG_X86) && (!defined(CONFIG_X86_PPRO_FENCE) && !defined(CONFIG_X86_OOSTORE))
-	barrier();
-#else
-	smp_mb();
-#endif
+	light_mb(); /* B, between read of the data and write to p->counter, paired with C */
 	this_cpu_dec(*p->counters);
 }
 
@@ -61,11 +59,12 @@ static inline void percpu_down_write(struct percpu_rw_semaphore *p)
 	synchronize_rcu();
 	while (__percpu_count(p->counters))
 		msleep(1);
-	smp_rmb(); /* paired with smp_mb() in percpu_sem_up_read() */
+	heavy_mb(); /* C, between read of p->counter and write to data, paired with B */
 }
 
 static inline void percpu_up_write(struct percpu_rw_semaphore *p)
 {
+	heavy_mb(); /* D, between write to data and write to p->locked, paired with A */
 	p->locked = false;
 	mutex_unlock(&p->mtx);
 }
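
For the B/C pair in the draft above, a rough userspace model may help show what the two barriers order. This is only a sketch under assumptions: C11 atomics and pthreads instead of kernel primitives, a full fence on both sides (so it does not model the light_mb()/heavy_mb() asymmetry), and made-up names guarded_data, active_readers and run_bc_pair().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins: guarded_data is the protected data,
 * active_readers models the sum of p->counters. */
static int guarded_data;
static atomic_int active_readers;

/* Reader exit, modelled on percpu_up_read(). */
static void *reader_exit(void *arg)
{
	int *seen = arg;

	*seen = guarded_data;                       /* last read of the data */
	atomic_thread_fence(memory_order_seq_cst);  /* B: data read before counter write */
	atomic_fetch_sub_explicit(&active_readers, 1, memory_order_relaxed);
	return NULL;
}

/* Writer entry, modelled on the tail of percpu_down_write(). */
static void *writer_enter(void *arg)
{
	(void)arg;
	while (atomic_load_explicit(&active_readers, memory_order_relaxed) != 0)
		;                                       /* wait for readers to drain */
	atomic_thread_fence(memory_order_seq_cst);  /* C: counter read before data write */
	guarded_data = 2;                           /* first write to the data */
	return NULL;
}

/* Runs one exiting reader and one entering writer;
 * returns the value the reader saw. */
static int run_bc_pair(void)
{
	pthread_t r, w;
	int seen = -1;

	guarded_data = 1;
	atomic_store(&active_readers, 1);           /* one reader is inside */
	pthread_create(&w, NULL, writer_enter, NULL);
	pthread_create(&r, NULL, reader_exit, &seen);
	pthread_join(r, NULL);
	pthread_join(w, NULL);
	return seen;
}
```

Here the reader always sees the old value and the writer's store lands only after the counter has dropped to zero, so the reader's last access cannot leak into the write side's critical section.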