Message-ID: <20190923144931.GC2369@hirez.programming.kicks-ass.net>
Date: Mon, 23 Sep 2019 16:49:31 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: David Howells <dhowells@...hat.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Will Deacon <will@...nel.org>,
"Paul E. McKenney" <paulmck@...ux.ibm.com>,
Mark Rutland <mark.rutland@....com>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: Do we need to correct barriering in circular-buffers.rst?

On Thu, Sep 19, 2019 at 02:59:06PM +0100, David Howells wrote:
> But I don't agree with this. You're missing half the barriers. There should
> be *four* barriers. The document mandates only 3 barriers, and uses
> READ_ONCE() where the fourth should be, i.e.:
>
>       thread #1            thread #2
>
>                            smp_load_acquire(head)
>                            ... read data from queue ..
>                            smp_store_release(tail)
>
>       READ_ONCE(tail)
>       ... add data to queue ..
>       smp_store_release(head)
>

Notably your READ_ONCE() pseudo code is lacking a conditional;
kernel/events/ring_buffer.c writes it like so:

*   kernel                             user
*
*   if (LOAD ->data_tail) {            LOAD ->data_head
*                      (A)             smp_rmb()       (C)
*      STORE $data                     LOAD $data
*      smp_wmb()       (B)             smp_mb()        (D)
*      STORE ->data_head               STORE ->data_tail
*   }
* Where A pairs with D, and B pairs with C.
*
* In our case (A) is a control dependency that separates the load of
* the ->data_tail and the stores of $data. In case ->data_tail
* indicates there is no room in the buffer to store $data we do not.
*
* D needs to be a full barrier since it separates the data READ
* from the tail WRITE.
*
* For B a WMB is sufficient since it separates two WRITEs, and for C
* an RMB is sufficient since it separates two READs.

Where 'kernel' is the producer and 'user' is the consumer. This was
written before load-acquire and store-release came about (I _think_),
and I've so far resisted updating B to store-release because smp_wmb()
is actually cheaper than store-release on a number of architectures
(notably ARM).

C ought to be a load-acquire, and D really should be a store-release, but
I don't think the perf userspace has that (or uses C11).
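
To make the load-acquire/store-release variant concrete, here is a minimal single-producer, single-consumer sketch in C11. It is not the perf ring buffer code: `struct ring`, `RING_SIZE`, `ring_init()`, `ring_put()` and `ring_get()` are hypothetical names chosen for illustration, and the payload is a plain `int`. Two deliberate deviations from the comment above: C11 gives no ordering guarantee for control dependencies, so (A) is written as an acquire load, and (B) is folded into a store-release instead of being kept as a separate write barrier.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8u  /* hypothetical capacity; must be a power of two */

struct ring {
    _Atomic unsigned int head;  /* written only by the producer */
    _Atomic unsigned int tail;  /* written only by the consumer */
    int data[RING_SIZE];
};

static void ring_init(struct ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side (the 'kernel' column). Returns false if there is no room. */
static bool ring_put(struct ring *r, int v)
{
    /* Only the producer writes head, so a relaxed load is enough. */
    unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);

    /* (A) LOAD ->data_tail; acquire stands in for the control dependency. */
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail >= RING_SIZE)
        return false;                        /* no room: do not store */

    r->data[head & (RING_SIZE - 1)] = v;     /* STORE $data */

    /* (B) + STORE ->data_head, expressed as a store-release. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side (the 'user' column). Returns false if the ring is empty. */
static bool ring_get(struct ring *r, int *out)
{
    /* Only the consumer writes tail, so a relaxed load is enough. */
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

    /* LOAD ->data_head + (C), expressed as a load-acquire. */
    unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;                        /* nothing to read */

    *out = r->data[tail & (RING_SIZE - 1)];  /* LOAD $data */

    /* (D) + STORE ->data_tail: release keeps the data read before it. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

With this shape, the release in `ring_put()` pairs with the acquire in `ring_get()` (B with C), and the release in `ring_get()` pairs with the acquire in `ring_put()` (D with A). Whether the store-release is cheaper or more expensive than a separate write barrier depends on the architecture, as noted above.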