Message-ID: <20181018081434.GT3121@hirez.programming.kicks-ass.net>
Date: Thu, 18 Oct 2018 10:14:34 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: alexei.starovoitov@...il.com, paulmck@...ux.vnet.ibm.com,
will.deacon@....com, acme@...hat.com, yhs@...com,
john.fastabend@...il.com, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next 2/3] tools, perf: use smp_{rmb,mb} barriers instead of {rmb,mb}

On Thu, Oct 18, 2018 at 01:10:15AM +0200, Daniel Borkmann wrote:
> Wouldn't this then also allow the kernel side to use smp_store_release()
> when it updates the head? We'd be pretty much at the model as described
> in Documentation/core-api/circular-buffers.rst.
>
> Meaning, rough pseudo-code diff would look as:
>
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 5d3cf40..3d96275 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -84,8 +84,9 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
>  	 *
>  	 * See perf_output_begin().
>  	 */
> -	smp_wmb(); /* B, matches C */
> -	rb->user_page->data_head = head;
> +
> +	/* B, matches C */
> +	smp_store_release(&rb->user_page->data_head, head);

Yes, this would be correct.

The reason we didn't do this is that smp_store_release() ends up being
smp_mb() + WRITE_ONCE() on a fair number of platforms, even if they
have a cheaper smp_wmb(). Most notably ARM.

(ARM64 OTOH would like to have smp_store_release() there, I imagine;
while x86 doesn't care either way.)

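
As a rough userspace illustration of the two ways of publishing the head
discussed above, here is a minimal sketch using C11 atomics. It is not
kernel code: struct demo_page, publish_head_fence() and
publish_head_release() are hypothetical names, and C11 has no
store-store-only fence, so a release fence stands in for smp_wmb() even
though it is stronger.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the data_head field of the perf user page. */
struct demo_page {
	_Atomic uint64_t data_head;
};

/*
 * Variant 1: a fence followed by a relaxed store.
 * Assumption: the release fence approximates smp_wmb(); it is stronger,
 * because it also orders earlier loads against the store.
 */
static void publish_head_fence(struct demo_page *pg, uint64_t head)
{
	assert(pg != NULL);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&pg->data_head, head, memory_order_relaxed);
}

/* Variant 2: a single release store, the analogue of smp_store_release(). */
static void publish_head_release(struct demo_page *pg, uint64_t head)
{
	assert(pg != NULL);
	atomic_store_explicit(&pg->data_head, head, memory_order_release);
}
```

Both variants make the record data written before the call visible to a
reader that observes the new head with an acquire load; which one is
cheaper depends on the architecture, as described above.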
A similar concern exists for the smp_load_acquire() I proposed for the
userspace side: ARM would have to resort to smp_mb() in that situation,
instead of the cheaper smp_rmb().

The smp_store_release() on the userspace side will actually be of equal
cost or cheaper, since that side already has an smp_mb() there. Most
notably, x86 can avoid the barrier entirely, because TSO doesn't allow
the LOAD-STORE reorder (it only allows the STORE-LOAD reorder). And
PowerPC can use LWSYNC instead of SYNC.
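
The userspace side described above could be sketched roughly as follows,
again with C11 atomics rather than the kernel primitives. This is only an
illustration under stated assumptions: struct demo_ring, DEMO_SIZE and
demo_consume() are hypothetical, the buffer holds single bytes instead of
perf records, and head/tail are free-running counters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SIZE 16u /* hypothetical buffer size; must be a power of two */

struct demo_ring {
	_Atomic uint64_t data_head;	/* written by the producer */
	_Atomic uint64_t data_tail;	/* written by the consumer */
	unsigned char data[DEMO_SIZE];
};

/* Copy up to len pending bytes into out; return the number consumed. */
static size_t demo_consume(struct demo_ring *rb, unsigned char *out, size_t len)
{
	/* Only the consumer writes data_tail, so a relaxed load suffices. */
	uint64_t tail = atomic_load_explicit(&rb->data_tail, memory_order_relaxed);
	/* Acquire: the data reads below cannot move above this load. */
	uint64_t head = atomic_load_explicit(&rb->data_head, memory_order_acquire);
	size_t n = 0;

	assert((DEMO_SIZE & (DEMO_SIZE - 1)) == 0);
	while (tail != head && n < len) {
		out[n++] = rb->data[tail & (DEMO_SIZE - 1)];
		tail++;
	}

	/*
	 * Release: the data reads above are ordered before the tail store
	 * (the LOAD-STORE ordering), so the producer cannot reuse the space
	 * while it is still being read.
	 */
	atomic_store_explicit(&rb->data_tail, tail, memory_order_release);
	return n;
}
```

The release store on data_tail only needs to order the preceding loads
against that one store, which is why it can be cheaper than a full
barrier on the architectures mentioned above.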