[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87mtvmx3ec.fsf@toke.dk>
Date: Mon, 01 Mar 2021 17:08:43 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Björn Töpel <bjorn.topel@...il.com>,
ast@...nel.org, daniel@...earbox.net, netdev@...r.kernel.org,
bpf@...r.kernel.org
Cc: Björn Töpel <bjorn.topel@...el.com>,
magnus.karlsson@...el.com, jonathan.lemon@...il.com,
maximmi@...dia.com, andrii@...nel.org
Subject: Re: [PATCH bpf-next 1/2] xsk: update rings for
load-acquire/store-release semantics
Björn Töpel <bjorn.topel@...il.com> writes:
> From: Björn Töpel <bjorn.topel@...el.com>
>
> Currently, the AF_XDP rings uses smp_{r,w,}mb() fences on the
> kernel-side. By updating the rings for load-acquire/store-release
> semantics, the full barrier on the consumer side can be replaced with
> improved performance as a nice side-effect.
>
> Note that this change does *not* require similar changes on the
> libbpf/userland side, however it is recommended [1].
>
> On x86-64 systems, by removing the smp_mb() on the Rx and Tx side, the
> l2fwd AF_XDP xdpsock sample performance increases by
> 1%. Weakly-ordered platforms, such as ARM64 might benefit even more.
>
> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/
>
> Signed-off-by: Björn Töpel <bjorn.topel@...el.com>
> ---
> net/xdp/xsk_queue.h | 27 +++++++++++----------------
> 1 file changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index 2823b7c3302d..e24279d8d845 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -47,19 +47,18 @@ struct xsk_queue {
> u64 queue_empty_descs;
> };
>
> -/* The structure of the shared state of the rings are the same as the
> - * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
> - * ring, the kernel is the producer and user space is the consumer. For
> - * the Tx and fill rings, the kernel is the consumer and user space is
> - * the producer.
> +/* The structure of the shared state of the rings are a simple
> + * circular buffer, as outlined in
> + * Documentation/core-api/circular-buffers.rst. For the Rx and
> + * completion ring, the kernel is the producer and user space is the
> + * consumer. For the Tx and fill rings, the kernel is the consumer and
> + * user space is the producer.
> *
> * producer consumer
> *
> - * if (LOAD ->consumer) { LOAD ->producer
> - * (A) smp_rmb() (C)
> + * if (LOAD ->consumer) { (A) LOAD.acq ->producer (C)
Why is LOAD.acq not needed on the consumer side?
-Toke
Powered by blists - more mailing lists