lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 01 Mar 2021 17:08:43 +0100
From:   Toke Høiland-Jørgensen <toke@...hat.com>
To:     Björn Töpel <bjorn.topel@...il.com>,
        ast@...nel.org, daniel@...earbox.net, netdev@...r.kernel.org,
        bpf@...r.kernel.org
Cc:     Björn Töpel <bjorn.topel@...el.com>,
        magnus.karlsson@...el.com, jonathan.lemon@...il.com,
        maximmi@...dia.com, andrii@...nel.org
Subject: Re: [PATCH bpf-next 1/2] xsk: update rings for
 load-acquire/store-release semantics

Björn Töpel <bjorn.topel@...il.com> writes:

> From: Björn Töpel <bjorn.topel@...el.com>
>
> Currently, the AF_XDP rings uses smp_{r,w,}mb() fences on the
> kernel-side. By updating the rings for load-acquire/store-release
> semantics, the full barrier on the consumer side can be replaced with
> improved performance as a nice side-effect.
>
> Note that this change does *not* require similar changes on the
> libbpf/userland side, however it is recommended [1].
>
> On x86-64 systems, by removing the smp_mb() on the Rx and Tx side, the
> l2fwd AF_XDP xdpsock sample performance increases by
> 1%. Weakly-ordered platforms, such as ARM64 might benefit even more.
>
> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/
>
> Signed-off-by: Björn Töpel <bjorn.topel@...el.com>
> ---
>  net/xdp/xsk_queue.h | 27 +++++++++++----------------
>  1 file changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index 2823b7c3302d..e24279d8d845 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -47,19 +47,18 @@ struct xsk_queue {
>  	u64 queue_empty_descs;
>  };
>  
> -/* The structure of the shared state of the rings are the same as the
> - * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
> - * ring, the kernel is the producer and user space is the consumer. For
> - * the Tx and fill rings, the kernel is the consumer and user space is
> - * the producer.
> +/* The structure of the shared state of the rings are a simple
> + * circular buffer, as outlined in
> + * Documentation/core-api/circular-buffers.rst. For the Rx and
> + * completion ring, the kernel is the producer and user space is the
> + * consumer. For the Tx and fill rings, the kernel is the consumer and
> + * user space is the producer.
>   *
>   * producer                         consumer
>   *
> - * if (LOAD ->consumer) {           LOAD ->producer
> - *                    (A)           smp_rmb()       (C)
> + * if (LOAD ->consumer) {  (A)      LOAD.acq ->producer  (C)

Why is LOAD.acq not needed on the consumer side?

-Toke

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ