lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <939aefb5-8f03-fc5a-9e8b-0b634aafd0a4@intel.com>
Date:   Tue, 2 Mar 2021 09:04:16 +0100
From:   Björn Töpel <bjorn.topel@...el.com>
To:     Toke Høiland-Jørgensen <toke@...hat.com>,
        Björn Töpel <bjorn.topel@...il.com>,
        ast@...nel.org, daniel@...earbox.net, netdev@...r.kernel.org,
        bpf@...r.kernel.org, paulmck@...nel.org
Cc:     magnus.karlsson@...el.com, jonathan.lemon@...il.com,
        maximmi@...dia.com, andrii@...nel.org
Subject: Re: [PATCH bpf-next 1/2] xsk: update rings for
 load-acquire/store-release semantics




On 2021-03-01 17:08, Toke Høiland-Jørgensen wrote:
> Björn Töpel <bjorn.topel@...il.com> writes:
> 
>> From: Björn Töpel <bjorn.topel@...el.com>
>>
>> Currently, the AF_XDP rings uses smp_{r,w,}mb() fences on the
>> kernel-side. By updating the rings for load-acquire/store-release
>> semantics, the full barrier on the consumer side can be replaced with
>> improved performance as a nice side-effect.
>>
>> Note that this change does *not* require similar changes on the
>> libbpf/userland side, however it is recommended [1].
>>
>> On x86-64 systems, by removing the smp_mb() on the Rx and Tx side, the
>> l2fwd AF_XDP xdpsock sample performance increases by
>> 1%. Weakly-ordered platforms, such as ARM64 might benefit even more.
>>
>> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/
>>
>> Signed-off-by: Björn Töpel <bjorn.topel@...el.com>
>> ---
>>   net/xdp/xsk_queue.h | 27 +++++++++++----------------
>>   1 file changed, 11 insertions(+), 16 deletions(-)
>>
>> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
>> index 2823b7c3302d..e24279d8d845 100644
>> --- a/net/xdp/xsk_queue.h
>> +++ b/net/xdp/xsk_queue.h
>> @@ -47,19 +47,18 @@ struct xsk_queue {
>>   	u64 queue_empty_descs;
>>   };
>>   
>> -/* The structure of the shared state of the rings are the same as the
>> - * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
>> - * ring, the kernel is the producer and user space is the consumer. For
>> - * the Tx and fill rings, the kernel is the consumer and user space is
>> - * the producer.
>> +/* The structure of the shared state of the rings are a simple
>> + * circular buffer, as outlined in
>> + * Documentation/core-api/circular-buffers.rst. For the Rx and
>> + * completion ring, the kernel is the producer and user space is the
>> + * consumer. For the Tx and fill rings, the kernel is the consumer and
>> + * user space is the producer.
>>    *
>>    * producer                         consumer
>>    *
>> - * if (LOAD ->consumer) {           LOAD ->producer
>> - *                    (A)           smp_rmb()       (C)
>> + * if (LOAD ->consumer) {  (A)      LOAD.acq ->producer  (C)
> 
> Why is LOAD.acq not needed on the consumer side?
>

You mean why LOAD.acq is not needed on the *producer* side, i.e. the
->consumer? The ->consumer is a control dependency for the store, so
there is no ordering constraint for ->consumer at producer side. If
there's no space, no data is written. So, no barrier is needed there --
at least that has been my perspective.

This is very similar to the buffer in
Documentation/core-api/circular-buffers.rst. Roping in Paul for some
guidance.


Björn

> -Toke
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ