[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <939aefb5-8f03-fc5a-9e8b-0b634aafd0a4@intel.com>
Date: Tue, 2 Mar 2021 09:04:16 +0100
From: Björn Töpel <bjorn.topel@...el.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>,
Björn Töpel <bjorn.topel@...il.com>,
ast@...nel.org, daniel@...earbox.net, netdev@...r.kernel.org,
bpf@...r.kernel.org, paulmck@...nel.org
Cc: magnus.karlsson@...el.com, jonathan.lemon@...il.com,
maximmi@...dia.com, andrii@...nel.org
Subject: Re: [PATCH bpf-next 1/2] xsk: update rings for
load-acquire/store-release semantics
On 2021-03-01 17:08, Toke Høiland-Jørgensen wrote:
> Björn Töpel <bjorn.topel@...il.com> writes:
>
>> From: Björn Töpel <bjorn.topel@...el.com>
>>
>> Currently, the AF_XDP rings uses smp_{r,w,}mb() fences on the
>> kernel-side. By updating the rings for load-acquire/store-release
>> semantics, the full barrier on the consumer side can be replaced with
>> improved performance as a nice side-effect.
>>
>> Note that this change does *not* require similar changes on the
>> libbpf/userland side, however it is recommended [1].
>>
>> On x86-64 systems, by removing the smp_mb() on the Rx and Tx side, the
>> l2fwd AF_XDP xdpsock sample performance increases by
>> 1%. Weakly-ordered platforms, such as ARM64 might benefit even more.
>>
>> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/
>>
>> Signed-off-by: Björn Töpel <bjorn.topel@...el.com>
>> ---
>> net/xdp/xsk_queue.h | 27 +++++++++++----------------
>> 1 file changed, 11 insertions(+), 16 deletions(-)
>>
>> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
>> index 2823b7c3302d..e24279d8d845 100644
>> --- a/net/xdp/xsk_queue.h
>> +++ b/net/xdp/xsk_queue.h
>> @@ -47,19 +47,18 @@ struct xsk_queue {
>> u64 queue_empty_descs;
>> };
>>
>> -/* The structure of the shared state of the rings are the same as the
>> - * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
>> - * ring, the kernel is the producer and user space is the consumer. For
>> - * the Tx and fill rings, the kernel is the consumer and user space is
>> - * the producer.
>> +/* The structure of the shared state of the rings are a simple
>> + * circular buffer, as outlined in
>> + * Documentation/core-api/circular-buffers.rst. For the Rx and
>> + * completion ring, the kernel is the producer and user space is the
>> + * consumer. For the Tx and fill rings, the kernel is the consumer and
>> + * user space is the producer.
>> *
>> * producer consumer
>> *
>> - * if (LOAD ->consumer) { LOAD ->producer
>> - * (A) smp_rmb() (C)
>> + * if (LOAD ->consumer) { (A) LOAD.acq ->producer (C)
>
> Why is LOAD.acq not needed on the consumer side?
>
You mean why LOAD.acq is not needed on the *producer* side, i.e. the
->consumer? The ->consumer is a control dependency for the store, so
there is no ordering constraint for ->consumer at producer side. If
there's no space, no data is written. So, no barrier is needed there --
at least that has been my perspective.
This is very similar to the buffer in
Documentation/core-api/circular-buffers.rst. Roping in Paul for some
guidance.
Björn
> -Toke
>
Powered by blists - more mailing lists