Message-ID: <e4702dd2-3968-4740-aa50-9f7cda3bb13e@iogearbox.net>
Date: Tue, 11 Feb 2020 15:52:37 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Magnus Karlsson <magnus.karlsson@...el.com>, maximmi@...lanox.com,
bjorn.topel@...el.com, ast@...nel.org, netdev@...r.kernel.org,
jonathan.lemon@...il.com
Cc: rgoodfel@....edu, bpf@...r.kernel.org,
maciejromanfijalkowski@...il.com
Subject: Re: [PATCH bpf] xsk: publish global consumer pointers when NAPI is
finished
On 2/10/20 4:27 PM, Magnus Karlsson wrote:
> The commit 4b638f13bab4 ("xsk: Eliminate the RX batch size")
> introduced a much lazier way of updating the global consumer
> pointers from the kernel side, by only doing so when running out of
> entries in the fill or Tx rings (the rings consumed by the
> kernel). This can result in a deadlock with the user application if
> the kernel requires more than one entry to proceed and the application
> cannot put these entries in the fill ring because the kernel has not
> updated the global consumer pointer, as the ring is not yet empty.
>
> Fix this by publishing the local kernel side consumer pointer whenever
> we have completed Rx or Tx processing in the kernel. This way, user
> space will have an up-to-date view of the consumer pointers whenever it
> gets to execute in the one core case (application and driver on the
> same core), or after a certain number of packets have been processed
> in the two core case (application and driver on different cores).
>
> A side effect of this patch is that the one core case gets better
> performance, but the two core case gets worse. The reason that the one
> core case improves is that updating the global consumer pointer is
> relatively cheap since the application by definition is not running
> when the kernel is (they are on the same core) and it is beneficial
> for the application, once it gets to run, to have pointers that are
> as up to date as possible since it then can operate on more packets
> and buffers. In the two core case, the most important performance
> aspect is to minimize the number of accesses to the global pointers
> since they are shared between two cores and bounce between the caches
> of those cores. This patch results in more updates to global state,
> which means lower performance in the two core case.
>
> Fixes: 4b638f13bab4 ("xsk: Eliminate the RX batch size")
> Reported-by: Ryan Goodfellow <rgoodfel@....edu>
> Reported-by: Maxim Mikityanskiy <maximmi@...lanox.com>
> Signed-off-by: Magnus Karlsson <magnus.karlsson@...el.com>
Applied, thanks!
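
For illustration only (not part of the applied patch): a minimal,
single-threaded C sketch of the problem described in the commit message
above. All names here (struct ring, RING_SIZE, ring_produce(),
kernel_consume(), ring_publish(), deadlock_demo()) are hypothetical and
are not the kernel's xsk queue API. Memory barriers and the real ring
layout are deliberately left out. The sketch only models a global
consumer pointer that user space reads and a local one that the kernel
advances.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 4u /* hypothetical ring size */

struct ring {
	uint32_t producer;    /* global: advanced by user space */
	uint32_t consumer;    /* global: published by the kernel */
	uint32_t cached_cons; /* kernel-local consumer pointer */
};

/* User space only sees the global pointers. */
static uint32_t ring_free_slots(const struct ring *r)
{
	return RING_SIZE - (r->producer - r->consumer);
}

/* The kernel sees what it has really consumed so far. */
static uint32_t ring_avail(const struct ring *r)
{
	return r->producer - r->cached_cons;
}

/* User space puts n entries on the ring if it believes there is room. */
static bool ring_produce(struct ring *r, uint32_t n)
{
	if (ring_free_slots(r) < n)
		return false;
	r->producer += n;
	return true;
}

/* Publish the local consumer pointer to the global one. */
static void ring_publish(struct ring *r)
{
	r->consumer = r->cached_cons;
}

/* Lazy scheme: consume locally, publish only when the ring runs empty. */
static bool kernel_consume(struct ring *r, uint32_t n)
{
	if (ring_avail(r) < n)
		return false;
	r->cached_cons += n;
	if (ring_avail(r) == 0)
		ring_publish(r);
	return true;
}

/*
 * Returns 1 if user space could refill the ring and the kernel could
 * then proceed, 0 if both sides are stuck waiting for each other.
 */
static int deadlock_demo(bool publish_after_napi)
{
	struct ring r = { 0, 0, 0 };

	ring_produce(&r, 4);          /* user space fills the ring */
	kernel_consume(&r, 3);        /* one entry left, nothing published */

	if (kernel_consume(&r, 2))    /* kernel needs two entries at once */
		return 1;

	if (publish_after_napi)       /* the idea of the fix */
		ring_publish(&r);

	/* User space tries to add the missing entry. */
	if (!ring_produce(&r, 1))
		return 0;                 /* ring looks full: deadlock */

	return kernel_consume(&r, 2) ? 1 : 0;
}
```

Calling deadlock_demo(false) models the lazy behaviour: the ring still
looks full to user space, so neither side can make progress. Calling
deadlock_demo(true) models publishing the consumer pointer once
processing has finished, which lets user space see the three free slots.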