Message-ID: <F64155CD-D3BB-4730-8168-6F6A4C18193F@gmail.com>
Date: Mon, 10 Feb 2020 08:01:16 -0800
From: "Jonathan Lemon" <jonathan.lemon@...il.com>
To: "Magnus Karlsson" <magnus.karlsson@...el.com>
Cc: maximmi@...lanox.com, bjorn.topel@...el.com, ast@...nel.org,
daniel@...earbox.net, netdev@...r.kernel.org, rgoodfel@....edu,
bpf@...r.kernel.org, maciejromanfijalkowski@...il.com
Subject: Re: [PATCH bpf] xsk: publish global consumer pointers when NAPI is finished
On 10 Feb 2020, at 7:27, Magnus Karlsson wrote:
> Commit 4b638f13bab4 ("xsk: Eliminate the RX batch size")
> introduced a much lazier way of updating the global consumer
> pointers from the kernel side, by only doing so when running out of
> entries in the fill or Tx rings (the rings consumed by the
> kernel). This can result in a deadlock with the user application if
> the kernel requires more than one entry to proceed and the application
> cannot put these entries in the fill ring because the kernel has not
> updated the global consumer pointer since the ring is not empty.
>
> Fix this by publishing the local kernel side consumer pointer whenever
> we have completed Rx or Tx processing in the kernel. This way, user
> space will have an up-to-date view of the consumer pointers whenever it
> gets to execute in the one core case (application and driver on the
> same core), or after a certain number of packets have been processed
> in the two core case (application and driver on different cores).
>
> A side effect of this patch is that the one core case gets better
> performance, but the two core case gets worse. The reason that the one
> core case improves is that updating the global consumer pointer is
> relatively cheap since the application by definition is not running
> when the kernel is (they are on the same core) and it is beneficial
> for the application, once it gets to run, to have pointers that are
> as up to date as possible since it then can operate on more packets
> and buffers. In the two core case, the most important performance
> aspect is to minimize the number of accesses to the global pointers
> since they are shared between two cores and bounce between the caches
> of those cores. This patch results in more updates to global state,
> which means lower performance in the two core case.
>
> Fixes: 4b638f13bab4 ("xsk: Eliminate the RX batch size")
> Reported-by: Ryan Goodfellow <rgoodfel@....edu>
> Reported-by: Maxim Mikityanskiy <maximmi@...lanox.com>
> Signed-off-by: Magnus Karlsson <magnus.karlsson@...el.com>
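
For anyone following the thread, the deadlock described above can be
reproduced with a small single-threaded model. This is only a sketch, not
the xsk queue code: the struct, the function names and RING_SIZE are all
hypothetical, and the "publish only when the ring is found empty" rule is
my simplified reading of the lazy behaviour described in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 4u /* hypothetical ring size, for illustration only */

/* Hypothetical model of a ring consumed by the kernel (fill or Tx). */
struct ring_model {
	uint32_t producer;        /* global, advanced by user space */
	uint32_t consumer;        /* global, published by the kernel */
	uint32_t cached_consumer; /* kernel-local consumer position */
};

/* User space only sees the global consumer pointer. */
static uint32_t user_free_slots(const struct ring_model *r)
{
	return RING_SIZE - (r->producer - r->consumer);
}

/* User space adds one entry if it believes there is room. */
static bool user_produce(struct ring_model *r)
{
	if (user_free_slots(r) == 0)
		return false;
	r->producer++;
	return true;
}

/* Entries the kernel has not consumed yet. */
static uint32_t kernel_avail(const struct ring_model *r)
{
	return r->producer - r->cached_consumer;
}

/*
 * Lazy scheme (assumed model): the global consumer pointer is published
 * only when the kernel finds the ring completely empty. A request for n
 * entries with fewer than n (but more than zero) available fails without
 * publishing anything.
 */
static bool kernel_consume_lazy(struct ring_model *r, uint32_t n)
{
	assert(n > 0);
	if (kernel_avail(r) == 0) {
		r->consumer = r->cached_consumer;
		return false;
	}
	if (kernel_avail(r) < n)
		return false;
	r->cached_consumer += n;
	return true;
}

/* The fix in spirit: publish the local pointer when processing is done. */
static void kernel_publish(struct ring_model *r)
{
	r->consumer = r->cached_consumer;
}
```

In this model, after user space fills all four slots and the kernel
consumes three, a kernel request for two entries fails (one is left, so
the ring is not empty and nothing is published), while user_produce()
also fails because the stale global consumer makes the ring look full.
Calling kernel_publish() at the end of processing breaks the cycle.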
Acked-by: Jonathan Lemon <jonathan.lemon@...il.com>