Message-ID: <CAAywjhTAvT+LPT8_saw41vV6SE+EWd-2gzCH1iP_0HOvdi=yEg@mail.gmail.com>
Date: Wed, 8 Jan 2025 13:18:48 -0800
From: Samiullah Khawaja <skhawaja@...gle.com>
To: Joe Damato <jdamato@...tly.com>, Jakub Kicinski <kuba@...nel.org>, 
	Samiullah Khawaja <skhawaja@...gle.com>, "David S . Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org, 
	mkarsten@...terloo.ca
Subject: Re: [PATCH net-next 0/3] Add support to do threaded napi busy poll

On Wed, Jan 8, 2025 at 11:25 AM Joe Damato <jdamato@...tly.com> wrote:
>
> On Thu, Jan 02, 2025 at 04:47:14PM -0800, Jakub Kicinski wrote:
> > On Thu,  2 Jan 2025 19:12:24 +0000 Samiullah Khawaja wrote:
> > > Extend the already existing support of threaded napi poll to do continuous
> > > busypolling.
> > >
> > > This is used for doing continuous polling of napi to fetch descriptors from
> > > backing RX/TX queues for low latency applications. Allow enabling of threaded
> > > busypoll using netlink so this can be enabled on a set of dedicated napis for
> > > low latency applications.
> >
> > This is lacking clear justification and experimental results
> > vs doing the same thing from user space.
Thanks for the response.

The major benefit is that this provides one common way to enable busy
polling of descriptors on a particular napi. It is independent of the
userspace API and allows busy polling to be enabled on a subset of the
napi instances of a device, which can then be shared among multiple
processes/applications that have low latency requirements. A machine can
thus keep a dedicated subset of napi instances configured for busy
polling, and workloads/jobs can target those napi instances.
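To make that concrete, a rough sketch of enabling it with the ynl CLI
(the "threaded" attribute name/value is an assumption on my part, the
actual netlink UAPI is whatever this series adds, and ifindex 2 /
napi id 345 are made-up values):

  # list the napi instances of the device to pick the ones to dedicate
  ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
      --dump napi-get --json '{"ifindex": 2}'

  # enable threaded busy polling on one of them (attribute name assumed)
  ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
      --do napi-set --json '{"id": 345, "threaded": 1}'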

Once enabled, the relevant kthread can be queried using the netlink
`napi-get` op, and its priority, scheduler and core affinity can be set
(a quick sketch of this follows the list below). Any userspace
application, using any of a variety of interfaces (AF_XDP, io_uring,
epoll, etc.), can then run on top of it without further complexity. For
userspace-driven napi busy polling, one instead has to either use
system-wide sysctls to set up busy polling or use a different interface
depending on the use case:
- epoll params (or a system-wide sysctl) for an epoll based interface
- a socket option (or a system-wide sysctl) for sk_recvmsg
- io_uring (I believe SQPOLL can be configured with it)
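
To illustrate the kthread querying/pinning mentioned above, a rough
sketch (the napi id, pid, cpu and priority values are made up, and this
assumes the pid reported by `napi-get` is the polling kthread once
threaded busy poll is enabled):

  # query the napi to find the pid of its kthread
  ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
      --do napi-get --json '{"id": 345}'

  # suppose it reports pid 2114: pin it to a dedicated core and give it
  # an RT scheduling policy
  taskset -pc 4 2114
  chrt -f -p 50 2114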

Our application for this feature uses a userspace implementation of TCP
(https://github.com/Xilinx-CNS/onload) that interfaces with AF_XDP to
send/receive packets. We use neper (running with AF_XDP + the userspace
TCP library) to measure latency improvements with and without napi
threaded busy poll. Our target application sends packets at a
well-defined frequency, with RPC-style requests/responses of a couple of
hundred bytes.

Test Environment:
Google C3 VMs running netdev-net/main kernel with idpf driver

Without napi threaded busy poll (latencies in seconds; p50 at around 44us)
num_transactions=47918
latency_min=0.000018838
latency_max=0.333912365
latency_mean=0.000189570
latency_stddev=0.005859874
latency_p50=0.000043510
latency_p90=0.000053750
latency_p99=0.000058230
latency_p99.9=0.000184310

With napi threaded busy poll (latencies in seconds; p50 at around 14us)
latency_min=0.000012271
latency_max=0.209365389
latency_mean=0.000021611
latency_stddev=0.001166541
latency_p50=0.000013590
latency_p90=0.000019990
latency_p99=0.000023670
latency_p99.9=0.000027830

> Apologies for chiming in late here as I was out of the office, but I
> agree with Jakub and Stanislav:
Thanks for chiming in.
>
> - This lacks clear justification and data to compare packet delivery
>   mechanisms. IMHO, at a minimum a real world application should be
>   benchmarked and various packet delivery mechanisms (including this
>   one) should be compared side-by-side. You don't need to do exactly
>   what Martin and I did [1], but I'd offer that as a possible
>   suggestion for how you might proceed if you need some suggestions.
Some packet delivery mechanisms, such as epoll, can only be compared
with this approach through an application that actually uses them. This
napi threaded approach enables busy polling of a napi regardless of the
userspace API the application uses. For example, our target application
uses AF_XDP and interfaces with the rings directly.
>
> - This should include a test of some sort; perhaps expanding the test
>   I added (as Stanislav suggested) would be a good start?
Thanks for the suggestion. I am currently expanding the test you added
to cover this (as Stanislav suggested) and will send an update with it.
>
> - IMHO, this change should also include updated kernel documentation
>   to clearly explain how, when, and why a user might use this and
>   what tradeoffs a user can expect. The commit message is, IMHO, far
>   too vague.
>
>   Including example code snippets or ynl invocations etc in the
>   kernel documentation would be very helpful.
Thanks for the suggestion. I will add those in the next update.
>
> > I'd also appreciate if Google could share the experience and results
> > of using basic threaded NAPI _in production_.
We are not using basic threaded NAPI _in production_, but we are going
to use this threaded napi busy poll in production for one of our use
cases. We are currently improving neper to simulate the network traffic
accurately and handle the request/response frequency properly.
>
> +1; this data would be very insightful.
>
> [1]: https://lore.kernel.org/netdev/20241109050245.191288-1-jdamato@fastly.com/
