Message-ID: <Z6U-fubUytqdxRds@LQ3V64L9R2>
Date: Thu, 6 Feb 2025 14:58:06 -0800
From: Joe Damato <jdamato@...tly.com>
To: Samiullah Khawaja <skhawaja@...gle.com>
Cc: Martin Karsten <mkarsten@...terloo.ca>,
Jakub Kicinski <kuba@...nel.org>,
"David S . Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
almasrymina@...gle.com, netdev@...r.kernel.org
Subject: Re: [PATCH net-next v3 0/4] Add support to do threaded napi busy poll
On Thu, Feb 06, 2025 at 02:49:08PM -0800, Samiullah Khawaja wrote:
> On Thu, Feb 6, 2025 at 5:42 AM Joe Damato <jdamato@...tly.com> wrote:
> >
> > On Wed, Feb 05, 2025 at 04:45:59PM -0800, Samiullah Khawaja wrote:
> > > On Wed, Feb 5, 2025 at 2:06 PM Joe Damato <jdamato@...tly.com> wrote:
> > > >
> > > > On Wed, Feb 05, 2025 at 12:35:00PM -0800, Samiullah Khawaja wrote:
> > > > > On Tue, Feb 4, 2025 at 5:32 PM Martin Karsten <mkarsten@...terloo.ca> wrote:
> > > > > >
> > > > > > On 2025-02-04 19:10, Samiullah Khawaja wrote:
[...]
> > > > > Processing packets on a core and then going back to userspace to
> > > > > do application work (or protocol processing, in the case of
> > > > > onload) is not OK for this use case.
> > > >
> > > > Why is it not OK? I assume because there is too much latency? If
> > > > so... the data for that configuration should be provided so it can
> > > > be examined and compared.
> > > The time spent doing application processing of packets on the same
> > > core takes time away from napi processing, introducing a latency
> > > difference at the tail as packets get queued. For some use cases
> > > this would be acceptable; they can certainly set the affinity of
> > > this napi thread equal to the userspace thread's, or use
> > > epoll/recvmsg to drive it. For my use case, I want a solid P90+
> > > under 16us. A couple of microseconds spent doing application
> > > processing pushes it to 17-18us, and that is unacceptable for my
> > > use case.
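(For reference, a concrete sketch of the epoll-driven variant
mentioned above: since Linux 6.9, busy polling can be configured per
epoll context with the EPIOCSPARAMS ioctl. The 64us interval and
budget of 8 below are illustrative placeholders, not tuning advice.)

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/epoll.h>	/* struct epoll_params/EPIOCSPARAMS need glibc 2.40+;
			 * on older systems <linux/eventpoll.h> provides them */

int main(void)
{
	struct epoll_params params;
	int epfd = epoll_create1(0);

	if (epfd < 0) {
		perror("epoll_create1");
		return 1;
	}

	memset(&params, 0, sizeof(params));
	params.busy_poll_usecs = 64;	/* illustrative busy poll interval */
	params.busy_poll_budget = 8;	/* illustrative per-poll packet budget */
	params.prefer_busy_poll = 1;	/* keep IRQs deferred while polling */

	if (ioctl(epfd, EPIOCSPARAMS, &params)) {
		perror("ioctl(EPIOCSPARAMS)");
		return 1;
	}

	/*
	 * epoll_ctl()/epoll_wait() as usual; epoll_wait() now busy
	 * polls the NAPI context of the registered sockets on the
	 * calling core, i.e. napi work and application work share
	 * that core.
	 */
	return 0;
}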
> >
> > Right, so the issue is that sharing a core induces latency which you
> > want to avoid.
> >
> > It seems like this data should be provided to highlight the concern?
> The 2 data points I provided are exactly that. Basically, I am
> comparing 2 mechanisms of enabling busy polling: one (socket/epoll
> based) that shares a core (or does work in sequence because of the
> API design), and one that drives napi in a separate thread (in my
> case also on a separate core), independent of the application.
> Different message sizes, numbers of sockets, hops between client and
> server, etc., that would magnify the problem are all orthogonal
> issues, irrelevant to the comparison I am trying to make here. Some
> of the points you raised are certainly important, like the small
> interrupt-deferral value, which may cause some interference with the
> socket/epoll based busy-polling approach. But beyond that, I think
> the variety of experiments and results you are asking for, while
> possibly interesting, is irrelevant to the scope of what I am
> proposing here.
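(Again for reference, the second mechanism as it exists upstream
today: since v5.12, writing 1 to /sys/class/net/<dev>/threaded gives
each NAPI instance its own kthread, named "napi/<dev>-<napi_id>",
which can then be pinned to a dedicated core, e.g. with taskset or
sched_setaffinity(). The series under discussion extends this
threaded mode to busy poll. A minimal sketch; "eth0" is illustrative:)

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Flip the (v5.12+) threaded NAPI sysfs knob for one interface. */
static int set_napi_threaded(const char *ifname, const char *mode)
{
	char path[128];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/sys/class/net/%s/threaded", ifname);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	n = write(fd, mode, strlen(mode));
	close(fd);
	return n < 0 ? -1 : 0;
}

int main(void)
{
	/* "1" enables threaded NAPI; the resulting napi/eth0-<id>
	 * kthread can then be pinned away from application cores. */
	if (set_napi_threaded("eth0", "1")) {
		perror("set_napi_threaded");
		return 1;
	}
	return 0;
}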
With the utmost respect for the work, effort, and time you've put
into this: I respectfully disagree in the strongest possible terms.
Two data points (which lack significant documentation in the current
iteration of the cover letter) are not sufficient evidence of the
claim, especially when the claim is a >100x improvement.