lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZsB02OnFM1IhLkAt@LQ3V64L9R2>
Date: Sat, 17 Aug 2024 11:00:56 +0100
From: Joe Damato <jdamato@...tly.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Martin Karsten <mkarsten@...terloo.ca>,
	Samiullah Khawaja <skhawaja@...gle.com>,
	Stanislav Fomichev <sdf@...ichev.me>, netdev@...r.kernel.org,
	amritha.nambiar@...el.com, sridhar.samudrala@...el.com,
	Alexander Lobakin <aleksander.lobakin@...el.com>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Breno Leitao <leitao@...ian.org>,
	Christian Brauner <brauner@...nel.org>,
	Daniel Borkmann <daniel@...earbox.net>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Jan Kara <jack@...e.cz>,
	Jiri Pirko <jiri@...nulli.us>,
	Johannes Berg <johannes.berg@...el.com>,
	Jonathan Corbet <corbet@....net>,
	"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
	"open list:FILESYSTEMS (VFS and infrastructure)" <linux-fsdevel@...r.kernel.org>,
	open list <linux-kernel@...r.kernel.org>,
	Lorenzo Bianconi <lorenzo@...nel.org>,
	Paolo Abeni <pabeni@...hat.com>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [RFC net-next 0/5] Suspend IRQs during preferred busy poll

On Fri, Aug 16, 2024 at 01:01:42PM -0400, Willem de Bruijn wrote:
> Joe Damato wrote:
> > On Fri, Aug 16, 2024 at 10:59:51AM -0400, Willem de Bruijn wrote:
[...]
> > > Willem de Bruijn wrote:
> > > If the only goal is to safely reenable interrupts when the application
> > > stops calling epoll_wait, does this have to be user tunable?
> > > 
> > > Can it be either a single good enough constant, or derived from
> > > another tunable, like busypoll_read.
> > 
> > I believe you meant busy_read here, is that right?
> > 
> > At any rate:
> > 
> >   - I don't think a single constant is appropriate, just as it
> >     wasn't appropriate for the existing mechanism
> >     (napi_defer_hard_irqs/gro_flush_timeout), and
> > 
> >   - Deriving the value from a pre-existing parameter to preserve the
> >     ABI, like busy_read, makes using this more confusing for users
> >     and complicates the API significantly.
> > 
> > I agree we should get the API right from the start; that's why we've
> > submit this as an RFC ;)
> > 
> > We are happy to take suggestions from the community, but, IMHO,
> > re-using an existing parameter for a different purpose only in
> > certain circumstances (if I understand your suggestions) is a much
> > worse choice than adding a new tunable that clearly states its
> > intended singular purpose.
> 
> Ack. I was thinking whether an epoll flag through your new epoll
> ioctl interface to toggle the IRQ suspension (and timer start)
> would be preferable. Because more fine grained.

I understand why you are asking about this and I think it would be
great if this were possible, but unfortunately it isn't.

epoll contexts can be associated with any NAPI ID, but the IRQ
suspension is NAPI specific.

As an example: imagine a user program creates an epoll context and
adds fds with NAPI ID 1111 to the context. It then issues the ioctl
to set the suspend timeout for that context. Then, for whatever
reason, the user app decides to remove all the fds and add new ones,
this time from NAPI ID 2222, which happens to be a different
net_device.

What does that mean for the suspend timeout? It's not clear to me
what the right behavior would be in this situation (does it persist?
does it get cleared when a new NAPI ID is added? etc) and it makes
the user API much more complicated, with many more edge cases and
possible bugs.

> Also, the value is likely dependent more on the expected duration
> of userspace processing? If so, it would be the same for all
> devices, so does a per-netdev value make sense?

There is presently no way to set values like gro_timeout,
defer_hard_irqs, or this new proposed value on a per-NAPI basis.
IMHO, that is really where all of these values should live.

I mentioned on the list previously (and also in the cover letter),
that time permitting, I think the correct evolution of this would be
to support per-NAPI settings (via netdev-genl, I assume) so that
user programs can set all 3 values on only the NAPIs they care
about.

Until that functionality is available, it would seem per-netdev is
the only way for this feature to be added at the present time. I
simply haven't had the time to add the above interface. This
feature we're proposing has demonstrable performance value, but it
doesn't seem sensible to block it until time permits me to add a
per-NAPI interface for all of these values given that we already
globally expose 2 of them.

That said, I appreciate the thoughtfulness of your replies and I am
open to other suggestions.

- Joe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ