lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZrpLHogt9IyzNxXk@LQ3V64L9R2>
Date: Mon, 12 Aug 2024 18:49:18 +0100
From: Joe Damato <jdamato@...tly.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: Christoph Hellwig <hch@...radead.org>, netdev@...r.kernel.org,
	mkarsten@...terloo.ca, amritha.nambiar@...el.com,
	sridhar.samudrala@...el.com, sdf@...ichev.me,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
	"open list:FILESYSTEMS (VFS and infrastructure)" <linux-fsdevel@...r.kernel.org>,
	open list <linux-kernel@...r.kernel.org>
Subject: Re: [RFC net-next 4/5] eventpoll: Trigger napi_busy_loop, if
 prefer_busy_poll is set

On Mon, Aug 12, 2024 at 05:17:38PM +0100, Matthew Wilcox wrote:
> On Mon, Aug 12, 2024 at 06:19:35AM -0700, Christoph Hellwig wrote:
> > On Mon, Aug 12, 2024 at 12:57:07PM +0000, Joe Damato wrote:
> > > From: Martin Karsten <mkarsten@...terloo.ca>
> > > 
> > > Setting prefer_busy_poll now leads to an effectively nonblocking
> > > iteration though napi_busy_loop, even when busy_poll_usecs is 0.
> > 
> > Hardcoding calls to the networking code from VFS code seems like
> > a bad idea.   Not that I disagree with the concept of disabling
> > interrupts during busy polling, but this needs a proper abstraction
> > through file_operations.
> 
> I don't understand what's going on with this patch set.  Is it just
> working around badly designed hardware?  NVMe is specified in a way that
> lets it be completely interruptless if the host is keeping up with the
> incoming completions from the device (ie the device will interrupt if a
> completion has been posted for N microseconds without being acknowledged).
> I assumed this was how network devices worked too, but I didn't check.

Thanks for taking a look. I'd kindly point you back to the cover
letter [1], which describes the purpose of the patch set in greater
detail.

At a high level: the networking stack has a mechanism for deferring
interrupts that was introduced in commit 6f8b12d661d0 ("net: napi:
add hard irqs deferral feature") and expanded upon in 7fd3253a7de6
("net: Introduce preferred busy-polling"). We are expanding the
existing mechanisms further so that when applications are busy
polling, IRQs are totally disabled.

While traditional NAPI does prevent IRQs from being re-enabled, it
runs in softIRQ context and only retrieves the available data at the
time NAPI poll runs. The kernel's pre-existing busy polling however,
lets the application drive packet processing (instead of NAPI). Busy
polling when used in combination with various options allowed user
applications to defer IRQs, but as we show in our cover letter it is
extremely difficult to pick the correct values for all traffic cases
at all times.

We are introducing an extension of the existing mechanism to allow
epoll-based busy poll applications to run more efficiently by
keeping IRQs disabled throughout the duration of successful busy
poll iterations.

- Joe

[1]: https://lore.kernel.org/netdev/20240812125717.413108-1-jdamato@fastly.com/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ