[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZrpLHogt9IyzNxXk@LQ3V64L9R2>
Date: Mon, 12 Aug 2024 18:49:18 +0100
From: Joe Damato <jdamato@...tly.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: Christoph Hellwig <hch@...radead.org>, netdev@...r.kernel.org,
mkarsten@...terloo.ca, amritha.nambiar@...el.com,
sridhar.samudrala@...el.com, sdf@...ichev.me,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
"open list:FILESYSTEMS (VFS and infrastructure)" <linux-fsdevel@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [RFC net-next 4/5] eventpoll: Trigger napi_busy_loop, if
prefer_busy_poll is set
On Mon, Aug 12, 2024 at 05:17:38PM +0100, Matthew Wilcox wrote:
> On Mon, Aug 12, 2024 at 06:19:35AM -0700, Christoph Hellwig wrote:
> > On Mon, Aug 12, 2024 at 12:57:07PM +0000, Joe Damato wrote:
> > > From: Martin Karsten <mkarsten@...terloo.ca>
> > >
> > > Setting prefer_busy_poll now leads to an effectively nonblocking
> > > iteration though napi_busy_loop, even when busy_poll_usecs is 0.
> >
> > Hardcoding calls to the networking code from VFS code seems like
> > a bad idea. Not that I disagree with the concept of disabling
> > interrupts during busy polling, but this needs a proper abstraction
> > through file_operations.
>
> I don't understand what's going on with this patch set. Is it just
> working around badly designed hardware? NVMe is specified in a way that
> lets it be completely interruptless if the host is keeping up with the
> incoming completions from the device (ie the device will interrupt if a
> completion has been posted for N microseconds without being acknowledged).
> I assumed this was how network devices worked too, but I didn't check.
Thanks for taking a look. I'd kindly point you back to the cover
letter [1], which describes the purpose of the patch set in greater
detail.
At a high level: the networking stack has a mechanism for deferring
interrupts that was introduced in commit 6f8b12d661d0 ("net: napi:
add hard irqs deferral feature") and expanded upon in 7fd3253a7de6
("net: Introduce preferred busy-polling"). We are expanding the
existing mechanisms further so that when applications are busy
polling, IRQs are totally disabled.
While traditional NAPI does prevent IRQs from being re-enabled, it
runs in softIRQ context and only retrieves the available data at the
time NAPI poll runs. The kernel's pre-existing busy polling however,
lets the application drive packet processing (instead of NAPI). Busy
polling when used in combination with various options allowed user
applications to defer IRQs, but as we show in our cover letter it is
extremely difficult to pick the correct values for all traffic cases
at all times.
We are introducing an extension of the existing mechanism to allow
epoll-based busy poll applications to run more efficiently by
keeping IRQs disabled throughout the duration of successful busy
poll iterations.
- Joe
[1]: https://lore.kernel.org/netdev/20240812125717.413108-1-jdamato@fastly.com/
Powered by blists - more mailing lists