Message-ID: <Y2phEZKYuSmPL5B5@fedora>
Date: Tue, 8 Nov 2022 09:00:49 -0500
From: Stefan Hajnoczi <stefanha@...hat.com>
To: Jens Axboe <axboe@...nel.dk>
Cc: linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCHSET v3 0/5] Add support for epoll min_wait
On Mon, Nov 07, 2022 at 02:38:52PM -0700, Jens Axboe wrote:
> On 11/7/22 1:56 PM, Stefan Hajnoczi wrote:
> > Hi Jens,
> > NICs and storage controllers have interrupt mitigation/coalescing
> > mechanisms that are similar.
>
> Yep
>
> > NVMe has an Aggregation Time (timeout) and an Aggregation Threshold
> > (counter) value. When a completion occurs, the device waits until the
> > timeout or until the completion counter value is reached.
> >
> > If I've read the code correctly, min_wait is computed at the beginning
> > of epoll_wait(2). NVMe's Aggregation Time is computed from the first
> > completion.
> >
> > It makes me wonder which approach is more useful for applications. With
> > the Aggregation Time approach applications can control how much extra
> > latency is added. What do you think about that approach?
>
> We only tested the current approach, which is time noted from entry, not
> from when the first event arrives. I suspect the NVMe approach is better
> suited to the hw side; the epoll timeout helps ensure that we batch
> within xx usec rather than xx usec + whatever the delay until the first
> one arrives. That's why it's handled that way currently: it gives you a
> fixed batch latency.
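
To make the comparison concrete, here is a toy model of when the batch
gets delivered under the two schemes. It is just the arithmetic, not the
actual fs/eventpoll.c or NVMe logic, and the 200 usec values are made up:

#include <stdio.h>

int main(void)
{
	/* Toy numbers, not from the patchset or any device. */
	const int min_wait = 200;	/* usec, measured from epoll_wait() entry */
	const int agg_time = 200;	/* usec, measured from the first event */

	for (int t_first = 0; t_first <= 400; t_first += 100) {
		/* Entry-based (this series): return at max(t_first, min_wait). */
		int entry_based = t_first > min_wait ? t_first : min_wait;
		/* Event-based (NVMe-style): return at t_first + agg_time. */
		int event_based = t_first + agg_time;

		printf("first event at %3d usec: entry-based returns at %3d, event-based at %3d\n",
		       t_first, entry_based, event_based);
	}
	return 0;
}

The entry-based column is what gives the fixed batch latency you
describe; the event-based column floats with the first arrival.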
min_wait is fine when the goal is just maximizing throughput and there
is no latency target.

The min_wait approach makes it hard to set a useful upper bound on
latency, because unlucky requests that complete early in the window
experience much more added latency than requests that complete near its
end.
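
To put rough numbers on that, here is the same kind of toy model for the
added latency of a single event under min_wait (again with a made-up
200 usec value): an event that becomes ready t usec after entry is held
for max(0, min_wait - t) extra usec.

#include <stdio.h>

int main(void)
{
	const int min_wait = 200;	/* usec, made up for illustration */

	/* An event ready t usec after epoll_wait() entry is delivered at
	 * the end of the window, so it is held max(0, min_wait - t) usec. */
	for (int t = 0; t <= 250; t += 50)
		printf("event ready at %3d usec: held %3d usec extra\n",
		       t, t < min_wait ? min_wait - t : 0);
	return 0;
}

The spread covers the whole window: an event ready right at entry pays
the full 200 usec while a late one pays almost nothing, which is what
makes a tight per-request latency target awkward.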
Stefan