Message-ID: <4281b354-d67d-2883-d966-a7816ed4f811@kernel.dk>
Date:   Mon, 7 Nov 2022 14:38:52 -0700
From:   Jens Axboe <axboe@...nel.dk>
To:     Stefan Hajnoczi <stefanha@...hat.com>
Cc:     linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCHSET v3 0/5] Add support for epoll min_wait

On 11/7/22 1:56 PM, Stefan Hajnoczi wrote:
> Hi Jens,
> NICs and storage controllers have interrupt mitigation/coalescing
> mechanisms that are similar.

Yep

> NVMe has an Aggregation Time (timeout) and an Aggregation Threshold
> (counter) value. When a completion occurs, the device waits until the
> timeout or until the completion counter value is reached.
> 
> If I've read the code correctly, min_wait is computed at the beginning
> of epoll_wait(2). NVMe's Aggregation Time is computed from the first
> completion.
> 
> It makes me wonder which approach is more useful for applications. With
> the Aggregation Time approach applications can control how much extra
> latency is added. What do you think about that approach?

We only tested the current approach, where the time is measured from
entry, not from when the first event arrives. I suspect the NVMe approach
is better suited to the hw side. The epoll timeout helps ensure that we
batch within xx usec, rather than xx usec + whatever the delay is until
the first one arrives, which is why it's handled that way currently. That
gives you a fixed batch latency.
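
To illustrate the difference between the two approaches discussed above, here
is a minimal simulation sketch. It is not the kernel code: the function names
(batch_from_entry, batch_from_first_event), the explicit timeout argument, and
the simplified return rules are assumptions made for illustration only. Times
are in usec, measured from the moment the wait is entered.

```python
from typing import List, Tuple


def batch_from_entry(arrivals: List[int], min_wait: int,
                     timeout: int) -> Tuple[int, List[int]]:
    """Window measured from wait entry (the approach described in this thread).

    Simplified model: if anything arrived within min_wait, return at min_wait
    with everything seen so far; otherwise return on the first later arrival,
    or at the overall timeout with nothing.
    """
    arrivals = sorted(arrivals)
    ready = [t for t in arrivals if t <= min_wait]
    if ready:
        return min_wait, ready
    later = [t for t in arrivals if t <= timeout]
    if later:
        return later[0], [t for t in later if t == later[0]]
    return timeout, []


def batch_from_first_event(arrivals: List[int], window: int,
                           timeout: int) -> Tuple[int, List[int]]:
    """Window measured from the first event (NVMe Aggregation Time style).

    Simplified model: once the first event arrives, keep collecting for
    `window` usec (capped at the overall timeout), then return.
    """
    arrivals = sorted(t for t in arrivals if t <= timeout)
    if not arrivals:
        return timeout, []
    deadline = min(arrivals[0] + window, timeout)
    return deadline, [t for t in arrivals if t <= deadline]


if __name__ == "__main__":
    events = [150, 180, 230]  # hypothetical arrival times in usec
    print(batch_from_entry(events, 200, 1000))        # (200, [150, 180])
    print(batch_from_first_event(events, 200, 1000))  # (350, [150, 180, 230])
```

In this model the entry-based window returns at a fixed 200 usec after entry,
while the first-event window returns at 150 + 200 = 350 usec: a larger batch,
but the total wait depends on when the first event happens to arrive.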

-- 
Jens Axboe