netdev - Re: [GIT PULL] Add support for epoll min wait time

Message-ID: <CAHk-=whgzBzTR5t6Dc6gZ_XS1q=UrqeiBf62op_fahbwns+xvQ@mail.gmail.com>
Date:   Sat, 10 Dec 2022 10:51:54 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Jens Axboe <axboe@...nel.dk>
Cc:     netdev <netdev@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>
Subject: Re: [GIT PULL] Add support for epoll min wait time

On Sat, Dec 10, 2022 at 7:36 AM Jens Axboe <axboe@...nel.dk> wrote:
>
> This adds an epoll_ctl method for setting the minimum wait time for
> retrieving events.

So this is something very close to what the TTY layer has had forever,
and is useful (well... *was* useful) for pretty much the same reason.

However, let's learn from successful past interfaces: the tty layer
doesn't have just VTIME, it has VMIN too.

And I think they very much go hand in hand: you want to wait for at
least VMIN events, or for at most VTIME after the last event.

Yes, yes, you have that 'maxevents' thing, but that's not at all the
same as VMIN. That's just the buffer size.
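
For reference, this is roughly how a program asks the tty layer for that
behaviour through termios. It is only a minimal sketch: set_vmin_vtime is a
hypothetical helper name, and error handling is reduced to returning -1.

```c
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing API): put the tty behind 'fd' into
 * non-canonical mode with the given VMIN and VTIME values.
 * VTIME is expressed in tenths of a second.
 * Returns 0 on success, -1 on error (errno set by tcgetattr/tcsetattr).
 */
int set_vmin_vtime(int fd, cc_t vmin, cc_t vtime)
{
	struct termios tio;

	if (tcgetattr(fd, &tio) < 0)
		return -1;

	tio.c_lflag &= ~ICANON;   /* VMIN/VTIME only apply in non-canonical mode */
	tio.c_cc[VMIN] = vmin;    /* read() can return once this many bytes are in */
	tio.c_cc[VTIME] = vtime;  /* with VMIN > 0: inter-byte timer, restarted per byte */

	return tcsetattr(fd, TCSANOW, &tio);
}
```

With both values non-zero, read() blocks for the first byte and then returns
when VMIN bytes have arrived or when VTIME passes without a new byte.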

Also note that the tty layer VTIME is *different* from what I think
your "minimum wait time" is. VTIME is an "inter-event timer", not a
"minimum total time". If new events keep on coming, the timer resets -
until either things time out, or you hit VMIN events.
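
To make the resetting timer concrete, here is a small simulation, not kernel
code. Given a sorted list of event arrival times, it computes when a waiter
with VMIN/VTIME style semantics would return. The function name and the
millisecond units are assumptions made for the illustration, and vmin is
assumed to be at least 1.

```c
#include <stddef.h>

/*
 * Simulation sketch (hypothetical helper): t[0..n-1] are event arrival
 * times in ms, sorted ascending. The waiter returns as soon as 'vmin'
 * events have arrived, or once 'vtime_ms' passes after the most recent
 * event without a new one showing up.
 * Returns the time of return in ms, or -1 if no event ever arrives
 * (the inter-event timer only starts after the first event).
 */
long vtime_return_ms(const long *t, size_t n, size_t vmin, long vtime_ms)
{
	size_t i;

	if (n == 0)
		return -1;

	for (i = 0; i < n; i++) {
		if (i + 1 >= vmin)
			return t[i];              /* hit VMIN: return right away */
		if (i + 1 == n || t[i + 1] - t[i] > vtime_ms)
			return t[i] + vtime_ms;   /* gap too long: timer expires */
		/* next event came in time: the timer restarts from t[i + 1] */
	}
	return -1; /* not reached */
}
```

For arrivals at 0, 10, 20, 30 and 100 ms with vtime_ms = 50 and a large vmin,
the timer is restarted by each of the first four events and expires at 80 ms,
before the event at 100 ms arrives.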

I get the feeling that the tty layer did this right, and this epoll
series did not. The tty model certainly feels more flexible, and does
have decades of experience. tty traffic *used* to be just about the
lowest-latency traffic machines handled back when, so I think it might
be worth looking at as a model.

So I get the feeling that if you are adding some new "timeout for
multiple events" model to epoll, you should look at previous users.

And btw, the tty layer most definitely doesn't handle every possible case.

There are at least three different valid timeouts:

 (a) the "final timeout" that epoll already has (ie "in no case wait
more than this, even if there are no events")

 (b) the "max time we wait if we have at least one event" (your new "min_wait")

 (c) the "inter-event timeout" (tty layer VTIME)

and in addition to the timers, there's that whole "if I have gotten X
events, I have enough, so stop timing out" (tty layer VMIN).
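
As a rough sketch of how the three timers and the event-count cutoff could be
combined into a single "should the wait return now?" check: the struct and
field names below are hypothetical, this is not the interface from the patch
series, and measuring (b) from the start of the wait is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy; a negative timeout (or enough == 0) disables that rule. */
struct wait_policy {
	long final_ms;        /* (a) never wait longer than this, events or not */
	long any_event_ms;    /* (b) past this point, return if any event is queued
	                       *     (assumed to be measured from the start of the wait) */
	long inter_event_ms;  /* (c) max gap since the most recent event (VTIME-like) */
	size_t enough;        /* this many queued events is enough (VMIN-like) */
};

struct wait_state {
	long now_ms;          /* time elapsed since the wait began */
	long last_ms;         /* arrival time of the most recent event */
	size_t nr_events;     /* events queued so far */
};

bool should_return(const struct wait_policy *p, const struct wait_state *s)
{
	if (p->final_ms >= 0 && s->now_ms >= p->final_ms)
		return true;                 /* (a) hard upper bound */
	if (s->nr_events == 0)
		return false;                /* the other rules need at least one event */
	if (p->enough && s->nr_events >= p->enough)
		return true;                 /* enough events: stop timing out */
	if (p->any_event_ms >= 0 && s->now_ms >= p->any_event_ms)
		return true;                 /* (b) */
	if (p->inter_event_ms >= 0 && s->now_ms - s->last_ms >= p->inter_event_ms)
		return true;                 /* (c) */
	return false;
}
```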

And again, that "at least X events" should not be "this is my buffer
size". You may well want to have a *big* buffer for when there are
events queued up or the machine is just under very heavy load, but may
well feel like "if I got N events, I have enough to deal with, and
don't want to time out for any more".
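
A userspace approximation of that split between "buffer size" and "enough"
might look like the sketch below. The function and parameter names are made
up for the example, and it assumes every fd in the set was registered with
EPOLLET (or EPOLLONESHOT), since level-triggered fds would simply be reported
again by each epoll_wait() call.

```c
#include <sys/epoll.h>

/*
 * Hypothetical helper: fill 'buf' (capacity 'bufsize', which may be large)
 * but stop waiting once 'enough' events have been collected, or once
 * 'gap_ms' passes without a new event. 'first_timeout_ms' bounds only the
 * wait for the first batch (-1 blocks indefinitely).
 * Assumes edge-triggered or one-shot registrations, see above.
 * Returns the number of events collected, 0 on timeout, -1 on error.
 */
int epoll_wait_batch(int epfd, struct epoll_event *buf, int bufsize,
		     int enough, int gap_ms, int first_timeout_ms)
{
	int got, n;

	/* Whatever is already queued is taken in one go, up to bufsize. */
	n = epoll_wait(epfd, buf, bufsize, first_timeout_ms);
	if (n <= 0)
		return n;
	got = n;

	/* Keep collecting only while we have fewer than "enough". */
	while (got < enough && got < bufsize) {
		n = epoll_wait(epfd, buf + got, bufsize - got, gap_ms);
		if (n <= 0)
			break;  /* gap timer expired (or error): return what we have */
		got += n;
	}
	return got;
}
```

Under heavy load the first call can already return far more than 'enough'
events, which is why the buffer is sized separately from the threshold.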

Now, maybe there is some reason why the tty-like VMIN/VTIME just isn't
relevant, but I do think that people have successfully used VMIN/VTIME
for long enough that it should be at least given some thought.

Terminal traffic may not be very relevant any more as a hard load to
deal with well. But it really used to be very much an area that had to
balance both throughput and latency concerns and had exactly the kinds
of issues you describe (ie "returning after one single character is
*much* too inefficient").

Hmm?

              Linus