Message-ID: <44CB8A67.3060801@redhat.com>
Date: Sat, 29 Jul 2006 09:18:47 -0700
From: Ulrich Drepper <drepper@...hat.com>
To: Evgeniy Polyakov <johnpol@....mipt.ru>
CC: Zach Brown <zach.brown@...cle.com>,
David Miller <davem@...emloft.net>,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [RFC 1/4] kevent: core files.
Evgeniy Polyakov wrote:
> Btw, why do we want a mapped ring of ready events?
> If a user requested some events, he definitely wants to get them back
> when they are ready, and not to check and then get them?
> Could you please explain more on this issue?
It of course makes no sense to enter the kernel just to fetch an
event; that should be done by storing the event in the ring buffer.
I.e., there are two ways to get an event:
- with a syscall.  This can report as many events at once as the
  caller provides space for.  No event which is reported in the ring
  buffer should also be reported this way
- if there is space, report it in the ring buffer.  Yes, the buffer
  can be optional; in that case all events are reported by the system
  call.
So the use case would be like this:
  wait_and_get_event:
    is buffer empty?
      yes -> make syscall
      no  -> get event from buffer
To avoid races, the syscall needs to take a parameter indicating the
last event checked out from the buffer.  If the kernel has in the
meantime put another event in the buffer, the syscall returns
immediately.  This is similar to what we do in the futex syscall.
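
In code, that loop could look roughly like this (a sketch only:
kevent_wait(), struct ukevent, and the ring layout are names I made
up for illustration, not the proposed kevent ABI):

    /* All names here are invented for illustration. */
    #define RING_SIZE 4096

    struct ukevent {
        unsigned int  type;            /* what happened */
        unsigned long data;            /* user-supplied cookie */
    };

    struct kevent_ring {
        volatile unsigned int kidx;    /* next slot the kernel fills */
        struct ukevent ev[RING_SIZE];
    };

    /* Hypothetical syscall: sleep until the kernel has queued an
     * event past 'last', returning at once if it already has
     * (futex-style race avoidance). */
    extern int kevent_wait(unsigned int last);

    static struct ukevent *wait_and_get_event(struct kevent_ring *r,
                                              unsigned int *uidx)
    {
        while (*uidx == r->kidx)           /* buffer empty? */
            if (kevent_wait(*uidx) < 0)    /* yes -> make syscall */
                return NULL;
        return &r->ev[(*uidx)++ % RING_SIZE]; /* no -> from buffer */
    }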
The question is how to best represent the ring buffer. Zach and some
others had some ready responses in Ottawa. The important thing is to
avoid cache line ping pong when possible.
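
One obvious layout, as a sketch assuming 64-byte cache lines (the
field names are again invented): keep the kernel-written index and
the user-written index on separate lines so neither side's writes
invalidate the other's reads:

    #define CACHE_LINE 64

    struct kevent_ring_head {
        /* Written by the kernel, read by user space. */
        volatile unsigned int kidx
            __attribute__((aligned(CACHE_LINE)));
        /* Written by user space, read by the kernel. */
        volatile unsigned int uidx
            __attribute__((aligned(CACHE_LINE)));
    };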
Is the ring buffer absolutely necessary?  Probably not.  But it has
the potential to help quite a bit.  Don't look at the problem only in
the context of heavy I/O operations, where another syscall here and
there doesn't matter.  With a single mechanism for every possible
event the kernel can generate, programming can look quite different.
E.g., every read() call can implicitly be changed into an async read
call followed by a user-level reschedule.  This rescheduling allows
another thread of execution to run while the read request is
processed.  I.e., it's basically a setjmp() followed by a goto into
the inner loop to get the next event (see the sketch after the list
below).  And now suddenly the event notification mechanism really
should be as fast as possible.  If we submit basically every request
asynchronously and are not creating dedicated threads for specific
tasks anymore, we
a) have a lot more event notifications
b) the probability of an event already being reported when we want to
   receive the next one is higher (i.e., the case where no syscall vs
   syscall makes a difference)
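
To illustrate the control flow (a sketch only, not the actual
proposal: plain setjmp()/longjmp() cannot resume a frame that was
jumped out of, so this uses <ucontext.h> fibers; aio_read_submit()
and get_next_event() are invented names, and each task is assumed to
run on its own stack set up with makecontext()):

    #include <ucontext.h>

    #define MAX_EVENTS 1024

    static ucontext_t loop_ctx;              /* the inner event loop */
    static ucontext_t *waiting[MAX_EVENTS];  /* fiber parked per event */

    /* Hypothetical kernel interface, invented for this sketch: */
    extern int aio_read_submit(int fd, void *buf, unsigned long len);
    extern int get_next_event(void);         /* ring buffer or syscall */

    /* What read() could implicitly become: queue the request, then
     * "reschedule" back into the event loop until completion. */
    static void async_read(int fd, void *buf, unsigned long len)
    {
        ucontext_t self;
        int cookie = aio_read_submit(fd, buf, len);

        waiting[cookie] = &self;
        swapcontext(&self, &loop_ctx);       /* user-level reschedule */
        /* Resumed by the loop: the read completed, buf is filled. */
    }

    /* The inner loop: take the next event and continue whichever
     * thread of execution was waiting on it. */
    static void event_loop(void)
    {
        for (;;) {
            int cookie = get_next_event();
            swapcontext(&loop_ctx, waiting[cookie]);
        }
    }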
Yes, all this will require changes in the way programs are written,
but we shouldn't unnecessarily limit the way we can write programs.
I think that, given the increasing discrepancy between the
speed/latency of peripherals and that of the CPU, this is one
possible way to keep the CPUs busy without resorting to a gazillion
separate threads in each program.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖