[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070530090537.GB17744@elte.hu>
Date: Wed, 30 May 2007 11:05:37 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Evgeniy Polyakov <johnpol@....mipt.ru>
Cc: Ulrich Drepper <drepper@...hat.com>, Jeff Garzik <jeff@...zik.org>,
Zach Brown <zach.brown@...cle.com>,
linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Arjan van de Ven <arjan@...radead.org>,
Christoph Hellwig <hch@...radead.org>,
Andrew Morton <akpm@....com.au>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
"David S. Miller" <davem@...emloft.net>,
Suparna Bhattacharya <suparna@...ibm.com>,
Davide Libenzi <davidel@...ilserver.org>,
Jens Axboe <jens.axboe@...cle.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: Syslets, Threadlets, generic AIO support, v6
* Evgeniy Polyakov <johnpol@....mipt.ru> wrote:
> On Wed, May 30, 2007 at 10:42:52AM +0200, Ingo Molnar (mingo@...e.hu) wrote:
> > it is a serious flexibility issue that should not be ignored. The
> > unified fd space is a blessing on one hand because it's simple and
> > powerful, but it's also a curse because nested use of the fd space for
> > libraries is currently not possible. But it should be detached from any
> > fundamental question of kevent vs. epoll. (By improving library use of
> > file descriptors we'll improve the utility of all syscalls - by ducking
> > to a memory based API we only solve that particular event based usage.)
>
> There is another issue with file descriptors - userspace must dig into
> kernel each time it wants to get a new set of events, while with
> memory based approach it has them without doing so. After it has
> returned from kernel and know that there are some evetns, kernel can
> add more of them into the ring (if there is a place) and userspace
> will process them withouth additional syscalls.
Firstly, this is not a fundamental property of epoll. If we wanted to,
it would be possible to extend epoll to fill in a ring of events from
the wakeup handler. It's an incremental add-on to epoll that should not
impact the design. How much info to put into a single event is another
incremental thing - for most of the high-performance cases all the
information we need is the type of the event and the fd it occured on.
Currently epoll supports that minimal approach.
Secondly, our current syscall overhead is below 0.1 usecs on latest
hardware:
dione:~/l> ./lat_syscall null
Simple syscall: 0.0911 microseconds
so you need millions of events _per cpu_ for the syscall overhead to
show up.
Thirdly, our main problem was not the structure of epoll, our main
problem was that event APIs were not widely available, so applications
couldnt go to a pure event based design - they always had to handle
certain types of event domains specially, due to lack of coverage. The
latest epoll patches largely address that. This was a huge barrier
against adoption of epoll.
Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists