lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20060726062817.GA20636@2ka.mipt.ru>
Date:	Wed, 26 Jul 2006 10:28:17 +0400
From:	Evgeniy Polyakov <johnpol@....mipt.ru>
To:	David Miller <davem@...emloft.net>
Cc:	drepper@...hat.com, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org
Subject: Re: async network I/O, event channels, etc

On Tue, Jul 25, 2006 at 03:01:22PM -0700, David Miller (davem@...emloft.net) wrote:
> From: Ulrich Drepper <drepper@...hat.com>
> Date: Tue, 25 Jul 2006 12:23:53 -0700
> 
> > I was very much surprised by the reactions I got after my OLS talk.
> > Lots of people declared interest and even agreed with the approach and
> > asked me to do further ahead with all this.  For those who missed it,
> > the paper and the slides are available on my home page:
> > 
> > http://people.redhat.com/drepper/
> > 
> > As for the next steps I see a number of possible ways.  The discussions
> > can be held on the usual mailing lists (i.e., lkml and netdev) but due
> > to the raw nature of the current proposal I would imagine that would be
> > mainly perceived as noise.
> 
> Since I gave a big thumbs up for Evgivny's kevent work yesterday
> on linux-kernel, you might want to start by comparing your work
> to his.  Because his has the advantage that 1) we have code now
> and 2) he has written many test applications and performed many
> benchmarks against his code which has flushed out most of the
> major implementation issues.
> 
> I think most of the people who have encouraged your work are unaware
> of Evgivny's kevent stuff, which is extremely unfortunate, the two
> works are more similar than they are different.
> 
> I do not think discussing all of this on netdev would be perceived
> as noise. :)

Hello David, Ulrich.

Here is brief description of what is kevent and how it works.

Kevent subsystem incorporates several AIO/kqueue design notes and ideas.
Kevent can be used both for edge and level notifications. It supports
socket notifications (accept, send, recv), network AIO (aio_send(),
aio_recv() and aio_sendfile()), inode notifications (create/remove),
generic poll()/select() notifications and timer notifications.

There are several object in the kevent system:
storage - each source of events (socket, inode, timer, aio, anything) has
	structure kevent_storage incorporated into it, which is basically a list
	of registered interests for this source of events.
user - it is abstraction which holds all requested kevents. It is
	similar to FreeBSD's kqueue.
kevent - set of interests for given source of events or storage.

When kevent is queued into storage, it will live there until removed by
kevent_dequeue(). When some activity is noticed in given storage, it
scans it's kevent_storage->list for kevents which match activity event.
If kevents are found and they are not already in the
kevent_user->ready_list, they will be added there at the end.

ioctl(WAIT) (or appropriate syscall) will wait until either requested
number of kevents are ready or timeout elapsed or at least one kevent is
ready, it's behaviour depends on parameters.

It is possible to have one-shot kevents, which are automatically removed
when are ready.

Any event can be added/removed/modified by ioctl or special controlling
syscall.

Network AIO is based on kevent and works as usual kevent storage on top
of inode.
When new socket is created it is associated with that inode and when
some activity is detected appropriate notifications are generated and
kevent_naio_callback() is called.
When new kevent is being registered, network AIO ->enqueue() callback
simply marks itself like usual socket event watcher. It also locks
physical userspace pages in memory and stores appropriate pointers in
private kevent structure. I have not created additional DMA memory
allocation methods, like Ulrich described in his article, so I handle it
inside NAIO which has some overhead (I posted get_user_pages()
sclability graph some time ago).
Network AIO callback gets pointers to userspace pages and tries to copy
data from receiving skb queue into them using protocol specific
callback. This callback is very similar to ->recvmsg(), so they could
share a lot in future (as far as I recall it worked only with hardware
capable to do checksumming, I'm a bit lazy).

Both network and aio implementation work on top of hooks inside
appropriate state machines, but not as repeated call design (curect AIO) 
or special thread (SGI AIO). AIO work was stopped, since I was unable to 
achieve the same speed as synchronous read 
(maximum speeds were 2Gb/sec vs. 2.1 GB/sec for aio and sync IO accordingly
when reading data from the cache).
Network aio_sendfile() works lazily - it asynchronously populates pages
into the VFS cache (which can be used for various tricks with adaptive
readahead) and then uses usual ->sendfile() callback.

I have not created an interface for userspace events (like Solaris), 
since right now I do not see it's usefullness, but if there is
requirements for that it is quite easy with kevents.

I'm preparing set of kevent patches resend (with cleanups mentioned in
previous e-mails), which will be ready in a couple of moments.

1. kevent homepage.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=kevent

2. network aio homepage.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=naio

3. LWN.net published a very good article about kevent.
http://lwn.net/Articles/172844/

Thank you.

-- 
	Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ