Message-ID: <f2b55d220703042123l53403c12rac225189053e9a98@mail.gmail.com>
Date:	Sun, 4 Mar 2007 21:23:40 -0800
From:	"Michael K. Edwards" <medwards.linux@...il.com>
To:	"Kyle Moffett" <mrmacman_g4@....com>
Cc:	"Kirk Kuchov" <kirk.kuchov@...il.com>,
	"Davide Libenzi" <davidel@...ilserver.org>,
	"Evgeniy Polyakov" <johnpol@....mipt.ru>,
	"Ingo Molnar" <mingo@...e.hu>,
	"Eric Dumazet" <dada1@...mosbay.com>,
	"Pavel Machek" <pavel@....cz>, "Theodore Tso" <tytso@....edu>,
	"Linus Torvalds" <torvalds@...ux-foundation.org>,
	"Ulrich Drepper" <drepper@...hat.com>,
	"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
	"Arjan van de Ven" <arjan@...radead.org>,
	"Christoph Hellwig" <hch@...radead.org>,
	"Andrew Morton" <akpm@....com.au>,
	"Alan Cox" <alan@...rguk.ukuu.org.uk>,
	"Zach Brown" <zach.brown@...cle.com>,
	"David S. Miller" <davem@...emloft.net>,
	"Suparna Bhattacharya" <suparna@...ibm.com>,
	"Jens Axboe" <jens.axboe@...cle.com>,
	"Thomas Gleixner" <tglx@...utronix.de>
Subject: Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

On 3/4/07, Kyle Moffett <mrmacman_g4@....com> wrote:
> Well, even this far into 2.6, Linus' patch from 2003 still (mostly)
> applies; the maintenance cost for this kind of code is virtually
> zilch.  If it matters that much to you, clean it up and make it apply;
> add an alarmfd() syscall (another 100 lines of code at most?) and
> make a "read" return an architecture-independent siginfo-like
> structure and submit it for inclusion.  Adding epoll() support for
> random objects is as simple as a 75-line object-filesystem and a 25-
> line syscall to return an FD to a new inode.  Have fun!  Go wild!
> Something this trivially simple could probably spend a week in -mm
> and go to Linus for 2.6.22.
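For comparison, the quoted idea of an fd whose "read" returns an architecture-independent siginfo-like structure is close to what the later Linux signalfd() interface does. The sketch below uses that real interface (not the hypothetical alarmfd() from the quote) to show such an event fd being multiplexed through epoll. It assumes Linux with glibc, and the helper name wait_for_sigusr1_via_epoll() is made up for illustration.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/epoll.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Turn SIGUSR1 into a readable fd, wait for it with epoll, and return
 * the signal number taken from the record that read() hands back
 * (or -1 on any error). Hypothetical helper, for illustration only. */
int wait_for_sigusr1_via_epoll(void)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);

    /* Block the signal so it stays pending for the signalfd instead of
     * being delivered through a handler. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
        return -1;

    int sfd = signalfd(-1, &mask, SFD_CLOEXEC);
    if (sfd == -1)
        return -1;

    int ep = epoll_create1(EPOLL_CLOEXEC);
    if (ep == -1) {
        close(sfd);
        return -1;
    }

    int result = -1;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = sfd };
    struct epoll_event ready;
    struct signalfd_siginfo si;

    if (epoll_ctl(ep, EPOLL_CTL_ADD, sfd, &ev) == 0) {
        raise(SIGUSR1);             /* generate one event to observe */
        /* Wait up to 1000 ms, then read one fixed-size record. */
        if (epoll_wait(ep, &ready, 1, 1000) == 1 &&
            ready.data.fd == sfd &&
            read(sfd, &si, sizeof si) == (ssize_t)sizeof si)
            result = (int)si.ssi_signo;
    }

    close(ep);
    close(sfd);
    return result;
}
```

The fixed-size struct signalfd_siginfo record is what makes the fd usable with plain read() and any poll-style multiplexer.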

Or, if you want to do slightly more work and produce something a great
deal more useful, you could implement additional netlink address
families for additional "event" sources.  The socket(), setsockopt(),
bind(), sendmsg()/recvmsg() sequence is a well-understood and
well-documented UNIX paradigm for multiplexing non-blocking I/O to many
destinations over one socket.  Everyone who has read Stevens is
familiar with the basic UDP and "fd open server" techniques, and if
you look at Linux's IP_PKTINFO and NETLINK_W1 (bravo, Evgeniy!), you'll
see how easily they could be extended to file AIO and other kinds of
event sources.
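As a rough illustration of that paradigm, here is a minimal sketch of the socket(), setsockopt(), bind(), sendmsg()/recvmsg() sequence using IP_PKTINFO, where per-datagram metadata (the address the datagram was sent to) arrives as ancillary data alongside the payload. It assumes Linux with glibc and a working loopback interface; the function name is hypothetical, and it does not attempt to model NETLINK_W1.

```c
#define _GNU_SOURCE             /* struct in_pktinfo in glibc */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one datagram to ourselves over loopback, then use recvmsg() with
 * IP_PKTINFO ancillary data to recover its destination address.
 * Returns 1 if that address was 127.0.0.1, 0 if not, -1 on error. */
int pktinfo_loopback_demo(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s == -1)
        return -1;

    int on = 1;
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick */

    /* setsockopt - bind, then learn the chosen port and send to it */
    if (setsockopt(s, IPPROTO_IP, IP_PKTINFO, &on, sizeof on) == -1 ||
        bind(s, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        getsockname(s, (struct sockaddr *)&addr, &alen) == -1 ||
        sendto(s, "ping", 4, 0, (struct sockaddr *)&addr, sizeof addr) != 4) {
        close(s);
        return -1;
    }

    char data[16];
    union {                                  /* aligned control buffer */
        char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
        struct cmsghdr align;
    } ctl;
    struct iovec iov = { .iov_base = data, .iov_len = sizeof data };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctl.buf, .msg_controllen = sizeof ctl.buf,
    };

    int result = -1;
    if (recvmsg(s, &msg, 0) >= 0) {
        result = 0;
        /* Walk the ancillary data looking for the IP_PKTINFO record. */
        for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
             c = CMSG_NXTHDR(&msg, c)) {
            if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
                struct in_pktinfo pi;
                memcpy(&pi, CMSG_DATA(c), sizeof pi);
                result = pi.ipi_addr.s_addr == htonl(INADDR_LOOPBACK);
            }
        }
    }
    close(s);
    return result;
}
```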

For file AIO, you might have the application open one AIO socket per
mount point, open files indirectly via the SCM_RIGHTS mechanism, and
submit/retire read/write requests via sendmsg/recvmsg with ancillary
data consisting of an lseek64 tuple and a user-provided cookie.
Although the process still has to have one fd open per actual open
file (because trying to authenticate file accesses without opening fds
is madness), the only fds it has to manipulate directly are those
representing entire pools of outstanding requests.  This is usually a
small enough set that select() will do just fine, if you're careful
with fd allocation.  (You can simply punt indirectly opened fds up to
a high numerical range, where they can't be accessed directly from
userspace but still make fine cookies for use in lseek64 tuples within
cmsg headers.)
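A minimal sketch of the two existing building blocks in that paragraph, assuming Linux and a UNIX-domain socketpair: passing an fd as SCM_RIGHTS ancillary data, and renumbering a received fd into a high range with fcntl(F_DUPFD). The per-mount-point AIO socket itself is only a proposal and is not modeled here. Note that F_DUPFD merely changes the descriptor number; it does not make the fd unreachable from userspace the way the paragraph envisions for a kernel implementation. The helper names send_fd(), recv_fd() and move_fd_high() are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Pass one descriptor across a UNIX-domain socket as SCM_RIGHTS
 * ancillary data. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 'F';                 /* send at least one byte of data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    memset(&ctl, 0, sizeof ctl);
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctl.buf, .msg_controllen = sizeof ctl.buf,
    };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof fd);
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(). Returns the new fd or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctl.buf, .msg_controllen = sizeof ctl.buf,
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET ||
        c->cmsg_type != SCM_RIGHTS || c->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof fd);
    return fd;
}

/* Renumber fd to the lowest free descriptor >= min_fd and close the
 * original, keeping the low range compact for select(). */
int move_fd_high(int fd, int min_fd)
{
    int high = fcntl(fd, F_DUPFD, min_fd);
    if (high == -1)
        return -1;
    close(fd);
    return high;
}
```

The min_fd passed to move_fd_high() must stay below the RLIMIT_NOFILE soft limit, or fcntl() fails with EINVAL. Keeping the pool sockets at low numbers matters because select() can only watch descriptors below FD_SETSIZE.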

The same basic approach will work for timers, signals, and just about
any other event source.  Userspace is of course still stuck doing its
own state machines / thread scheduling / however you choose to think
of it.  But all the important activity goes through socketcall(), and
the data and control parameters are all packaged up into a struct
msghdr instead of the bare buffer pointers of read/write.  So if
someone else does come along later and design an ultralight threading
mechanism that isn't a total botch, the actual data paths won't need
much rework; the exception handling will just get a lot simpler.
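To make the point about packaging data and control parameters into a struct msghdr concrete, here is a sketch of what a submission to the proposed file-AIO socket might look like: the data buffer goes in msg_iov, and an lseek64-style tuple plus a cookie goes in the control area. Everything named *_HYPOTHETICAL is invented for illustration; no such cmsg level, type or socket family exists in Linux, so the sketch only packs and parses a msghdr locally and never calls sendmsg(). The field layout (fd, whence, 64-bit offset, cookie) is an assumption about what "an lseek64 tuple and a user-provided cookie" would contain.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* HYPOTHETICAL: no such cmsg level/type exists. These values only
 * illustrate the request format proposed in the text. */
#define SOL_AIO_HYPOTHETICAL  0x7A10
#define AIO_REQ_HYPOTHETICAL  1

/* Assumed layout: an "lseek64 tuple" plus a user-provided cookie. */
struct aio_req_hypothetical {
    int32_t  fd;       /* indirectly opened file (a high-numbered fd) */
    int32_t  whence;   /* SEEK_SET, SEEK_CUR or SEEK_END */
    int64_t  offset;
    uint64_t cookie;   /* returned to the caller when the request retires */
};

/* Put the data buffer (iov) and the request (control) into one msghdr.
 * ctl must be cmsghdr-aligned and at least CMSG_SPACE(sizeof *req)
 * bytes. Returns 0 on success, -1 if ctl is too small. */
int pack_aio_request(struct msghdr *msg, struct iovec *iov,
                     void *ctl, size_t ctl_len,
                     const struct aio_req_hypothetical *req)
{
    if (ctl_len < CMSG_SPACE(sizeof *req))
        return -1;
    memset(msg, 0, sizeof *msg);
    memset(ctl, 0, CMSG_SPACE(sizeof *req));
    msg->msg_iov = iov;
    msg->msg_iovlen = 1;
    msg->msg_control = ctl;
    msg->msg_controllen = CMSG_SPACE(sizeof *req);

    struct cmsghdr *c = CMSG_FIRSTHDR(msg);
    c->cmsg_level = SOL_AIO_HYPOTHETICAL;
    c->cmsg_type = AIO_REQ_HYPOTHETICAL;
    c->cmsg_len = CMSG_LEN(sizeof *req);
    memcpy(CMSG_DATA(c), req, sizeof *req);
    return 0;
}

/* Pull the request back out of a msghdr (e.g. one a recvmsg() filled
 * in). Returns 0 if found, -1 otherwise. */
int unpack_aio_request(struct msghdr *msg, struct aio_req_hypothetical *out)
{
    for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c != NULL;
         c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == SOL_AIO_HYPOTHETICAL &&
            c->cmsg_type == AIO_REQ_HYPOTHETICAL &&
            c->cmsg_len >= CMSG_LEN(sizeof *out)) {
            memcpy(out, CMSG_DATA(c), sizeof *out);
            return 0;
        }
    }
    return -1;
}
```

Because the request travels as a cmsg rather than as extra syscall arguments, new fields could be added as new cmsg types without changing the read/write data path.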

Cheers,
- Michael
