linux-kernel - Re: [patch 06/11] syslets: core, documentation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0702131457060.32055@alien.or.mcafeemobile.com>
Date:	Tue, 13 Feb 2007 15:21:14 -0800 (PST)
From:	Davide Libenzi <davidel@...ilserver.org>
To:	Ingo Molnar <mingo@...e.hu>
cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Arjan van de Ven <arjan@...radead.org>,
	Christoph Hellwig <hch@...radead.org>,
	Andrew Morton <akpm@....com.au>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Ulrich Drepper <drepper@...hat.com>,
	Zach Brown <zach.brown@...cle.com>,
	Evgeniy Polyakov <johnpol@....mipt.ru>,
	"David S. Miller" <davem@...emloft.net>,
	Benjamin LaHaise <bcrl@...ck.org>,
	Suparna Bhattacharya <suparna@...ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [patch 06/11] syslets: core, documentation

On Tue, 13 Feb 2007, Ingo Molnar wrote:

> 
> * Davide Libenzi <davidel@...ilserver.org> wrote:
> 
> > > +The Syslet Atom:
> > > +----------------
> > > +
> > > +The syslet atom is a small, fixed-size (44 bytes on 32-bit) piece of
> > > +user-space memory, which is the basic unit of execution within the syslet
> > > +framework. A syslet represents a single system-call and its arguments.
> > > +In addition it also has condition flags attached to it that allows the
> > > +construction of larger programs (syslets) from these atoms.
> > > +
> > > +Arguments to the system call are implemented via pointers to arguments.
> > > +This not only increases the flexibility of syslet atoms (multiple syslets
> > > +can share the same variable for example), but is also an optimization:
> > > +copy_uatom() will only fetch syscall parameters up until the point it
> > > +meets the first NULL pointer. 50% of all syscalls have 2 or less
> > > +parameters (and 90% of all syscalls have 4 or less parameters).
> > 
> > Why do you need to have an extra memory indirection per parameter in 
> > copy_uatom()? [...]
> 
> yes. Try to use them in real programs, and you'll see that most of the 
> time the variable an atom wants to access should also be accessed by 
> other atoms. For example a socket file descriptor - one atom opens it, 
> another one reads from it, a third one closes it. By having the 
> parameters in the atoms we'd have to copy the fd to two other places.

Yes, of course we have to support the indirection, otherwise chaining 
won't work. But ...



> > I can understand that chaining syscalls requires variable sharing, but 
> > the majority of the parameters passed to syscalls are just direct 
> > ones. Maybe a smart method that allows you to know if a parameter is a 
> > direct one or a pointer to one? An "unsigned int pmap" where bit N is 
> > 1 if param N is an indirection? Hmm?
> 
> adding such things tends to slow down atom parsing.

I really think it simplifies it. You simply *copy* the parameter (I'd say 
that 70+% of cases falls inside here), and if the current "pmap" bit is 
set, then you do all the indirection copy-from-userspace stuff.
It also simplify userspace a lot, since you can now pass arrays and 
structure pointers directly, w/out saving them in a temporary variable.




> > Sigh, I really dislike shared userspace/kernel stuff, when we're 
> > transfering pointers to userspace. Did you actually bench it against 
> > a:
> > 
> > int async_wait(struct syslet_uatom **r, int n);
> > 
> > I can fully understand sharing userspace buffers with the kernel, if 
> > we're talking about KB transferd during a block or net I/O DMA 
> > operation, but for transfering a pointer? Behind each pointer 
> > transfer(4/8 bytes) there is a whole syscall execution, [...]
> 
> there are three main reasons for this choice:
> 
> - firstly, by putting completion events into the user-space ringbuffer
>   the asynchronous contexts are not held up at all, and the threads are
>   available for further syslet use.
> 
> - secondly, it was the most obvious and simplest solution to me - it 
>   just fits well into the syslet model - which is an execution concept 
>   centered around pure user-space memory and system calls, not some 
>   kernel resource. Kernel fills in the ringbuffer, user-space clears it. 
>   If we had to worry about a handshake between user-space and 
>   kernel-space for the completion information to be passed along, that 
>   would either mean extra buffering or extra overhead. Extra buffering 
>   (in the kernel) would be for no good reason: why not buffer it in the 
>   place where the information is destined for in the first place. The 
>   ringbuffer of /pointers/ is what makes this really powerful. I never 
>   really liked the AIO/etc. method /event buffer/ rings. With syslets 
>   the 'cookie' is the pointer to the syslet atom itself. It doesnt get 
>   any more straightforward than that i believe.
> 
> - making 'is there more stuff for me to work on' a simple instruction in
>   user-space makes it a no-brainer for user-space to promptly and
>   without thinking complete events. It's also the right thing to do on 
>   SMP: if one core is solely dedicated to the asynchronous workload,
>   only running on kernel mode, and the other code is only running
>   user-space, why ever switch between protection domains? [except if any
>   of them is idle] The fastest completion signalling method is the
>   /memory bus/, not an interrupt. User-space could in theory even use
>   MWAIT (in user-space!) to wait for the other core to complete stuff. 
>   That makes for a hell of a fast wakeup.

That makes also for a hell ugly retrieval API IMO ;)
If it'd be backed up but considerable performance gains, then it might be OK.
But I believe it won't be the case, and that leave us with an ugly API.
OTOH, if noone else object this, it means that I'm the only wierdo :) and 
the API is just fine.




- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/