Message-ID: <20070214180344.GI32271@kvack.org>
Date: Wed, 14 Feb 2007 13:03:44 -0500
From: Benjamin LaHaise <bcrl@...ck.org>
To: Davide Libenzi <davidel@...ilserver.org>
Cc: Russell King <rmk+lkml@....linux.org.uk>,
Ingo Molnar <mingo@...e.hu>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Arjan van de Ven <arjan@...radead.org>,
Christoph Hellwig <hch@...radead.org>,
Andrew Morton <akpm@....com.au>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Ulrich Drepper <drepper@...hat.com>,
Zach Brown <zach.brown@...cle.com>,
Evgeniy Polyakov <johnpol@....mipt.ru>,
"David S. Miller" <davem@...emloft.net>,
Suparna Bhattacharya <suparna@...ibm.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [patch 06/11] syslets: core, documentation
On Wed, Feb 14, 2007 at 09:52:20AM -0800, Davide Libenzi wrote:
> That'd be, instead of passing a chain of atoms, with the kernel
> interpreting conditions, and parameter lists, etc..., we let gcc
> do this stuff for us, and we pass the "clet" :) pointer to sys_async_exec,
> that exec the above under the same schedule-trapped environment, but in
> userspace. We setup a special userspace ad-hoc frame (ala signal), and we
> trap underneath task schedule attempt in the same way we do now.
> We setup the frame and when we return from sys_async_exec, we basically
> enter the "clet", that will return to a ret_from_async, that will return
> to userspace. Or, maybe we can support both. A simple single-syscall exec
> in the way we do now, and a clet way for the ones that require chains and
> conditions. Hmmm?
Which is just the same as using threads. My argument is that once you
look at all the details involved, what you end up arriving at is the
creation of threads. Threads are relatively cheap; it's just that the
hardware currently has several performance bugs with them on x86 (and more
on x86-64 with the MSR fiddling that hits the hot path). Architectures
like powerpc are not going to benefit anywhere near as much from this
exercise, as the state involved is processed much more sanely. IA64 as
usual is simply doomed by way of having too many registers to switch.
If people really want to go down this path, please make an effort to compare
threads on a properly tuned platform. This means that things like the kernel
and userland stacks must take into account the cache alignment (we do some
of this already, but there are some very definite L1 cache colour collisions
between commonly hit data structures amongst threads). The existing AIO
ringbuffer suffers from this, as important data is always on the beginning
of the first page. Yes, these might be microoptimizations, but accumulated
changes of this nature have been known to buy 100%+ improvements in
performance.
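
To make the colour collision concrete, here is a rough sketch of the arithmetic. The cache geometry (32 KiB, 8-way, 64-byte lines) and the 4 KiB page size are assumptions chosen for illustration, not measurements of any particular CPU, and the helper names (cache_set, coloured_offset, objects_in_set0) are hypothetical. It counts how many page-aligned objects put their hot data into cache set 0, with and without staggering the hot data by one cache line per object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry, for illustration only. */
#define LINE_SIZE   64u                                 /* bytes per cache line */
#define CACHE_SIZE  (32u * 1024u)                       /* 32 KiB L1 data cache */
#define WAYS        8u                                  /* associativity */
#define NUM_SETS    (CACHE_SIZE / (LINE_SIZE * WAYS))   /* 64 sets */
#define PAGE_SIZE_B 4096u                               /* assumed page size */

/* Cache set that a given address maps to. */
static unsigned cache_set(uintptr_t addr)
{
    /* Sanity check: the line size must be a power of two. */
    assert((LINE_SIZE & (LINE_SIZE - 1)) == 0);
    return (unsigned)((addr / LINE_SIZE) % NUM_SETS);
}

/* Hypothetical colouring: shift the hot data of object n by n cache lines. */
static size_t coloured_offset(unsigned n)
{
    return (size_t)(n % NUM_SETS) * LINE_SIZE;
}

/*
 * Count how many of `count` page-aligned objects have their hot data in
 * set 0. Uncoloured objects keep the hot data at the start of the page.
 */
static unsigned objects_in_set0(unsigned count, int coloured)
{
    unsigned hits = 0;

    for (unsigned n = 0; n < count; n++) {
        uintptr_t base = (uintptr_t)n * PAGE_SIZE_B;
        uintptr_t hot = base + (coloured ? coloured_offset(n) : 0);

        if (cache_set(hot) == 0)
            hits++;
    }
    return hits;
}
```

Under these assumptions NUM_SETS * LINE_SIZE equals the page size, so every page-aligned address maps to set 0. Sixteen uncoloured objects then compete for the 8 ways of a single set, while the staggered layout leaves only one of them there.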
-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <dont@...ck.org>.