Message-ID: <6f703f960702061445q23dd9d48q7afec75d2400ef62@mail.gmail.com>
Date: Tue, 6 Feb 2007 13:45:50 -0900
From: "Kent Overstreet" <kent.overstreet@...il.com>
To: "Linus Torvalds" <torvalds@...ux-foundation.org>
Cc: "Davide Libenzi" <davidel@...ilserver.org>,
"Zach Brown" <zach.brown@...cle.com>,
"Ingo Molnar" <mingo@...e.hu>,
"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
linux-aio@...ck.org, "Suparna Bhattacharya" <suparna@...ibm.com>,
"Benjamin LaHaise" <bcrl@...ck.org>
Subject: Re: [PATCH 2 of 4] Introduce i386 fibril scheduling
On 2/6/07, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Mon, 5 Feb 2007, Kent Overstreet wrote:
> >
> > struct asys_ret {
> > int ret;
> > struct thread *p;
> > };
> >
> > struct asys_ret r;
> > r.p = me;
> >
> > async_read(fd, buf, nbytes, &r);
>
> That's horrible. It means that "r" cannot have automatic linkage (since
> the stack will be *gone* by the time we need to fill in "ret"), so now you
> need to track *two* pointers: "me" and "&r".
You'd only allocate r on the stack if that stack is going to be around
later; i.e. if you're using user threads. Otherwise, you just allocate
it in some struct containing your aiocb or whatever.
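Something like this (a rough sketch, names invented):

	/* Heap-allocated request, for when no user thread stack will
	 * outlive the call; 'me' is whatever identifies the waiter,
	 * or NULL if you aren't using user threads. */
	struct async_request {
		struct asys_ret	r;	/* return value + thread pointer */
		char		*buf;	/* buffer the call writes into */
	};

	struct async_request *req = malloc(sizeof(*req));
	req->r.p = me;
	async_read(fd, req->buf, nbytes, &req->r);
	/* req gets freed whenever the completion is reaped */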
> And for user space, it means that we pass the _one_ thing around that we
> need for both identifying the async operation to the kernel (the "cookie")
> for wait or cancel, and the place where we expect the return value to be
> found (which in turn can _easily_ represent a whole "struct aiocb *",
> since the return value obviously has to be embedded in there anyway).
>
> Linus
The "struct aiocb" isn't something you have to or necessarily want to
keep around. It's the way the current aio interface works (which I've
coded to), but I don't really see the point. All it really contains is
the syscall arguments, but once the syscall's in progress there's no
reason the kernel has to refer back to it; similarly for userspace,
it's just another struct that userspace has to keep track of and free
at some later time.
In fact, that's the only sane way you can have a ring for submitted
system calls, as otherwise elements of the ring are getting freed in
essentially random order.
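Roughly what I have in mind (again, just a sketch):

	/* Submission ring: userspace fills slots at the tail, the
	 * kernel consumes them in order from the head, so a slot can
	 * be reused as soon as the kernel is past it - nothing gets
	 * freed out of order. */
	struct asys_call {
		long	nr;		/* syscall number */
		long	args[6];	/* syscall arguments */
		void	*cookie;	/* handed back on completion */
	};

	struct asys_ring {
		unsigned		head;	/* kernel's read position */
		unsigned		tail;	/* userspace's write position */
		struct asys_call	calls[RING_SIZE];
	};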
I don't see the point in having a ring for completed events, since
it's at most two pointers per completion; quite a bit less data
coming back than went in at submission.
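All a completion needs to carry is something like:

	/* One completion: the cookie from the submission plus the
	 * syscall's return value - two words, nothing to free. */
	struct asys_event {
		void	*cookie;
		long	ret;
	};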
-----
The trouble with differentiating between calls that block and calls
that don't is that you completely lose the ability to batch syscalls
together, and batching is potentially a major win of an asynchronous
interface.
An app can have a bunch of cheap, fast user space threads servicing
whatever; as they run, they can push their system calls onto a global
stack. When no more can run, the app does one giant asys_submit
(something similar to io_submit), then the io_getevents equivalent,
and resumes the user threads whose syscalls completed.
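The main loop would look something like (hypothetical names):

	struct asys_event events[MAX_EVENTS];
	int i, n;

	for (;;) {
		/* run user threads; ones that would block push an
		 * asys_call onto the global stack instead */
		run_ready_threads();

		/* one trip into the kernel for the whole batch */
		asys_submit(calls, ncalls);

		/* the io_getevents equivalent */
		n = asys_getevents(events, MAX_EVENTS);

		/* wake the user threads whose syscalls finished */
		for (i = 0; i < n; i++)
			thread_wake(events[i].cookie, events[i].ret);
	}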
This doesn't mean you can't run the syscalls that wouldn't block
synchronously, or that you have to allocate a fibril for every
syscall - but for servers that care more about throughput than
latency, batching is potentially a big win, in cache effects if
nothing else.
(And this doesn't prevent you from having a different syscall that
submits an asynchronous syscall but runs it to completion right away
if it can do so without blocking).
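i.e. something like:

	/* Hypothetical combined interface: runs the call inline if it
	 * can complete without blocking and returns its result;
	 * otherwise queues it, sets *cookie and returns -EINPROGRESS. */
	long asys_try(struct asys_call *call, void **cookie);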