[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <9BE4FD9B-5829-46D1-B9BA-B475261A4116@oracle.com>
Date: Tue, 6 Feb 2007 17:28:31 -0500
From: Zach Brown <zach.brown@...cle.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: David Miller <davem@...emloft.net>, kent.overstreet@...il.com,
davidel@...ilserver.org, mingo@...e.hu,
linux-kernel@...r.kernel.org, linux-aio@...ck.org,
suparna@...ibm.com, bcrl@...ck.org
Subject: Re: [PATCH 2 of 4] Introduce i386 fibril scheduling
> That's not how the patches work right now, but yes, I at least
> personally
> think that it's something we should aim for (ie the interface
> shouldn't
> _require_ us to always wait for things even if perhaps an early
> implementation might make everything be delayed at first)
I agree that we shouldn't require a seperate syscall just to get the
return code from ops that didn't block.
It doesn't seem like much of a stretch to imagine a setup where we
can specify completion context as part of the submission itself.
declare_empty_ring(ring);
struct submission sub;
sub.ring = ˚
sub.nr = SYS_fstat64;
sub.args == ...
ret = submit(&sub, 1);
if (ret == 0) {
wait_for_elements(&ring, 1);
printf("stat gave %d\n", ring[ring->head].rc);
}
You get the idea, it's just an outline.
wait_for_elements() could obviously check the ring before falling
back to kernel sync. I'm pretty keen on the notion of producer/
consumer rings where userspace writes the head as it plucks
completions and the kernel writes the tail as it adds them.
We might want per-call ring pointers, instead of per submission, to
help submitters wait for a group of ops to complete without having to
do their own tracking on event completion. That only makes sense if
we have the waiting mechanics let you only be woken as the number of
events in the ring crosses some threshold. Which I think we want
anyway.
We'd be trading building up a specific completion state with syscalls
for some complexity during submission that pins (and kmaps on
completion) the user pages. Submission could return failure if
pinning these new pages would push us over some rlimit. We'd have to
be *awfully* careful not to let userspace corrupt (munmap?) the ring
and confuse the hell out of the kernel.
Maybe not worth it, but if we *really* cared about making the non-
blocking case almost identical to the sync case and wanted to use the
same interface for batch submission and async completion then this
seems like a possibility.
- z
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists