[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200801091448.46241.rusty@rustcorp.com.au>
Date: Wed, 9 Jan 2008 14:48:44 +1100
From: Rusty Russell <rusty@...tcorp.com.au>
To: Zach Brown <zach.brown@...cle.com>
Cc: linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>,
Ulrich Drepper <drepper@...hat.com>,
Arjan van de Ven <arjan@...radead.org>,
Andrew Morton <akpm@....com.au>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Evgeniy Polyakov <johnpol@....mipt.ru>,
"David S. Miller" <davem@...emloft.net>,
Suparna Bhattacharya <suparna@...ibm.com>,
Davide Libenzi <davidel@...ilserver.org>,
Jens Axboe <jens.axboe@...cle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Dan Williams <dan.j.williams@...il.com>,
Jeff Moyer <jmoyer@...hat.com>,
Simon Holm Thogersen <odie@...aau.dk>,
suresh.b.siddha@...el.com
Subject: Re: [PATCH 5/6] syslets: add generic syslets infrastructure
On Wednesday 09 January 2008 14:00:04 Zach Brown wrote:
> > Firstly, why not just specify an address for the return value and be
> > done with it? This infrastructure seems overkill, and you can always
> > extend later if required.
>
> Sorry, which infrastructure?
>
> Providing the function and stack to return to? Sure, I could certainly
> entertain the idea of not having syslet tasks return to userspace in the
> first pass. Ingo sure seemed excited by the idea.
>
> Or do you mean the syscall return value ending up in the userspace
> completion event ring? That's mostly about being able to wait for
> pending syslets to complete.
The latter. A ring is optimal for processing a huge number of requests, but
if you're really going to be firing off syslet threads all over the place
you're not going to be optimal anyway. And being able to point the return
value to the stack or into some datastructure is way nicer to code (zero
setup == easy to understand and easy to convert).
For notification, see below.
> > Secondly, you really should allow integration with an eventfd so you
> > don't make the posix AIO mistake of providing a poll-incompatible
> > interface.
>
> Yeah, this seems straight forward enough that I haven't made it an
> initial priority. I'm sure it will be helpful for people who are stuck
> integrating with entrenched software that wants to wait for pollable fds.
Unfortunately, waiting for someone to write a killer app which uses your new
API is the road to disappointment. The real target is convincing the handful
of important apps (Samba, Apache, ...) to #ifdef around some small piece of
code in order to get performance. And a mere single design wart could mean
that never happens. Look at epoll, it's probably been the most successful
and it's still damn niche.
> For more flexible software, though, it's compelling to now be able to
> aggregate waiting for completion of the existing waiting syscalls (poll,
> epoll_wait, futexes, whatever) by issuing them as concurrent syslets.
Is replacing epoll with syslets really going to win, even if you're writing
apps from scratch? Anyway a fast notification mechanism is a different
problem than syslets, and should be separated.
> > Finally, and probably most alarmingly, AFAICT randomly changing TID will
> > break all threaded programs, which means this won't be fitted into
> > existing code bases, making it YA niche Linux-only API 8(
>
> I wonder if there isn't an opportunity to add a clone() flag which
> juggles the association between TIDs and task_structs. I don't relish
> the idea of investigating the life cycles of task_struct references that
> derive from TIDs and seeing how those would race with a syslet blocking
> and cloning, but, well, maybe that's what needs to be done.
This must be solved, yet all avenues seem crawling with worms. Redirecting
find_task_by_pid() to find the original and converting all the places where
we return tids to userspace? Swapping tids when we clone? Duplicate tids,
with only the non-syslet one being returned from find_task_by_pid()?
> This all isn't my area of expertise, though, sadly. It would be swell
> if someone wanted to look into it before I'm forced to learn yet another
> weird corner of the kernel.
Let's just tell Ingo it's impossible to solve :)
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists