[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <46EF7E82.8040503@gmx.net>
Date: Tue, 18 Sep 2007 09:30:10 +0200
From: Michael Kerrisk <mtk-manpages@....net>
To: Michael Kerrisk <mtk-manpages@....net>
CC: Davide Libenzi <davidel@...ilserver.org>,
Ulrich Drepper <drepper@...hat.com>, geoff@...are.org.uk,
lkml <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Christoph Hellwig <hch@....de>,
Jonathan Corbet <corbet@....net>,
Randy Dunlap <rdunlap@...otime.net>, vda.linux@...glemail.com,
Linus Torvalds <torvalds@...ux-foundation.org>,
Lee Schermerhorn <Lee.Schermerhorn@...com>
Subject: Re: RFC: A revised timerfd API
[Resend, because I got one email address wrong in the earlier send]
After my earlier mail (full thread here:
http://thread.gmane.org/gmane.linux.kernel/574430/focus=579368 )
it seems that some people agree we should give a bit more
thought to how a final timerfd interface should look.
Davide's original API and the limitations that I see in it,
are described in the message cited above. In that message,
I proposed three alternatives (and I'm going to add a fourth
now) that provide "get-while-set" and "non-destructive-get"
functionality. Before trying to implement anything I'd like to
get input on the various possible designs.
The four designs are:
a) A multiplexing timerfd() system call.
b) Creating three syscalls analogous to the POSIX timers API (i.e.,
timerfd_create/timerfd_settime/timerfd_gettime).
c) Creating a simplified timerfd() system call that is integrated
with the POSIX timers API.
d) Extending the POSIX timers API to support the timerfd concept.
My order of preference for the implementations is currently
something like (in descending order): d, b, c, a.
The details follow:
====> a) Add an argument (a multiplexing timerfd() system call)
In an earlier mail (http://thread.gmane.org/gmane.linux.kernel/559193 ).
I proposed adding a further argument to timerfd(): old_utmr, which could
be used to return the time remaining until expiry for an existing timer
The proposed semantics that would allow get and get-while-setting
functionality.
Advantages:
1. Provides the desired get and get-while-setting functionality.
2. It's a simple interface (a single system call).
Disadvantage:
Jon Corbet pointed out
(http://thread.gmane.org/gmane.linux.kernel/559193/focus=570709 )
that this interface was starting to look like a multiplexing syscall,
because there is no case where all of the arguments are used (see
the use-case descriptions in the earlier mail).
I'm inclined to agree with Jon; therefore one of the remaining
solutions may be preferable
====> b) Create a timerfd interface analogous to POSIX timers
Create an interface analogous to POSIX timers:
fd = timerfd_create(clockid, flags);
timerfd_settime(fd, flags, newtimervalue, &time_to_next_expire);
timerfd_gettime(fd, &time_to_next_expire);
Under this proposal, the manner of making a timer that does not
need "get-while-set" functionality remains fairly simple:
fd = timerfd_create(clockid);
timerfd_settime(fd, flags, newtimervalue, NULL);
Advantage: this would be a clean, fully functional API, and well
understood by virtue of its analogy with the POSIX timers API.
Disadvantage: 3 new system calls, rather than 1.
This solution would be sufficient, IMO, but one of the
next solutions might be better.
====> c) Integrate timerfd with POSIX timers
Make a very simple timerfd call that is integrated with the
POSIX timers API. The POSIX timers API is detailed here:
http://linux.die.net/man/3/timer_create
http://linux.die.net/man/3/timer_settime
Under the POSIX timers API, a new timer is created using:
int timer_create(clockid_t clockid, struct sigevent *evp,
timer_t *timerid);
We could then have a timerfd() call that returns a file descriptor
for the newly created 'timerid':
fd = timerfd(timer_t timerid);
We could then use the POSIX timers API to operate on the timer
(start it / modify it / fetch timer value):
int timer_settime(timer_t timerid, int flags,
const struct itimerspec *value,
struct itimerspec *ovalue);
int timer_gettime(timer_t timerid, struct itimerspec *value);
And then read() from 'fd' as before.
In the simple case (no "get" or "get-while-setting" functionality),
the use of API (c) would be:
timer_create(clockid, &evp, &timerid);
fd = timerfd(timerid);
timer_settime(timerid, flags, &newvalue, NULL));
Advantages:
1. Integration with an existing API.
2. Adds just a single system call.
3. It _might_ be possible to construct an interface that allows
userland programs to do things like creating a timer fd for
a POSIX timer that was created via some library that doesn't
actually know about timer fds. (I can already see problems with
this, since that library will already expect to be delivering
timer notifications somehow (via threads or signals), and it may
be difficult to make the two notification mechanisms play
together in a sane way. But maybe someone else has a take on
this that can rescue this idea.)
Disadvantages:
1. Starts to get a little more clunky to use in the simple
case shown above.
This strikes me as a more attractive solution than (b), if we can do
it properly -- that means: if we can achieve advantage 3
in some reasonable way. If we can't achieve that, then probably
the next solution is better.
====> d) extend the POSIX timers API
Under the POSIX timers API, the evp argument of timer_create() is a
structure that allows the caller to specify how timer expirations
should be notified. There are the following possibilities
(differentiated by the value assigned to evp.sigev_notify):
i) notify via a signal: the caller specifies which signal the
kernel should deliver when the timer expires.
(SIGEV_SIGNAL)
ii) notify by delivering a signal to the thread whose thread ID
is specified in evp. (This is Linux specific.)
(SIGEV_THREAD_ID)
iii) notify via a thread: when the timer expires, the system starts
a new thread which receives an argument that was specified in
the evp structure. (SIGEV_THREAD)
iv) no notification: the caller can monitor the timer state using
timer_gettime(). (SIGEV_NONE)
In all of the above cases, the return value from timer_create()
is 0 for success, or -1 for failure.
We could extend the interface as follows:
1) Add a new flag for evp.sigev_notify: SIGEV_TIMERFD.
This flag indicates that the caller wants timer
notification via a file descriptor.
2) Whenevp.sigev_notify == SIGEV_TIMERFD, have a successful
timer_create() call return a file descriptor (i.e., an
integer >= 0).
Advantages:
1. Integration with an existing API.
2. No new system calls are required.
3. This idea might even have a chance of getting standardized in
POSIX one day, since (IMO) it integrates fairly cleanly with
an existing API.
Disadvantages:
1. The fact that the return value of a successful timer_create()
is different for the SIGEV_TIMERFD case is a bit of a wart.
=====
That's it. I welcome people's thoughts on these designs.
Cheers,
Michael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists