Message-ID: <20070918093007.223350@gmx.net>
Date: Tue, 18 Sep 2007 11:30:07 +0200
From: "Michael Kerrisk" <mtk-manpages@....net>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Lee.Schermerhorn@...com, torvalds@...ux-foundation.org,
vda.linux@...glemail.com, rdunlap@...otime.net, corbet@....net,
hch@....de, akpm@...ux-foundation.org,
linux-kernel@...r.kernel.org, geoff@...are.org.uk,
drepper@...hat.com, davidel@...ilserver.org,
David Härdeman <david@...deman.nu>
Subject: Re: RFC: A revised timerfd API
Hi Thomas,
> On Tue, 2007-09-18 at 09:30 +0200, Michael Kerrisk wrote:
> > ====> a) Add an argument (a multiplexing timerfd() system call)
> > Disadvantage:
> > Jon Corbet pointed out
> > (http://thread.gmane.org/gmane.linux.kernel/559193/focus=570709 )
> > that this interface was starting to look like a multiplexing syscall,
> > because there is no case where all of the arguments are used (see
> > the use-case descriptions in the earlier mail).
> >
> > I'm inclined to agree with Jon; therefore one of the remaining
> > solutions may be preferable
>
> I agree. It's ugly.
Fair enough. I mainly tried to do things that way to minimize
the change from Davide's original interface.
> > ====> b) Create a timerfd interface analogous to POSIX timers
> >
> > Create an interface analogous to POSIX timers:
> >
> > fd = timerfd_create(clockid, flags);
> >
> > timerfd_settime(fd, flags, newtimervalue, &time_to_next_expire);
> >
> > timerfd_gettime(fd, &time_to_next_expire);
> >
> > Under this proposal, the manner of making a timer that does not
> > need "get-while-set" functionality remains fairly simple:
> >
> > fd = timerfd_create(clockid, flags);
> >
> > timerfd_settime(fd, flags, newtimervalue, NULL);
> >
> > Advantage: this would be a clean, fully functional API, and well
> > understood by virtue of its analogy with the POSIX timers API.
> >
> > Disadvantage: 3 new system calls, rather than 1.
> >
> > This solution would be sufficient, IMO, but one of the
> > next solutions might be better.
>
> I'm not scared by the 3 system calls. I rather fear that we end up
> reimplementing half of the existing posix timer code.
Yes. Perhaps some refactoring might be required, if we went
down this route.
> > ====> c) Integrate timerfd with POSIX timers
> >
> > Make a very simple timerfd call that is integrated with the
> > POSIX timers API. The POSIX timers API is detailed here:
> > http://linux.die.net/man/3/timer_create
> > http://linux.die.net/man/3/timer_settime
> >
> > Under the POSIX timers API, a new timer is created using:
> >
> > int timer_create(clockid_t clockid, struct sigevent *evp,
> > timer_t *timerid);
> >
> > We could then have a timerfd() call that returns a file descriptor
> > for the newly created 'timerid':
> >
> > fd = timerfd(timer_t timerid);
> >
> > We could then use the POSIX timers API to operate on the timer
> > (start it / modify it / fetch timer value):
> >
> > int timer_settime(timer_t timerid, int flags,
> > const struct itimerspec *value,
> > struct itimerspec *ovalue);
> > int timer_gettime(timer_t timerid, struct itimerspec *value);
> >
> > And then read() from 'fd' as before.
> >
> > In the simple case (no "get" or "get-while-setting" functionality),
> > the use of API (c) would be:
> >
> > timer_create(clockid, &evp, &timerid);
> >
> > fd = timerfd(timerid);
> >
> > timer_settime(timerid, flags, &newvalue, NULL);
> >
> > Advantages:
> > 1. Integration with an existing API.
> > 2. Adds just a single system call.
> > 3. It _might_ be possible to construct an interface that allows
> > userland programs to do things like creating a timer fd for
> > a POSIX timer that was created via some library that doesn't
> > actually know about timer fds. (I can already see problems with
> > this, since that library will already expect to be delivering
> > timer notifications somehow (via threads or signals), and it may
> > be difficult to make the two notification mechanisms play
> > together in a sane way. But maybe someone else has a take on
> > this that can rescue this idea.)
> >
> > Disadvantages:
> > 1. Starts to get a little more clunky to use in the simple
> > case shown above.
> >
> > This strikes me as a more attractive solution than (b), if we can do
> > it properly -- that means: if we can achieve advantage 3
> > in some reasonable way. If we can't achieve that, then probably
> > the next solution is better.
>
> The main problem here is, that there is no way to tell the posix timer
> code that the delivery of the timer is through the file descriptor and
> not via the usual posix timer mechanisms. We need something like the
> SIGEV_TIMERFD flag to make the posix timer code aware of that.
Well, I left it kind of open whether the expiration
notification might be delivered via both the traditional
mechanism and via the timerfd. But I realize that it all
may get overly complex.
> > ====> d) extend the POSIX timers API
> >
> > Under the POSIX timers API, the evp argument of timer_create() is a
> > structure that allows the caller to specify how timer expirations
> > should be notified. There are the following possibilities
> > (differentiated by the value assigned to evp.sigev_notify):
> >
> > i) notify via a signal: the caller specifies which signal the
> > kernel should deliver when the timer expires.
> > (SIGEV_SIGNAL)
> > ii) notify by delivering a signal to the thread whose thread ID
> > is specified in evp. (This is Linux specific.)
> > (SIGEV_THREAD_ID)
> > iii) notify via a thread: when the timer expires, the system starts
> > a new thread which receives an argument that was specified in
> > the evp structure. (SIGEV_THREAD)
> > iv) no notification: the caller can monitor the timer state using
> > timer_gettime(). (SIGEV_NONE)
> >
> > In all of the above cases, the return value from timer_create()
> > is 0 for success, or -1 for failure.
> >
> > We could extend the interface as follows:
> >
> > 1) Add a new flag for evp.sigev_notify: SIGEV_TIMERFD.
> > This flag indicates that the caller wants timer
> > notification via a file descriptor.
> > 2) When evp.sigev_notify == SIGEV_TIMERFD, have a successful
> > timer_create() call return a file descriptor (i.e., an
> > integer >= 0).
> >
> > Advantages:
> > 1. Integration with an existing API.
> > 2. No new system calls are required.
> > 3. This idea might even have a chance of getting standardized in
> > POSIX one day, since (IMO) it integrates fairly cleanly with
> > an existing API.
> >
> > Disadvantages:
> > 1. The fact that the return value of a successful timer_create()
> > is different for the SIGEV_TIMERFD case is a bit of a wart.
>
> What happens on close(fd) ? Is the posix timer automatically destroyed ?
I would say not (see also my reply to David Härdeman.)
> Is the file descriptor invalidated when the timer is destroyed via
> timer_delete(timer_id) ? The automatic file descriptor creation is a bit
> ugly.
Yes, it is a little ugly.
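
For reference, here is roughly what case (iv) from the list quoted
above looks like with the existing POSIX timers calls: a timer created
with SIGEV_NONE and then polled via timer_gettime(). This is only a
sketch of today's API, not of the proposed SIGEV_TIMERFD extension;
the helper name arm_and_query() is hypothetical, and on older glibc
versions the program needs to be linked with -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <time.h>

/* Hypothetical helper: create a CLOCK_MONOTONIC timer that delivers
 * no notification, arm it for `sec` seconds (sec must be > 0), and
 * store the time remaining until expiry in *left.
 * Returns 0 on success, -1 on error. */
static int arm_and_query(time_t sec, struct itimerspec *left)
{
    struct sigevent sev = {0};
    struct itimerspec its = {0};   /* it_interval stays 0: one-shot */
    timer_t tid;
    int rc;

    /* SIGEV_NONE: no signal and no thread; the caller polls instead. */
    sev.sigev_notify = SIGEV_NONE;
    if (timer_create(CLOCK_MONOTONIC, &sev, &tid) == -1)
        return -1;

    its.it_value.tv_sec = sec;
    rc = timer_settime(tid, 0, &its, NULL);
    if (rc == 0)
        rc = timer_gettime(tid, left);

    /* The timer ID is only released by an explicit timer_delete(). */
    timer_delete(tid);
    return rc;
}
```

Note that timer_create() returns only 0 or -1 here and hands back the
timer through 'tid'; that is the return-value convention option (d)
would have to bend for the SIGEV_TIMERFD case.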
> I'd rather see a combination of c) and d) as a solution:
>
> Notify the posix timer code that the timer delivery is done via the file
> descriptor mechanism (SIGEV_TIMERFD).
>
> Use a new syscall to open a file descriptor on that timer.
>
> When the file descriptor is closed the timer is not destroyed, but
> delivery disabled (analogous to the SIGEV_NONE case), so you can reopen
> and reactivate it later on.
>
> This way we have it nicely integrated into the posix timer code and keep
> the existing semantics of posix timers intact.
>
> We need to think about the open file descriptor in the timer_delete()
> case as well, but this should be not too hard to sort out.
This seems like a workable idea also. But note David Härdeman's
critique of options c & d: the existence of a coupled timerfd
and a timerid means that the application must maintain a mapping
between the two, so that after an epoll call (for example) that
says the timerfd is ready, the timer can be manipulated using
the corresponding timerid. This isn't IMO a fatal flaw, but
it does make the API a little more clumsy.
Cheers,
Michael
--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7
Want to help with man page maintenance?
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages ,
read the HOWTOHELP file and grep the source
files for 'FIXME'.