Message-ID: <alpine.LFD.1.10.0808291009450.3300@nehalem.linux-foundation.org>
Date: Fri, 29 Aug 2008 10:26:02 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Alan Cox <alan@...rguk.ukuu.org.uk>
cc: Arjan van de Ven <arjan@...radead.org>,
linux-kernel@...r.kernel.org, mingo@...e.hu, tglx@...x.de
Subject: Re: [PATCH 4/5] select: make select() use schedule_hrtimeout()
On Fri, 29 Aug 2008, Alan Cox wrote:
>
> > "schedule_timeout()", there's a big difference between asking for two
> > ticks and asking for two seconds. The latter should probably try to round
> > to a nice timer tick basis for power reasons).
>
> I disagree - that is fixing the problem in the wrong place. The timer
> structure needs an accuracy field of some form that the existing timer
> functions initialise to 0.
I do agree that we could do that too, but you miss one big issue: even if
we were to add an accuracy field inside the kernel, there is no such field
in the user interfaces.

We just pass timevals (and sometimes timespecs) around, and no, they don't
have any way to specify accuracy.

Yeah, we could use the high bits in the usec/nsec words, but then older
kernels would basically do random things, so that would be a horrible
interface.
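
To make the "random things" concrete, here is a minimal user-space sketch. The encoding is purely hypothetical (the top 8 bits of a 32-bit usec word carrying an accuracy hint); nothing like it exists in any real interface, and all names below are made up for illustration. A reader that knows the encoding masks the hint off, while a reader that does not simply takes the whole word as a microsecond count.

```c
#include <stdint.h>

/* Hypothetical encoding: top 8 bits of a 32-bit usec word hold a hint. */
#define ACC_SHIFT 24
#define USEC_MASK ((UINT32_C(1) << ACC_SHIFT) - 1)

/* Pack a microsecond count and an accuracy hint into one word. */
static uint32_t encode_usec(uint32_t usec, uint32_t hint)
{
	return (hint << ACC_SHIFT) | (usec & USEC_MASK);
}

/* A reader aware of the encoding recovers both fields. */
static uint32_t new_decode_usec(uint32_t word)
{
	return word & USEC_MASK;
}

static uint32_t new_decode_hint(uint32_t word)
{
	return word >> ACC_SHIFT;
}

/* A reader unaware of the encoding treats the whole word as usecs. */
static uint32_t old_decode_usec(uint32_t word)
{
	return word;
}
```

With a 500 usec timeout and a hint of 3, the aware reader gets 500 usec back, while the unaware reader sees 50332148 usec, roughly 50 seconds, for what was meant to be a sub-millisecond sleep.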
The other thing to do would be to just add totally new system calls with
totally new interfaces, but (a) nobody would use them anyway and (b) it's
simply not worth it.

So given that reality, and _if_ we want to support nice high-resolution
sleeping by select/poll, the only reasonable thing to do is to estimate
some kind of expected accuracy from the existing timeval/timespec.

And the only reasonable way to do that is to just look at the range. You
can probably do something fairly trivial with

	/* Estimate expected accuracy in ns from a timeval */
	unsigned long estimate_accuracy(struct timeval *tv)
	{
		/*
		 * Tens of ms if we're looking at seconds, even
		 * more for 10s+ sleeping
		 */
		if (tv->tv_sec) {
			/* Tenths of seconds for long sleeps */
			if (tv->tv_sec > 10)
				return 100000000;

			/*
			 * Tens of ms for second-granularity sleeps. This,
			 * btw, is the historical Linux 100Hz timer range.
			 */
			return 10000000;
		}

		/* Single msecs if we're looking at milliseconds */
		if (tv->tv_usec > 1000)
			return 1000000;

		/* Aim for tenths of msecs otherwise */
		return 100000;
	}

and yes, it's just a heuristic, but it's probably not a horribly stupid
one or a very unreasonable one.
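
One way such an estimate might be consumed, as a rough user-space sketch only: turn the timeout into a window of acceptable wakeup times, from the requested expiry up to the expiry plus the estimated accuracy. The `wake_window` struct and `wake_window_for()` are hypothetical names invented for this illustration, not anything in the kernel; `estimate_accuracy()` is repeated with the same thresholds so the block stands on its own.

```c
#include <stdint.h>
#include <sys/time.h>

/* Same thresholds as the heuristic above; result is in nanoseconds. */
static unsigned long estimate_accuracy(const struct timeval *tv)
{
	if (tv->tv_sec) {
		if (tv->tv_sec > 10)
			return 100000000;	/* 100 ms for 10s+ sleeps */
		return 10000000;		/* 10 ms for 1..10 s sleeps */
	}
	if (tv->tv_usec > 1000)
		return 1000000;			/* 1 ms for millisecond sleeps */
	return 100000;				/* 100 us otherwise */
}

/* Hypothetical: earliest and latest acceptable wakeup, in ns from now. */
struct wake_window {
	uint64_t earliest_ns;
	uint64_t latest_ns;
};

/* Never wake before the requested expiry; allow lateness up to the estimate. */
static struct wake_window wake_window_for(const struct timeval *tv)
{
	struct wake_window w;

	w.earliest_ns = (uint64_t)tv->tv_sec * 1000000000ULL
		      + (uint64_t)tv->tv_usec * 1000ULL;
	w.latest_ns = w.earliest_ns + estimate_accuracy(tv);
	return w;
}
```

For a 500 usec timeout this gives a window of 500000 to 600000 ns; for a 2 second timeout, 2000000000 to 2010000000 ns. Anything that can be batched inside the window can share a wakeup, which is the power argument from earlier in the thread.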
Linus