Message-ID: <AANLkTiks_8sjStgGnTGVj-3UemDqP4G8hZuUDhngZhij@mail.gmail.com>
Date: Tue, 16 Nov 2010 19:54:56 -0500
From: Andrew Lutomirski <luto@....edu>
To: john stultz <johnstul@...ibm.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
pc@...ibm.com
Subject: Re: [PATCH] Improve clocksource unstable warning
On Tue, Nov 16, 2010 at 7:26 PM, john stultz <johnstul@...ibm.com> wrote:
> On Tue, 2010-11-16 at 19:05 -0500, Andrew Lutomirski wrote:
>> On Fri, Nov 12, 2010 at 7:58 PM, john stultz <johnstul@...ibm.com> wrote:
>> > On Sat, 2010-11-13 at 00:22 +0000, john stultz wrote:
>> >> On Fri, 2010-11-12 at 18:51 -0500, Andrew Lutomirski wrote:
>> >> > Also wrong if cs_elapsed is just slightly less than wd_wrapping_time
>> >> > but the wd clocksource runs enough faster that it wrapped.
>> >>
>> >> Ok. Good point, that's a problem. Hrmmmm. Too much math for Friday. :)
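The failure mode in this exchange comes from how a wrapping counter's delta is computed. A masked subtraction cannot tell how many full wraps happened, so any whole multiple of (mask + 1) ticks silently disappears. The sketch below is a minimal illustration. The helper names are hypothetical, and the example numbers use a 24-bit counter at 3579545 Hz, the ACPI PM timer's rate.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks of a counter that is only `mask` wide, computed the
 * usual way: subtract, then mask off the borrow. */
static uint64_t masked_delta(uint64_t now, uint64_t last, uint64_t mask)
{
    return (now - last) & mask;
}

/* Hypothetical simulation: the counter starts at `start` and really
 * advances by `true_ticks`. Returns what masked_delta() reports.
 * Every whole multiple of (mask + 1) ticks is silently lost. */
static uint64_t observed_ticks(uint64_t start, uint64_t true_ticks,
                               uint64_t mask)
{
    uint64_t last = start & mask;
    uint64_t now = (start + true_ticks) & mask;
    return masked_delta(now, last, mask);
}
```

With mask 0xFFFFFF, the counter wraps every 16777216 ticks, which is about 4.69 s at 3579545 Hz. If 5 s really elapse (17897725 ticks), observed_ticks() returns 1120509, or about 0.31 s. Nothing in the masked value reveals that a wrap occurred.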
>> >
>> > I have a hard time leaving things alone. :)
>> >
>> > So this still has the issue that u64%u64 won't work on 32-bit systems,
>> > but I think once I rework the modulo bit the following should be what
>> > you were describing.
>> >
>> > It is ugly, so let me know if you have a cleaner way.
>> >
>>
>> I'm playing with this stuff now, and it looks like my (invariant,
>> constant, single-package i7) TSC has a max_idle_ns of just over 3
>> seconds. I'm confused.
>
> Yea. I hit this wall the other day as well. So my patch is invalid
> because it's assuming the TSC deltas will be large, but for any
> unreasonably long delay, we'll actually end up with multiply overflows,
> causing the TSC ns interval to be invalid as well.
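The multiply overflow comes from the 64-bit product inside the cycles-to-nanoseconds conversion. The sketch below is modeled on the shape of the kernel's clocksource_cyc2ns() of this era, ((u64)cycles * mult) >> shift. The values are hypothetical: a 3 GHz counter with shift = 32, so mult = 2^32 / 3 = 1431655765. They are not the mult/shift pair any particular kernel actually picks.

```c
#include <assert.h>
#include <stdint.h>

/* Modeled on clocksource_cyc2ns(): ns = (cycles * mult) >> shift,
 * with the product held in 64 bits, so it can wrap. */
static uint64_t cyc2ns_u64(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;   /* unsigned wrap on overflow */
}

/* Largest cycle count for which cycles * mult does not wrap. */
static uint64_t cyc2ns_u64_max_cycles(uint32_t mult)
{
    assert(mult != 0);
    return UINT64_MAX / mult;
}
```

With these values, the product stops fitting after 12884901891 cycles, which is about 4.3 s at 3 GHz. At that point cyc2ns_u64() returns 4294967295 ns. One cycle later it returns 0, so a check that runs late sees a tiny, bogus interval instead of a long one.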
>
> I'm starting to think we should be pushing the watchdog check into the
> timekeeping accumulation loop (or have it hang off of the accumulation
> loop).
>
> 1) The clocksource cyc2ns conversion code is built with assumptions
> linked to how frequently we accumulate time via update_wall_time().
>
> 2) update_wall_time() happens in timer irq context, so we don't have to
> worry about being delayed. If an irq storm or something does actually
> cause the timer irq to be delayed, we have bigger issues.

That's why I hit this. It would be nice if we didn't respond to irq
storms by calling stop_machine.

>
> The only trouble with this is that if we actually push the max_idle_ns
> out to something like 10 seconds on the TSC, we could end up having the
> watchdog clocksource wrapping while we're in nohz idle. So that could
> be ugly. Maybe if the current clocksource needs the watchdog
> observations, we should cap the max_idle_ns to the smaller of the
> current clocksource's and the watchdog clocksource's limits.
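The cap suggested here amounts to taking a minimum. The sketch below is a minimal version of it. The helper names are hypothetical, no safety margin is applied (a real cap would likely want one), and `unsigned __int128` is a GCC/Clang extension used only to keep (mask + 1) * 1e9 from overflowing.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Nanoseconds until a counter with this mask and frequency wraps. */
static uint64_t wrap_time_ns(uint64_t mask, uint64_t freq_hz)
{
    assert(freq_hz != 0);
    if (mask == UINT64_MAX)
        return UINT64_MAX;  /* mask + 1 would overflow; treat as never wrapping */
    unsigned __int128 ns = (unsigned __int128)(mask + 1) * NSEC_PER_SEC / freq_hz;
    return ns > UINT64_MAX ? UINT64_MAX : (uint64_t)ns;
}

/* Hypothetical cap: never idle longer than either the current clocksource
 * or its watchdog can cover without an undetected wrap. */
static uint64_t capped_max_idle_ns(uint64_t cs_max_idle_ns,
                                   uint64_t wd_mask, uint64_t wd_freq_hz)
{
    uint64_t wd_ns = wrap_time_ns(wd_mask, wd_freq_hz);
    return cs_max_idle_ns < wd_ns ? cs_max_idle_ns : wd_ns;
}
```

Suppose the watchdog is a 24-bit counter at 3579545 Hz, which wraps after roughly 4.69 s. In that case, a 10 s TSC limit is clamped to the watchdog's wrap time, while a 3 s limit passes through unchanged.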
>

What would you think about implementing non-overflowing
clocksource_cyc2ns on architectures that can do it efficiently? You'd
have to artificially limit the mask to 2^64 * (rate in GHz), rounded
down to a power of 2, but that shouldn't be a problem for any sensible
clocksource.

x86_64 can do it with one multiply, two shifts, an or, and a subtract
(to figure out the shifts). It should take just a couple of cycles
longer than the current code (or maybe the same amount of time,
depending on how good the CPU is at running the whole thing in
parallel).
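A minimal sketch of the non-overflowing conversion follows, assuming a compiler with `unsigned __int128` (a GCC/Clang extension on 64-bit targets). The 64x32-bit product is kept in 128 bits, so it cannot wrap. The only remaining limit is that the final nanosecond value must fit in 64 bits, and that limit is what bounds the mask. The function names are hypothetical, and how a compiler lowers the 128-bit shift varies.

```c
#include <assert.h>
#include <stdint.h>

/* Exact cyc2ns: 128-bit product, then shift; the product never wraps. */
static uint64_t cyc2ns_exact(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    assert(shift < 64);
    unsigned __int128 prod = (unsigned __int128)cycles * mult;
    return (uint64_t)(prod >> shift);
}

/* Largest cycle count whose exact result still fits in 64 bits:
 * (cycles * mult) >> shift <= UINT64_MAX  <=>  cycles * mult < 2^(64 + shift). */
static uint64_t cyc2ns_exact_max_cycles(uint32_t mult, uint32_t shift)
{
    assert(mult != 0 && shift < 64);
    unsigned __int128 limit = ((unsigned __int128)1 << (64 + shift)) - 1;
    unsigned __int128 max = limit / mult;
    return max > UINT64_MAX ? UINT64_MAX : (uint64_t)max;
}

/* Round that bound down to a mask of the form 2^k - 1. */
static uint64_t cyc2ns_exact_mask(uint32_t mult, uint32_t shift)
{
    uint64_t max = cyc2ns_exact_max_cycles(mult, shift);
    if (max == UINT64_MAX)
        return UINT64_MAX;                      /* a full 64-bit counter is fine */
    unsigned k = 63 - __builtin_clzll(max + 1); /* largest k with 2^k <= max + 1 */
    return (1ULL << k) - 1;
}
```

Take a hypothetical 3 GHz counter with mult = 1431655765 and shift = 32. Here 12884901892 cycles convert to 4294967296 ns instead of wrapping, and no mask limit is needed at all. For a coarse, made-up encoding of a roughly 14.3 MHz counter (mult = 70, shift = 0), the mask comes out as 2^57 - 1, which is still decades of counting.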

x86_32 and similar architectures would need two multiplies and one add.
Architectures with only 32x32->32 multiply would need three
multiplies. (They're already presumably doing two multiplies with the
current code, though.)
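The "two multiplies and one add" version can be sketched with only 32x32->64 multiplies. This assumes mult fits in 32 bits (as the kernel's u32 mult does) and shift <= 32. Split cycles = hi * 2^32 + lo. Then cycles * mult = (hi * mult) * 2^32 + lo * mult. Because the first term is divisible by 2^shift, the shifted result is exactly ((hi * mult) << (32 - shift)) + ((lo * mult) >> shift). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Exact (cycles * mult) >> shift using two 32x32->64 multiplies and one add.
 * The caller must keep cycles small enough that the result fits in 64 bits;
 * otherwise the left shift or the add wraps. */
static uint64_t cyc2ns_split(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    assert(shift <= 32);
    uint32_t hi = (uint32_t)(cycles >> 32);
    uint32_t lo = (uint32_t)cycles;
    uint64_t hi_prod = (uint64_t)hi * mult;   /* multiply #1 */
    uint64_t lo_prod = (uint64_t)lo * mult;   /* multiply #2 */
    /* hi_prod * 2^32 is an exact multiple of 2^shift, so no bits are lost. */
    return (hi_prod << (32 - shift)) + (lo_prod >> shift);   /* the one add */
}
```

With the same hypothetical 3 GHz values as before (mult = 1431655765, shift = 32), 12884901892 cycles give 4294967296 ns, matching a full-width product.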

The benefit would be that sensible clocksources (TSC and 64-bit HPET)
would essentially never overflow and multicore systems could keep most
cores asleep for as long as they liked.

(There's yet another approach: keep the current clocksource_cyc2ns,
but add an exact version and only use it when waking up from a long
sleep.)
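The "keep the fast path, use an exact version only after a long sleep" idea can be sketched as a threshold check. The struct and function names are hypothetical. The threshold is divided out once up front so the common path stays a single compare plus the existing 64-bit multiply. The slow path assumes `unsigned __int128` (GCC/Clang).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-clocksource conversion data. */
struct cs_conv {
    uint32_t mult;
    uint32_t shift;
    uint64_t max_fast;  /* largest cycles for which cycles * mult fits in 64 bits */
};

static struct cs_conv cs_conv_init(uint32_t mult, uint32_t shift)
{
    assert(mult != 0 && shift < 64);
    struct cs_conv c = { mult, shift, UINT64_MAX / mult };  /* divide once */
    return c;
}

/* Fast path: the existing 64-bit multiply, used whenever it cannot wrap.
 * Slow path: exact 128-bit product, reached only for long deltas.
 * The final result must still fit in 64 bits. */
static uint64_t cyc2ns_hybrid(struct cs_conv c, uint64_t cycles)
{
    if (cycles <= c.max_fast)
        return (cycles * c.mult) >> c.shift;
    return (uint64_t)(((unsigned __int128)cycles * c.mult) >> c.shift);
}
```

Short deltas never leave the fast path, so the per-tick cost is unchanged apart from one compare. Only a delta past max_fast (about 4.3 s with the hypothetical 3 GHz values above) pays for the wider multiply.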
--Andy