Message-ID: <AANLkTiks_8sjStgGnTGVj-3UemDqP4G8hZuUDhngZhij@mail.gmail.com>
Date:	Tue, 16 Nov 2010 19:54:56 -0500
From:	Andrew Lutomirski <luto@....edu>
To:	john stultz <johnstul@...ibm.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
	pc@...ibm.com
Subject: Re: [PATCH] Improve clocksource unstable warning

On Tue, Nov 16, 2010 at 7:26 PM, john stultz <johnstul@...ibm.com> wrote:
> On Tue, 2010-11-16 at 19:05 -0500, Andrew Lutomirski wrote:
>> On Fri, Nov 12, 2010 at 7:58 PM, john stultz <johnstul@...ibm.com> wrote:
>> > On Sat, 2010-11-13 at 00:22 +0000, john stultz wrote:
>> >> On Fri, 2010-11-12 at 18:51 -0500, Andrew Lutomirski wrote:
>> >> > Also wrong if cs_elapsed is just slightly less than wd_wrapping_time
>> >> > but the wd clocksource runs enough faster that it wrapped.
>> >>
>> >> Ok. Good point, that's a problem. Hrmmmm. Too much math for Friday. :)
>> >
>> > I have a hard time leaving things alone. :)
>> >
> >> > So this still has the issue that u64%u64 won't work on 32-bit systems,
>> > but I think once I rework the modulo bit the following should be what
>> > you were describing.
>> >
>> > It is ugly, so let me know if you have a cleaner way.
>> >
>>
>> I'm playing with this stuff now, and it looks like my (invariant,
>> constant, single-package i7) TSC has a max_idle_ns of just over 3
>> seconds.  I'm confused.
>
> Yea. I hit this wall the other day as well. So my patch is invalid
> because it's assuming the TSC deltas will be large, but for any
> unreasonable delay, we'll actually end up with multiply overflows,
> causing the TSC ns interval to be invalid as well.
>
> I'm starting to think we should be pushing the watchdog check into the
> timekeeping accumulation loop (or have it hang off of the accumulation
> loop).
>
> 1) The clocksource cyc2ns conversion code is built with assumptions
> linked to how frequently we accumulate time via update_wall_time().
>
> 2) update_wall_time() happens in timer irq context, so we don't have to
> worry about being delayed. If an irq storm or something does actually
> cause the timer irq to be delayed, we have bigger issues.

That's why I hit this.  It would be nice if we didn't respond to irq
storms by calling stop_machine.
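A minimal sketch of the multiply overflow John describes. It assumes the conversion has the same shape as clocksource_cyc2ns(), that is `(cycles * mult) >> shift` with the product truncated to 64 bits. The names cyc2ns_fast and cyc2ns_max_safe_cycles are hypothetical, and the mult/shift pair used below (1431655765, 32, roughly 1/3 ns per cycle for a hypothetical 3 GHz TSC) is illustrative, not the kernel's actual TSC calibration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in with the same shape as clocksource_cyc2ns():
 * the product is computed in 64 bits, so it silently wraps once
 * cycles * mult exceeds UINT64_MAX.
 */
static inline uint64_t cyc2ns_fast(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/* Largest cycle delta for which cycles * mult still fits in 64 bits. */
static inline uint64_t cyc2ns_max_safe_cycles(uint32_t mult)
{
	assert(mult != 0);
	return UINT64_MAX / mult;
}
```

With mult = 1431655765 and shift = 32, cyc2ns_max_safe_cycles() returns 12884901891 cycles, about 4.3 seconds at 3 GHz. A delta of 20000000000 cycles (about 6.7 s) wraps and converts to roughly 2.37 s, which is the kind of bogus interval that makes the watchdog misjudge the TSC.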

>
> The only trouble with this is that if we actually push the max_idle_ns
> out to something like 10 seconds on the TSC, we could end up having the
> watchdog clocksource wrap while we're in nohz idle. So that could
> be ugly. Maybe if the current clocksource needs the watchdog's
> observations, we should cap max_idle_ns to the smaller of the
> current clocksource's and the watchdog clocksource's values.
>
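A sketch of the capping rule John proposes, under stated assumptions: struct cs_sketch and capped_max_idle_ns are hypothetical, trimmed-down stand-ins and not the kernel's struct clocksource or any existing helper. The idea is only that, when the current clocksource is being watched, nohz idle must not outlast the shorter of the two safe intervals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a clocksource: only what the sketch needs. */
struct cs_sketch {
	uint64_t max_idle_ns; /* longest deferment this clocksource tolerates */
	int needs_watchdog;   /* nonzero if the watchdog must keep observing it */
};

/*
 * Cap the nohz idle length so that neither the current clocksource nor
 * the watchdog clocksource that checks it can wrap while we sleep.
 */
static uint64_t capped_max_idle_ns(const struct cs_sketch *cur,
				   const struct cs_sketch *wd)
{
	assert(cur != NULL);
	if (!cur->needs_watchdog || wd == NULL)
		return cur->max_idle_ns;
	return cur->max_idle_ns < wd->max_idle_ns ? cur->max_idle_ns
						  : wd->max_idle_ns;
}
```

For example, with a hypothetical TSC allowed 10 s of idle and a watchdog whose counter wraps after about 4 s, the cap comes out at the watchdog's value; once the TSC no longer needs the watchdog, the TSC's own limit applies again.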

What would you think about implementing non-overflowing
clocksource_cyc2ns on architectures that can do it efficiently?  You'd
have to artificially limit the mask to 2^64 / (rate in GHz), rounded
down to a power of 2, but that shouldn't be a problem for any sensible
clocksource.
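One way a non-overflowing conversion could look on a 64-bit target, as a sketch only: cyc2ns_exact is a hypothetical name, and it relies on the GCC/Clang `unsigned __int128` extension rather than any kernel helper. Spelling out the final shift by hand mirrors the x86_64 instruction count given below: one 64x64->128 multiply, two shifts, an OR, and the `64 - shift` subtraction.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Exact cycles -> ns conversion: keep the full 128-bit product and shift
 * it right. unsigned __int128 is a GCC/Clang extension on 64-bit targets.
 */
static inline uint64_t cyc2ns_exact(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	unsigned __int128 prod = (unsigned __int128)cycles * mult; /* one MUL */
	uint64_t hi = (uint64_t)(prod >> 64);
	uint64_t lo = (uint64_t)prod;

	/* 0 < shift < 64 keeps both shifts below well defined in C. */
	assert(shift > 0 && shift < 64);
	/*
	 * (hi:lo) >> shift, built from the two halves. Their bits do not
	 * overlap, so OR combines them. The caller must keep cycles within
	 * the (reduced) clocksource mask so the result fits in 64 bits.
	 */
	return (lo >> shift) | (hi << (64 - shift));
}
```

With the same illustrative 3 GHz parameters (mult = 1431655765, shift = 32), a 20000000000-cycle delta converts to 6666666665 ns instead of wrapping.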

x86_64 can do it with one multiply, two shifts, an or, and a subtract
(to figure out the shifts).  It should take just a couple cycles
longer than the current code (or maybe the same amount of time,
depending on how good the CPU is at running the whole thing in
parallel).
x86_32 and similar architectures would need two multiplies and one add.
Architectures with only 32x32->32 multiply would need three
multiplies.  (They're already presumably doing two multiplies with the
current code, though.)

The benefit would be that sensible clocksources (TSC and 64-bit HPET)
would essentially never overflow and multicore systems could keep most
cores asleep for as long as they liked.

(There's yet another approach: keep the current clocksource_cyc2ns,
but add an exact version and only use it when waking up from a long
sleep.)
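A sketch of that hybrid, with all names hypothetical. Note one swap: the message suggests choosing the exact version by call site (the wakeup path), while this sketch chooses by delta size, taking the fast path whenever the 64-bit product cannot overflow. The threshold `max_fast_cycles` is passed in because a real version would presumably precompute `UINT64_MAX / mult` once per clocksource rather than divide on every call.

```c
#include <assert.h>
#include <stdint.h>

/* Current shape: 64-bit product, cheap, but wraps for large deltas. */
static inline uint64_t cyc2ns_fast(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/* Exact version: full 128-bit product (GCC/Clang unsigned __int128). */
static inline uint64_t cyc2ns_exact(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (uint64_t)(((unsigned __int128)cycles * mult) >> shift);
}

/*
 * Hypothetical hybrid: fast path for ordinary tick-to-tick deltas, exact
 * path only for deltas long enough to overflow, such as the first
 * conversion after a long nohz sleep.
 */
static inline uint64_t cyc2ns_hybrid(uint64_t cycles, uint32_t mult,
				     uint32_t shift, uint64_t max_fast_cycles)
{
	assert(shift < 64);
	if (cycles <= max_fast_cycles)
		return cyc2ns_fast(cycles, mult, shift);
	return cyc2ns_exact(cycles, mult, shift);
}
```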

--Andy
