linux-kernel - Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <50FD8D07.5030908@linaro.org>
Date:	Mon, 21 Jan 2013 10:46:31 -0800
From:	John Stultz <john.stultz@...aro.org>
To:	Feng Tang <feng.tang@...el.com>
CC:	Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...e.hu>,
	"H. Peter Anvin" <hpa@...ux.intel.com>, x86@...nel.org,
	Len Brown <lenb@...nel.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
	linux-kernel@...r.kernel.org,
	Jason Gunthorpe <jgunthorpe@...idianresearch.com>
Subject: Re: [RFC PATCH 0/5] Add support for S3 non-stop TSC support.

On 01/20/2013 10:38 PM, Feng Tang wrote:
> Hi All,
>
> On some new Intel Atom processors (Penwell and Cloverview), there is
> a feature that the TSC won't stop S3, say the TSC value won't be
> reset to 0 after resume. This feature makes TSC a more reliable
> clocksource and could benefit the timekeeping code during system
> suspend/resume cycles.
>
> The enabling efforts include adding new flags for this feature,
> modifying clocksource.c and timekeeping.c to support and utilizing
> it.
>
> One remaining question is inside the timekeeping_resume(), we don't
> know if it is called by resuming from suspend(s2ram) or from
> hibernate(s2disk), as there is no easy way to check it currently.
> But it doesn't hurt as these Penwell/Cloverview platforms only have
> S3 state, and no S4.
>

Ooof. This is an interesting feature, but it does complicate things a bit.

So just a few high-level thoughts initially.

The clocksource code has to balance being able to make fine tuned 
adjustments with also being able to properly account for time when no 
timer interrupts occur. So by stretching the maximum time interval out, 
you end up hurting the adjustment granularity.

Also, since you still have a limited time value (40 minutes instead of 
10), you will still run into lost time issues if the system suspends for 
longer then that. I think its reasonable to expect we get timer 
interrupts at least every 10 minutes while the system is running, but 
that's maybe not a reasonable expectation in suspend (even if we push it 
out to 40 minutes).

Because of this, I think trying to integrate this feature into the 
clocksource code is the wrong approach.

What this feature really reminds me of, is our discussion with Jason, 
and how the 32k counter is used on some ARM platforms with 
read_persistent_clock(). While read_persistent_clock() was initially a 
sort of special RTC interface, which let us initialize time properly in 
early boot and manage tracking suspend/resume time (before interrupts 
are enabled). The ARM platforms with the 32k counter really only use it 
for suspend/resume tracking (since it doesn't give a valid time at 
boot), and instead initialize time some other way.  I always considered 
it an interesting and creative slight misuse of the interface, but now 
that there's a good example of other systems where this approach would 
be usable, I think we should probably formalize it some.

What I'd propose is that we break the read_persistent_clock() 
functionality up. So we need two interfaces:
1) An interface to access a time value we used to initialize the 
system's CLOCK_REALTIME time.
2) An interface to measure the length of suspend.

Interface #1 could be possibly just replaced with the RTCTOSYS 
functionality. Although the downside there is that for some time at 
bootup between the timekeeping_init() function running (prior to 
interrupts being enabled) and the RTC driver being available (after 
interrupts are enabled), where we'd have an incorrect system clock. So 
we may want to preserve something like the existing 
read_persistent_clock() interface, but as Jason suggested, we could push 
that access into the RTC driver itself.

Interface #2 could then be either RTC based, or countinuous counter 
based. Since we still want to do this measurement with interrupts off, 
we still would need that interrupt-free RTC method like 
read_persistent_clock() where supported (falling back to the RTC 
driver's suspend/resume handler to try to fix things up as best it can 
if that's not available).

There is still plenty of ugly details as to how interface #2 would work. 
Since it could return something as coarse as seconds, or it could 
provide nanosecond granularity, you probably want to return a timespec 
that we'd capture at suspend and resume, and calculate the delta of. 
However, in order to properly provide a timespec from a raw TSC counter, 
you need to be careful with the math to avoid overflows as TSC counter 
value grows (take a look at the sched_clock code). Also whatever 
function backs this would need to have the logic to know when to use the 
TSC counter vs falling back to the RTC in the case where we're actually 
able to go into S4.

thanks
-john

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/