Date:	10 Jun 2015 04:47:03 -0400
From:	"George Spelvin" <linux@...izon.com>
To:	linux@...izon.com, mingo@...nel.org
Cc:	a.p.zijlstra@...llo.nl, adrian.hunter@...el.com,
	ak@...ux.intel.com, akpm@...ux-foundation.org, arjan@...radead.org,
	bp@...en8.de, hpa@...or.com, linux-kernel@...r.kernel.org,
	luto@...capital.net, penberg@....fi, tglx@...utronix.de,
	torvalds@...ux-foundation.org
Subject: Re: Discussion: quick_pit_calibrate is slow

Ingo Molnar wrote:
>* George Spelvin <linux@...izon.com> wrote:

> As a side note: so VMs often want to skip the whole calibration business,
> because they are running on a well-calibrated host.

> 1,000 msecs is also an eternity: consider for example the KVM + tools/kvm
> based "Clear Containers" feature from Arjan:
> ... which boots up a generic Linux kernel to generic Linux user-space in 32 
> milliseconds, i.e. it boots in 0.03 seconds (!).

Agreed, if you're paravirtualized, you can just pass this stuff in from
the host.  But there's plenty of hardware virtualization that boots
a generic Linux.

I pulled generous numbers out of my ass because I didn't want to over-reach
in the argument that it's taking too long.  The shorter the boot
time, the stronger the point.

>> With a total of 0.84 us of read uncertainty (1/12 of quick_pit_calibrate
>> currently), we can get within 500 ppm within 1.75 us.  Or do better
>> within 5 or 10.

> (msec you mean I suspect?)

Yes, typo; that should be 1.75 ms: with 0.84 us of read uncertainty, getting
under 500 ppm takes roughly 0.84 us / 500e-6 = 1.68 ms of elapsed time.

>> The loop I'd write would start the PIT (and the RTC, if we want to)
>> and then go round-robin reading all the time sources and associated
>> TSC values.

> I'd just start with the PIT to have as few balls in flight as possible.

Once I get the loop structured properly, additional timers really
aren't a problem.  The biggest PITA is the PM_TMR and all its
brokenness (do I have a PIIX machine in the closet somewhere?),
but the quick_pit_calibrate patch I already posted to LKML shows
how to handle that.  I set up a small circular buffer of captured
values, and when I'm (say) three captures past the "interesting"
one, go back and see if the reads look good.
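
To make that concrete, here's a rough userspace toy of the shape I mean --
every name in it is invented, and clock_gettime() plus a dummy counter stand
in for the real PIT/PM_TMR and TSC reads; it's the structure, not the patch:

/*
 * Toy capture loop: a small circular buffer of (reference, TSC) pairs,
 * with a sanity check run three captures after each "interesting" one.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NCAP 8                          /* small circular buffer of captures */

struct capture {
        uint64_t ref;                   /* reference timer reading (stand-in) */
        uint64_t tsc;                   /* TSC read taken alongside it (stand-in) */
};

static uint64_t fake_ref(void)          /* stand-in for a PIT/PM_TMR read */
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

static uint64_t fake_tsc(void)          /* stand-in for rdtsc() */
{
        static uint64_t t;

        return t += 1000;               /* pretend ~1000 cycles per call */
}

int main(void)
{
        struct capture buf[NCAP];
        unsigned int head = 0;

        for (unsigned int i = 0; i < 32; i++) {
                buf[head % NCAP].tsc = fake_tsc();
                buf[head % NCAP].ref = fake_ref();
                head++;

                /*
                 * Three captures past the "interesting" one, go back and
                 * check the reads around it: the reference timer must not
                 * have gone backwards anywhere in that window.
                 */
                if (head >= 4) {
                        unsigned int intr = head - 4;
                        bool ok = true;

                        for (unsigned int k = intr; k + 1 < head; k++)
                                if (buf[(k + 1) % NCAP].ref < buf[k % NCAP].ref)
                                        ok = false;
                        if (!ok)
                                printf("capture %u looks suspect, discard it\n", intr);
                }
        }
        return 0;
}

The point of looking back rather than checking immediately is that a bad read
only shows up once you have a couple of later captures to compare it against.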

> Could you please structure it the following way:
>
> - first a patch that fixes bogus comments about the current code. It has 
>   bitrotten and if we change it significantly we better have a well
>   documented starting point that is easier to compare against.
>
> - then a patch that introduces your more accurate calibration method and
>   uses it as the first method to calibrate. If it fails (and it should have a 
>   notion of failing) then it should fall back to the other two methods.
>
> - possibly add a boot option to skip your new calibration method -
>   i.e. to make the kernel behave in the old way. This would be useful
>   for tracking down any regressions in this.
>
>  - then maybe add a patch for the RTC method, but as a .config driven opt-in 
>    initially.

Sounds good, but when do we get to the decruftification?  I'd prefer to
prepare the final patch (if nothing else, so Linus will be reassured by
the diffstat), although I can see holding it back for a few releases.
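
(For the boot-option escape hatch, by the way, I'm picturing nothing fancier
than an early_param; the option name and flag below are invented:)

#include <linux/init.h>
#include <linux/types.h>

/* Hypothetical flag; the real patch would pick a better name. */
static bool use_old_tsc_calibrate __initdata;

static int __init parse_old_tsc_calibrate(char *arg)
{
        use_old_tsc_calibrate = true;
        return 0;
}
early_param("tsc_old_calibrate", parse_old_tsc_calibrate);

"notsc" is already handled that way, so the parsing should happen early
enough, well before the calibration runs out of tsc_init().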

> Please also add calibration tracing code (.config driven and default-off),
> so that the statistical properties of calibration can be debugged and
> validated without patching the kernel.

Definitely desired, but I have to be careful here.  Obviously I can't
print during the timing loop, so tracing will either take a lot of memory
or add significant computation to the loop.

I also don't want to flood the kernel log before syslog is
started.

Do you have any specific suggestions?  Should I just capture everything
into a permanently-allocated buffer and export it via debugfs?
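
Concretely, something like this -- all the names are invented and the
CONFIG gating is omitted: the loop only stores raw pairs, and a late
initcall hands the filled buffer to debugfs once it exists:

#include <linux/debugfs.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/types.h>

#define TSC_CAL_TRACE_ENTRIES 1024

struct tsc_cal_sample {
        u64 ref;                        /* raw reference-timer read */
        u64 tsc;                        /* TSC read taken alongside it */
};

static struct tsc_cal_sample tsc_cal_trace[TSC_CAL_TRACE_ENTRIES];
static unsigned int tsc_cal_trace_len;

static struct debugfs_blob_wrapper tsc_cal_blob;

/* Called from the calibration loop; no printk, just a store. */
void tsc_cal_trace_record(u64 ref, u64 tsc)
{
        if (tsc_cal_trace_len < TSC_CAL_TRACE_ENTRIES) {
                tsc_cal_trace[tsc_cal_trace_len].ref = ref;
                tsc_cal_trace[tsc_cal_trace_len].tsc = tsc;
                tsc_cal_trace_len++;
        }
}

static int __init tsc_cal_trace_debugfs(void)
{
        tsc_cal_blob.data = tsc_cal_trace;
        tsc_cal_blob.size = tsc_cal_trace_len * sizeof(tsc_cal_trace[0]);
        debugfs_create_blob("tsc_calibration_trace", 0444, NULL, &tsc_cal_blob);
        return 0;
}
late_initcall(tsc_cal_trace_debugfs);

That keeps the kernel log clean: nothing is dumped at boot, the raw data
just sits there until someone reads the debugfs file.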

>> I realize this is a far bigger overhaul than Adrian proposed, but do other 
>> people agree that some decruftification is warranted?

> Absolutely!

Thanks for the encouragement!

>> Any suggestions for a reasonable time/quality tradeoff?  500 ppm ASAP?
>> Best I can do in 10 ms?  Wait until the PIT is 500 ppm and then use
>> the better result from a higher-resolution timer if available?

> So I'd suggest a minimum polling interval (at least 1 msec?) plus a
> ppm target.  Would 100 ppm be too aggressive?

How about 122 ppm (1/8192) because I'm lazy? :-)

What I imagine is this (a toy model of the exit test follows the list):

- The code will loop until it reaches 122 ppm or 55 ms, whichever comes
  first.  (There's also a minimum elapsed time, before which the 122 ppm
  test isn't applied.)
- Initially, failure to reach 122 ppm will print a message and fall back.
- In the final cleanup patch, I'll accept anything up to 500 ppm
  and only fail (and disable TSC) if I can't reach that.
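
Here's a throwaway userspace model of that exit test.  The thresholds are
the ones above; the single-number uncertainty model and everything fed into
it are inventions for illustration only:

/*
 * Toy exit condition: relative uncertainty is modelled as
 * (per-read uncertainty) / (elapsed time), so it shrinks as the
 * measurement window grows.
 */
#include <stdbool.h>
#include <stdio.h>

#define READ_UNCERTAINTY_US     0.84            /* per-read uncertainty from above */
#define MIN_ELAPSED_US          1000.0          /* don't even test before ~1 ms */
#define TARGET_PPM              (1e6 / 8192)    /* ~122 ppm */
#define GIVE_UP_US              55000.0         /* hard cap: 55 ms */

static bool calibration_done(double elapsed_us, double *ppm_out)
{
        double ppm = READ_UNCERTAINTY_US / elapsed_us * 1e6;

        *ppm_out = ppm;
        if (elapsed_us < MIN_ELAPSED_US)
                return false;                   /* too early to judge */
        if (ppm <= TARGET_PPM)
                return true;                    /* good enough: stop */
        if (elapsed_us >= GIVE_UP_US)
                return true;                    /* time's up: take what we have */
        return false;
}

int main(void)
{
        double ppm;

        /* Walk elapsed time forward and show where the loop would stop. */
        for (double t = 500.0; t <= 60000.0; t += 500.0) {
                if (calibration_done(t, &ppm)) {
                        printf("stop at %.1f ms, uncertainty %.0f ppm\n",
                               t / 1000.0, ppm);
                        break;
                }
        }
        return 0;
}

With the 0.84 us figure that model settles around 7 ms, well inside the
55 ms cap; the cap only bites if the reads turn out much noisier.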