Date:	Fri, 05 Sep 2008 15:18:15 -0700
From:	Alok Kataria <akataria@...are.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Thomas Gleixner <tglx@...utronix.de>,
	LKML <linux-kernel@...r.kernel.org>,
	Arjan van de Veen <arjan@...radead.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Dan Hecht <dhecht@...are.com>,
	Garrett Smith <garrett@...are.com>
Subject: Re: [RFC patch 0/4] TSC calibration improvements

On Thu, 2008-09-04 at 14:33 -0700, Ingo Molnar wrote:
> * Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> > On Thu, 4 Sep 2008, Ingo Molnar wrote:
> > >
> > > hm, unless i'm missing something i think here we still have a small
> > > window for an SMI or some virtualization delay to slip in and cause
> > > massive inaccuracy: if the delay happens _after_ the last
> > > pit_expect_msb() and _before_ the external get_cycles() call. Right?
> >
> > Yes. I had the extra pit_expect_msb() originally, but decided that
> > > basically a single-instruction race for something that ran without any
> > MSI for 15ms was a bit pointless.
> 
> the race is wider than that i think: all it takes an SMI at the last PIO
> access, so the window should be 1 usec, against a 15000 usecs period.
> That's 1 out of 15,000 boxes coming up with totally incorrect
> calibration.
> 
> we also might have a very theoretical race of an SMI taking exactly 65
> msecs so that the whole PIT wraps around and fools the fastpath - the
> chance for that would be around 1:300 - assuming we only have to hit the
> right MSB with a ~200 usecs precision). That assumes equal distribution
> of SMI costs which they certainly dont have - most of them are much less
> than 60 msecs. So i dont think it's an issue in practice - on real hw.
> 
> But it's still a possibility unless i'm missing something. We could
> protect against that case by reading the IRQ0-pending bit and making
> sure it's not pending after we have done the closing TSC readout.
> 
> > But adding another pit_expect_msb() is certainly not wrong.
> 

Hi, 
I ran the current tree with these patches on my VM setup, both 32-bit
and 64-bit, for around 200 reboots each.
The system entered the FAST calibration mode more often this time,
around 25% of the time.
And I had an interesting case wherein the calibrated frequency was
1875 MHz compared to the actual ~1866 MHz, leaving an error of 0.5%.
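
As a rough check of that figure, here is a minimal sketch that expresses
the difference between a calibrated and a reference frequency in parts
per million. The helper name freq_error_ppm is made up for illustration
and is not a kernel function; the reference value is only the
approximate ~1866 MHz quoted above.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): signed error of a measured
 * frequency relative to a reference frequency, in parts per million.
 * Both arguments are in kHz. The result truncates toward zero.
 */
static int64_t freq_error_ppm(uint64_t measured_khz, uint64_t actual_khz)
{
	int64_t diff = (int64_t)measured_khz - (int64_t)actual_khz;

	return diff * 1000000 / (int64_t)actual_khz;
}
```

Calling freq_error_ppm(1875000, 1866000) gives roughly 4800 ppm, i.e.
about 0.48%, which rounds to the 0.5% mentioned above.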

Now, looking at the code.
Even with this last pit_expect_msb() check, I think there can be a case
where an error spanning 114usec can slip into the TSC calculation.

This can happen if, in the pit_expect_msb() just before the second
read_tsc, we hit an SMI/virtualization event *after* completing the 50
iterations of the PIT read loop. That allows pit_expect_msb() to succeed
when the SMI returns.

If this SMI/virtualization event spans the next PIT MSB increment
interval, leaving just enough time (100us) for the last pit_expect_msb()
to succeed, then the TSC value we read can be off by one MSB tick
increment minus the time taken for the last pit_expect_msb() to succeed.

i.e. an error of (214us - 100us) = 114us in the 15msec period, i.e. an
error of 7600 ppm?
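
The arithmetic behind those numbers can be sketched as follows. This
assumes the standard PIT input clock of 1193182 Hz and that one MSB step
corresponds to 256 PIT counts; the function names are invented for this
example, and the 100us and 15msec values are the ones quoted in this
mail.

```c
#include <stdint.h>

/* Standard i8254 PIT input clock in Hz. */
#define PIT_TICK_RATE_HZ 1193182ULL

/* Duration of one PIT MSB step (256 counts), in whole microseconds. */
static uint64_t pit_msb_tick_us(void)
{
	return 256ULL * 1000000ULL / PIT_TICK_RATE_HZ;
}

/*
 * Hypothetical helper: relative error in ppm when a measurement window
 * of window_us microseconds is off by error_us microseconds.
 */
static uint64_t window_error_ppm(uint64_t error_us, uint64_t window_us)
{
	return error_us * 1000000ULL / window_us;
}
```

With these, pit_msb_tick_us() truncates to 214, and
window_error_ppm(214 - 100, 15000) comes out at 7600.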

And, in order for the TSC clocksource to keep correct time (on systems
where the TSC clocksource is usable), the TSC frequency estimate must be
within 500 ppm of its true frequency, otherwise NTP will not be able to
correct it.
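
To put the 500 ppm limit in terms of the calibration window, a small
sketch (the helper name is hypothetical, and the 15msec window is the
one discussed in this thread) that converts a ppm tolerance into the
largest timing error the window can absorb:

```c
#include <stdint.h>

/*
 * Hypothetical helper: largest measurement error, in nanoseconds, that
 * keeps a window of window_us microseconds within tolerance_ppm.
 *   error_ns = window_us * 1000 * tolerance_ppm / 1000000
 */
static uint64_t max_window_error_ns(uint64_t window_us, uint64_t tolerance_ppm)
{
	return window_us * tolerance_ppm / 1000ULL;
}
```

For a 15000us window and a 500 ppm tolerance this yields 7500ns, i.e.
about 7.5us of slack over the whole window.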

So, IMHO we should not use this algorithm. 

I don't know if increasing the count threshold will help either, since
that threshold value may fail on systems which perform better than our
assumption of "we take 2us to do the 2 PIT reads". At least in a
virtualized environment I can make no such guarantees.
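
For reference, the kind of loop the count threshold applies to can be
sketched in user space as below. This is only an illustration, not the
code from the patch: the port read is replaced by a caller-supplied
function (pit_read_fn, a made-up stand-in for inb(0x42)), the upper
bound of 50000 iterations is an assumption, and fake_pit is a toy model
that changes its MSB after a fixed number of reads.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a PIT channel 2 port read; not a kernel API. */
typedef unsigned char (*pit_read_fn)(void *ctx);

#define PIT_MIN_ITERATIONS 50     /* count threshold discussed in this thread */
#define PIT_MAX_ITERATIONS 50000  /* assumed upper bound for the loop */

/*
 * Spin while the PIT MSB still equals val. Report success only if the
 * MSB stayed stable for more than PIT_MIN_ITERATIONS iterations. A delay
 * that hits after the threshold has been passed is not noticed here.
 */
static bool pit_expect_msb_sketch(unsigned char val, pit_read_fn rd, void *ctx)
{
	int count;

	for (count = 0; count < PIT_MAX_ITERATIONS; count++) {
		(void)rd(ctx);       /* first read is the LSB, ignored */
		if (rd(ctx) != val)  /* second read is the MSB */
			break;
	}
	return count > PIT_MIN_ITERATIONS;
}

/* Toy PIT model: returns msb for reads_left reads, then msb - 1. */
struct fake_pit {
	unsigned char msb;
	int reads_left;
};

static unsigned char fake_read(void *ctx)
{
	struct fake_pit *p = ctx;

	if (p->reads_left > 0) {
		p->reads_left--;
		return p->msb;
	}
	return (unsigned char)(p->msb - 1);
}
```

With a fake_pit that allows 120 reads (60 iterations) the sketch reports
success, and with only 20 reads (10 iterations) it reports failure.
Under the 2us-per-iteration assumption, the 50-iteration threshold
corresponds to the 100us figure used earlier; on a system where the
reads are faster, the same count covers less time.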

Thanks,
Alok

> ok, i kept that bit.
> 
>         Ingo

