Message-ID: <alpine.LFD.1.10.0809021918450.3243@apollo.tec.linutronix.de>
Date:	Tue, 2 Sep 2008 20:14:22 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
cc:	Larry Finger <Larry.Finger@...inger.net>,
	LKML <linux-kernel@...r.kernel.org>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Alok Kataria <akataria@...are.com>,
	Michael Buesch <mb@...sch.de>
Subject: Re: Regression in 2.6.27 caused by commit bfc0f59

On Tue, 2 Sep 2008, Linus Torvalds wrote:
> > So what I'm working on is an algorithm, which is similar to the checks
> > in the tsc_read_refs() function. That should allow us to detect
> > whether one of the reads is way off by doing a min/max detection. In
> > such a case we can either repeat the calibration or try to figure out
> > whether the pmtimer / hpet can provide us with some useful reference.
> 
> I think the most trivial approach would be to
> 
>  - just keep track of the max TSC difference for each loop iteration.
> 
>  - if the max TSC is bigger than 1% of the total TSC, then something is 
>    already seriously wrong (either we had very few loops indeed, or some 
>    of them were very expensive)

I went for summing up the deltas and building an average at the
end. That's from a loop of 10 consecutive runs:

[    0.000000] TSC min 2160 max       3732 avg  3266 pitcnt 30614
[    0.000000] TSC min 2160 max    1036164 avg  3299 pitcnt 30310
[    0.000000] TSC min 2160 max    1032360 avg  3303 pitcnt 30277

[    0.000000] TSC min 2160 max  210453018 avg 69509 pitcnt 30260

This one hit very late in the loop, as pitcnt is close to the others.

[    0.000000] TSC min 2160 max       3708 avg  3265 pitcnt 30624
[    0.000000] TSC min 2160 max       3720 avg  3265 pitcnt 30622
[    0.000000] TSC min 2160 max    1062252 avg  3301 pitcnt 30287
[    0.000000] TSC min 2160 max       3756 avg  3267 pitcnt 30605
[    0.000000] TSC min 2160 max       3732 avg  3267 pitcnt 30605
[    0.000000] TSC min 2136 max     989292 avg  3297 pitcnt 30324
[    0.000000] TSC min 2136 max       3744 avg  3266 pitcnt 30612

[    0.000000] TSC min 2160 max   78042006 avg 78045 pitcnt  1001

This one hit early in the loop as pitcnt is pretty low.

The min value is pretty constant.

The max value for sane loops is in the range of 3708 - 3756, the
average is between 3265 and 3267.

For those which have a ~500us maximum the average is still in a sane
range. That seems to be a single glitch, which pushes the maximum, but
does not really influence the average result.

The outstanding one is the 100ms (210 453 018 ticks), where the average
is also off by a factor of 20.

I think that information is enough to give us a pretty precise idea
when to discard the result. I'm currently looking at the hpet/pmtimer
values for comparison and I should have a patch for testing ready
later tonight.
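
For illustration only, here is a minimal user-space sketch of the
bookkeeping described above: track min, max and the sum of the
per-iteration TSC deltas, then build the average at the end. The
struct, the function names and the "average larger than factor * min"
discard rule are assumptions made up for this sketch. They are not
taken from the kernel and not from the patch mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-run statistics over the TSC delta of each loop iteration. */
struct tsc_stats {
	uint64_t min;
	uint64_t max;
	uint64_t sum;
	uint64_t cnt;
};

static void tsc_stats_init(struct tsc_stats *s)
{
	s->min = UINT64_MAX;
	s->max = 0;
	s->sum = 0;
	s->cnt = 0;
}

/* Account one loop iteration; delta is the TSC difference across it. */
static void tsc_stats_add(struct tsc_stats *s, uint64_t delta)
{
	if (delta < s->min)
		s->min = delta;
	if (delta > s->max)
		s->max = delta;
	s->sum += delta;
	s->cnt++;
}

/* Average delta, built once at the end of the run. */
static uint64_t tsc_stats_avg(const struct tsc_stats *s)
{
	return s->cnt ? s->sum / s->cnt : 0;
}

/*
 * Assumed heuristic: the min is nearly constant across runs, so treat a
 * run as bad when the average exceeds factor * min. A single short
 * glitch raises max but barely moves the average, so it passes.
 * Returns 1 if the run should be discarded, 0 otherwise.
 */
static int tsc_stats_discard(const struct tsc_stats *s, uint64_t factor)
{
	assert(factor > 0);
	if (!s->cnt)
		return 1;	/* nothing measured */
	return tsc_stats_avg(s) > s->min * factor;
}
```

With factor 2 and the numbers above, the runs with an average around
3265 - 3303 would be kept (limit 2 * 2160 = 4320), while the two runs
with an average of 69509 and 78045 would be discarded. Whether 2 is a
sensible factor on other hardware is an open question.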

>  - perhaps loop over the calibration, and make the TSC calibration loop 
>    increase the delay. Because even if there is a 120ms hickup, if we had 
>    used a longer calibration delay, we'd probably not have noticed (well, 
>    ok, 120ms is pretty damning and is probably just unfixable, but smaller 
>    hickups are probably harmless)

Increasing the delay is probably not a good idea as we just make the
window larger for the SMI to happen.

> Additionally doing a min/max comparison to see that the loop is very 
> _stable_ is of course also a way to validate things, but expecting _too_ 
> much stability may be wrong too. As mentioned, SMM events can happen for 
> other reasons than emulation.

Yeah, I know. One of the oddballs is the USB->PS2 keyboard emulator
which is active during early boot. We do the USB handoff definitely
after the TSC calibration. I found a box with similar (not that bad)
hiccups, which go away when I disable that in the BIOS settings.

Can't say anything about the laptop in that regard, because the BIOS
does not offer me a switch for that :(

Thanks,

	tglx