linux-kernel - Re: Regression in 2.6.27 caused by commit bfc0f59

Message-ID: <alpine.LFD.1.10.0809020835150.3243@apollo.tec.linutronix.de>
Date:	Tue, 2 Sep 2008 14:15:43 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
cc:	Larry Finger <Larry.Finger@...inger.net>,
	LKML <linux-kernel@...r.kernel.org>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Alok Kataria <akataria@...are.com>,
	Michael Buesch <mb@...sch.de>
Subject: Re: Regression in 2.6.27 caused by commit bfc0f59

On Mon, 1 Sep 2008, Linus Torvalds wrote:
> > 
> > Well, the biggest problem is actually _detection_.
> > 
> > We have three different timers, and they all have their own problems. How 
> > do you reliably detect which one to use? The PM_TIMER clearly is _not_ 
> > always the answer here, but the code just assumes it is!
> 
> On the machine you have trouble with the PIT on, does this thing trigger?
> 
> If it does, that could be a simple way to say whether you prefer PM_TIMER 
> over PIT.
> 
> For me, even on a modern machine, I get a pit_count of 46321, which 
> matches the "about one microsecond for an ISA/LPC read" timing pretty 
> well. What do you get? 

Anything between 0 and a useful value, but I still had cases where
pit_count was well above 25000 and the TSC was off by a factor of 2.

Went down the road and instrumented the code:

   unsigned long tsc_deltas[50000];

   start_pit();
   tsc1 = tsc = read_tsc_start();
   while (!pit_ready()) {
         tsc2 = read_tsc();
         /* record the TSC delta between two consecutive reads */
         tsc_deltas[pit_count++] = tsc2 - tsc;
         tsc = tsc2;
   }

Analysing the tsc_deltas gave interesting insight. On the affected
laptop I had several entries where the delta between two reads was
from 1msec up to 120msec maximum. 

As the code does nothing else and has interrupts disabled there is
only one explanation: the SMM/SMI black hole.

If this high delta hits after the pit_count reached 25000 we still
believe that our calibration against pit is fine :(
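
To illustrate why the loop count alone is not enough, here is a minimal
user-space sketch, not the kernel code. The helper names
count_check_passes() and largest_delta() and the PIT_MIN_LOOPS macro are
made up for this example; only the 25000 threshold comes from the
discussion above. It shows that a count-only check says nothing about
whether a single read-to-read gap was huge:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Loop-count threshold mentioned in the thread. */
#define PIT_MIN_LOOPS 25000

/* Count-only check: passes whenever enough loop iterations were seen. */
static int count_check_passes(size_t pit_count)
{
	return pit_count > PIT_MIN_LOOPS;
}

/* Return the largest single delta in the recorded array (0 if empty). */
static uint64_t largest_delta(const uint64_t *deltas, size_t n)
{
	uint64_t max = 0;

	assert(deltas != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (deltas[i] > max)
			max = deltas[i];
	}
	return max;
}
```

A run with 30000 iterations passes count_check_passes() even if one of
the recorded deltas is several orders of magnitude larger than the rest,
which is the situation described above.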

So what I'm working on is an algorithm similar to the checks in the
tsc_read_refs() function. That should allow us to detect
whether one of the reads is way off by doing a min/max detection. In
such a case we can either repeat the calibration or try to figure out
whether the pmtimer / hpet can provide us with some useful reference.
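
A rough sketch of what such a min/max check could look like, written as
plain user-space C. This is only an assumption about the shape of the
idea, not the algorithm that ended up in the kernel: struct delta_stats,
the delta_stats_*() helpers and the MAX_SPREAD ratio of 100 are all
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed tolerance: reject the run if max delta > MAX_SPREAD * min delta. */
#define MAX_SPREAD 100

struct delta_stats {
	uint64_t min;
	uint64_t max;
};

static void delta_stats_init(struct delta_stats *s)
{
	assert(s != NULL);
	s->min = UINT64_MAX;
	s->max = 0;
}

/* Feed one read-to-read TSC delta into the running min/max. */
static void delta_stats_update(struct delta_stats *s, uint64_t delta)
{
	assert(s != NULL);
	if (delta < s->min)
		s->min = delta;
	if (delta > s->max)
		s->max = delta;
}

/*
 * Return 1 if no delta was wildly larger than the smallest one,
 * 0 if the run had no usable samples or looks disturbed (e.g. by an SMI).
 */
static int delta_stats_trustworthy(const struct delta_stats *s)
{
	assert(s != NULL);
	if (s->max == 0 || s->min == UINT64_MAX)
		return 0;
	/* Divide instead of multiplying min to avoid overflow. */
	return s->max / MAX_SPREAD <= s->min;
}
```

The caller would update the stats inside the polling loop and, if
delta_stats_trustworthy() returns 0, either repeat the calibration or
fall back to another reference.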

Thanks,

	tglx
