Date:	Mon, 1 Sep 2008 22:07:33 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
cc:	Larry Finger <Larry.Finger@...inger.net>,
	LKML <linux-kernel@...r.kernel.org>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Alok Kataria <akataria@...are.com>,
	Michael Buesch <mb@...sch.de>
Subject: Re: Regression in 2.6.27 caused by commit bfc0f59

On Mon, 1 Sep 2008, Linus Torvalds wrote:
> On Mon, 1 Sep 2008, Thomas Gleixner wrote:
> > 
> > Hmm. Haven't seen that before, but it confirms what I guessed from
> > your previous dmesg information. I wonder why you did not observe
> > strange behaviour with older kernel versions.
> 
> x86-32 never used the PM_TIMER for frequency estimation, it only ever used 
> the PIT. See the old "native_calculate_cpu_khz()" in tsc_32.c that you 
> deleted in favor of the (imho inferior) x86-64 version.
> 
> How about:
> 
>  - taking the old 32-bit code, and using it to initially _just_ estimate 
>    the TSC speed. That code was stable and pretty much guaranteed to work 
>    reasonably well on all machines. It retries the timings three times, 
>    and picks the best one.
> 
>  - Then, _after_ you already have a pretty good estimation for TSC, you 
>    can use _that_ to then get the HPET and/or PM_TIMER version (and not 
>    use the PIT at all for those calibrations)
> 
>  - and if the PM_TIMER one is too far off, just throw it away. We know the 
>    PIT is a lot more trustworthy than the PM_TIMER.

Far off in which direction ?

If the PIT interrupts are delayed by SMM code, then I see things like
this (that's on a max. three years old 32bit Core Duo):

[    0.000000] Detected 8340.258 MHz processor.
[   13.782091] APIC calibration not consistent with PM Timer: 228ms instead of 100ms

This one is way off, while the next one is in a reasonable range

[    0.000000] Detected 3240.001 MHz processor.
[   13.792122] APIC calibration not consistent with PM Timer: 178ms instead of 100ms

while in reality the machine runs at 2GHz and current mainline says:

[    0.000000] Detected 2000.065 MHz processor.

The CPU calibration of < 2.6.27 is against PIT and does _NOT_ give me a
pretty good estimation for TSC.
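
To illustrate the arithmetic, here is a simplified model (not the
kernel's calibration code; the helper names, the 50ms window and the
stall lengths are assumptions picked for illustration). If the CPU
notices the end of the PIT window late, the TSC keeps counting and the
extra cycles get attributed to the nominal window:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: number of TSC cycles counted when
 * the CPU notices the end of a window_ms reference window stall_ms late.
 */
static uint64_t tsc_cycles_seen(uint64_t true_khz, uint64_t window_ms,
                                uint64_t stall_ms)
{
    return true_khz * (window_ms + stall_ms);   /* kHz == cycles per ms */
}

/* Estimate the CPU frequency assuming the window was exactly window_ms. */
static uint64_t tsc_khz_estimate(uint64_t tsc_cycles, uint64_t window_ms)
{
    assert(window_ms != 0);   /* guard the division */
    return tsc_cycles / window_ms;
}
```

In this model a 2000000 kHz CPU with a 50ms window comes out as
3240000 kHz after a 31ms stall and as 8320000 kHz after a 158ms stall,
which is the same order as the bogus values above.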

I was pretty happy when Alok beat me to unifying the TSC calibration
code, as it solved one of my long standing todo items, which also filled
my buglist on a regular basis.

I debugged this thoroughly using the tracer from preempt-rt to check
what the box does during that time, and it definitely vanishes for
>100ms in a row into the black hole of the stupid BIOS.


So either way: relying on PIT on newer machines is _BAD_, relying on
PM_TIMER on older machines is _BAD_ as well.

There is no known-good estimate when the TSC/PIT calibration is off
by a factor of 1.5 to 4. The consequence would be that I throw away a
perfectly fine pmtimer and run a machine which advertises itself as the
fastest box on the planet. With your method I would disable nohz and I
would be back to 50% battery time.
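
A minimal sketch of why such a "too far off" check backfires here (not
kernel code; the function name and the 10% tolerance are assumptions,
and the PM timer based result is assumed to equal the value current
mainline reports):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check, not kernel code: accept the PM timer based result
 * only if it lies within tolerance_pct percent of the PIT based one.
 */
static bool pmtimer_result_accepted(uint64_t pit_khz, uint64_t pm_khz,
                                    uint64_t tolerance_pct)
{
    uint64_t diff = pit_khz > pm_khz ? pit_khz - pm_khz : pm_khz - pit_khz;

    assert(tolerance_pct <= 100);   /* sanity check on the parameter */
    return diff * 100 <= pit_khz * tolerance_pct;
}
```

With pit_khz = 8340258 or 3240001 and pm_khz = 2000065, the check
rejects the PM timer result although it is the correct one, because the
PIT based reference itself is the broken value.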

I'm happy to discard the PIT on the 32bit machines again and then file
a bugreport for a regression between 2.6.27-rc1 and tomorrow's git :)

This is the first complaint I've seen about a non-working pmtimer in
quite a while. That's why I obviously forgot about the rate check
issue.

I just looked at the drivers/clocksource/acpi_pm.c history and saw that
John explicitly mentions AMD K6 in commit
562f9c574e0707f9159a729ea41faf53b221cd30

    This patch re-adds the verify_pmtmr_rate functionality from 2.6.17 that
    I dropped 2.6.18.
    
    This resolves problems seen on older K6 ASUS boards where the ACPI PM
    timer runs too fast.

Larry's box has: "an AMD-K6 at stepping 0c and running at 450 MHz."

The oracle of google only gave me hits for AMD-K6 in a quick survey
along with the slow access mode problem for older ICH4 chipsets.

So I think it's a reasonable thing to disable the PMTIMER based
calibration on AMD-K6 and older. I'm not sure about the exact cut line
we choose - it might be wrong as always, but it's definitely better
than adding a lot of magic into the calibration code.
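
Such a cut could look roughly like the sketch below. This is a
standalone illustration, not a patch: the struct, the enum and the
function name are hypothetical, and the "family <= 5" boundary is an
assumption based on K5/K6 parts reporting CPU family 5. The exact cut
line is still open, as said above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified CPU identification (not the kernel's cpuinfo). */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

struct cpu_id {
    enum cpu_vendor vendor;
    unsigned int family;    /* CPUID family; K5/K6 report 5 */
};

/* Return false where the PM timer should not be used for calibration. */
static bool pmtimer_calibration_allowed(const struct cpu_id *cpu)
{
    assert(cpu);            /* caller must pass a valid descriptor */

    /* Assumed cut line: AMD K6 and older. */
    if (cpu->vendor == VENDOR_AMD && cpu->family <= 5)
        return false;

    return true;
}
```

Keeping the decision in one predicate like this avoids spreading
special cases through the calibration path itself.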

Thanks,

	tglx