lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1275004816.1813.15.camel@work-vm>
Date:	Thu, 27 May 2010 17:00:16 -0700
From:	john stultz <johnstul@...ibm.com>
To:	Bernhard Schmidt <berni@...kenwald.de>
Cc:	linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: Clock jumps

On Thu, 2010-05-27 at 23:48 +0200, Bernhard Schmidt wrote:
> On 27.05.2010 21:08, john stultz wrote:
> > I'd be very interested in hearing more about the host side issue. So
> > this happened with the same kernel that you were using before, with no
> > trouble?
> 
> Correct.
> 
> > Could you also send dmesg output from this boot? And if you can find
> > any older dmesg logs to compare with, send those too?
> 
> See http://users.birkenwald.de/~berni/temp/dmesg-lenny and 
> http://users.birkenwald.de/~berni/temp/dmesg-squeeze . Although running 
> on the same kernel binary the initrd changed greatly when upgrading, so 
> ordering/timing between those two is off.
> 
> Note that the dmesg output is captured right after boot. I think I 
> remember seeing a "TSC unstable" message pretty soon after boot, but I 
> might be mixing it up with my other AMD-based KVM server. I don't hold 
> normal (non-boot) logs that long, so I can't tell for sure.

Looking at the diff:
--- dmesg-lenny 2010-05-27 16:45:33.000000000 -0700
+++ dmesg-squeeze       2010-05-27 16:46:14.000000000 -0700
@@ -132,8 +132,8 @@
 console [ttyS1] enabled
 hpet clockevent registered
 Fast TSC calibration using PIT
-Detected 2660.398 MHz processor.
-Calibrating delay loop (skipped), value calculated using timer frequency.. 5320.79 BogoMIPS (lpj=10641592)
+Detected 2613.324 MHz processor.
+Calibrating delay loop (skipped), value calculated using timer frequency.. 5226.64 BogoMIPS (lpj=10453296)
 Security Framework initialized
 Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
 Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
@@ -160,7 +160,7 @@
 CPU0: Intel(R) Xeon(R) CPU            3075  @ 2.66GHz stepping 0b
 Booting Node   0, Processors  #1
 Brought up 2 CPUs
-Total of 2 processors activated (10640.79 BogoMIPS).
+Total of 2 processors activated (10546.63 BogoMIPS).
 NET: Registered protocol family 16
 ACPI: bus type pci registered
 PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xe0000000-0xefffffff] (base 0xe0000000)

So you can see in the above the during the second boot the TSC
calibration was badly mis-calculated. This was the cause of the skew.

Not sure how that might be linked to the distro upgrade. It could have
been something like SMI damage during the calibration time, but I
thought the calibration loop watched for that.

Bernhard: I expect with all those vms, this machine isn't rebooted
frequently. So could you look at the logs to see how much the  "Detected
xxxx.yyy MHz processor." line varies by across a few other boots (if
they still exist?).

Ingo/Thomas: Any thoughts, should we be considering dropping the
quick_pit_calibrate() code and always do the slower route?

thanks
-john

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ