Date:	Mon, 24 Jul 2006 08:58:58 -0700
From:	john stultz <johnstul@...ibm.com>
To:	Andrew Morton <akpm@...l.org>
Cc:	Matthias Urlichs <smurf@...rf.noris.de>,
	linux-kernel@...r.kernel.org, torvalds@...l.org, bunk@...sta.de,
	lethal@...ux-sh.org, hirofumi@...l.parknet.co.jp,
	Andi Kleen <ak@....de>
Subject: Re: REGRESSION: the new i386 timer code fails to sync CPUs

On Sun, 2006-07-23 at 05:37 -0700, Andrew Morton wrote:
> On Sun, 23 Jul 2006 14:08:29 +0200
> Matthias Urlichs <smurf@...rf.noris.de> wrote:
> 
> > Hi,
> > 
> > Andrew Morton:
> > > - CPU0 and CPU1 share a TSC and CPU2 and CPU3 share another TSC.
> > > 
> > That makes sense, since they're one dual-core Xeon each.
> 
> OK.
> 
> > > - Earlier kernels didn't use the TSC as a time source whereas this one
> > >   does, hence the problems which you're observing.
> > > 
> > Correct; see below.
> > 
> > > I assume that booting with clock=pit or clock=pmtmr fixes it?
> > > 
> > Testing... yes, both.
> > 
> > > It would be useful to check your 2.6.17 boot logs, see if we can work out
> > > what 2.6.17 was using for a clock source.
> > > 
> > That's easy:
> > 
> > 2.6.17    -Using pmtmr for high-res timesource
> > 2.6.18git +Time: tsc clocksource has been installed.
> > 
> > I missed those two lines, as in the boot logs they're not really
> > adjacent, so they got lost in the jumble of other differences.
> 
> OK, thanks.  Marking the TSC as bad in this case is simple to do - let us
> let John work out the best way.
> 
> We must have lost a TSC sanity check somewhere along the way.  I wonder
> what it was?

Well, I changed the TSC vs. ACPI PM timer priority ordering to be more
like x86-64 (Andi had a similar patch he was proposing as well). For
a while, suse/redhat kernels have been swapping them, as the TSC gives
such a performance boost; however, the ACPI PM timer is usually the safer
option (distro customers are often told to use clock=pmtmr on some
boxes).
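The priority ordering and the clock= override can be modelled roughly as follows. This is a simplified sketch, not the kernel's code. The `ClockSource` class, the `select_clocksource` function, and the numeric ratings are all hypothetical. The ratings only encode "TSC above ACPI PM timer above PIT", matching the x86-64-like ordering described above. Source names use the boot-option spelling (`tsc`, `pmtmr`, `pit`).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ClockSource:
    name: str    # boot-option spelling, e.g. "tsc", "pmtmr", "pit"
    rating: int  # higher = preferred; values here are illustrative only


def select_clocksource(sources: Sequence[ClockSource],
                       override: Optional[str] = None) -> ClockSource:
    """Pick the clocksource to use (hypothetical, simplified model).

    If a boot-time override (as with clock=pmtmr) names an available
    source, honour it; otherwise take the highest-rated source.
    """
    if override is not None:
        for cs in sources:
            if cs.name == override:
                return cs
    return max(sources, key=lambda cs: cs.rating)


# Hypothetical ratings: the new ordering prefers the TSC over the PM timer.
SOURCES = [
    ClockSource("pit", 110),
    ClockSource("pmtmr", 200),
    ClockSource("tsc", 300),
]
```

With this ordering, `select_clocksource(SOURCES)` picks `tsc`. Passing `override="pmtmr"` reproduces the clock=pmtmr workaround. An override naming a source that is not present is ignored, and selection falls back to the rating order.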

I'll see what we can do to narrow it down, but it's been assumed by both
x86-64 and the new i386 code that the TSCs on Intel SMP boxes are
synced, unless we're explicitly told they aren't (Summit, etc).
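One alternative to assuming the TSCs are synced is a "warp" check. In this approach, CPUs take turns reading their own TSC under a shared lock. Each reading is compared with the last value any CPU recorded, and a reading lower than that value means time appeared to go backwards.

The sketch below models only the comparison step, over samples assumed to be already serialized in the order they were taken. The function name and sample data are hypothetical. A real check would also need the cross-CPU serialization, which is not shown here.

```python
from typing import Iterable, Tuple


def count_tsc_warps(samples: Iterable[Tuple[int, int]]) -> int:
    """Count backwards steps in a serialized sequence of (cpu, tsc) reads.

    `samples` is assumed to be in the global order the reads were taken
    (e.g. each read done while holding a shared lock). Any read lower than
    the previous one, from whichever CPU, counts as one warp.
    """
    warps = 0
    last = None
    for _cpu, tsc in samples:
        if last is not None and tsc < last:
            warps += 1
        last = tsc
    return warps
```

For example, if CPU0/1 share one counter and CPU2/3 share another that lags behind, interleaved reads from CPU0 and CPU2 step backwards every time control passes from CPU0 to CPU2. A nonzero count would be the cue to mark the TSC unstable.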

With the current code it is trivial to mark the TSC as unstable and the
system will automatically fall back to the next best clocksource. The
difficulty is just making sure we've got all the cases covered without
needlessly disqualifying synced systems.

Andi: If this is a generic issue, and not specific to Matthias' box, we
may need to re-think the assumption that Intel SMP is synced. Your
thoughts?

> > Interestingly, CPU0/1 gets 6000 bogomips while CPU2/3 only reaches 5600 ..?
> > (That happens with both kernels.) I do wonder why, and whether this has any
> > bearing on the current problem.
> 
> I wouldn't expect it to matter, unless the TSCs are running at different
> speeds or something.

Matthias: "clock=pmtmr" is probably the best workaround in the short
term. Could you send me your dmesg and dmidecode output? We'll try to
find something to key off of so it will mark the TSC as unstable by
default on your system.
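Keying off dmidecode output could look like a quirk table matched against the system's DMI strings. This is a hypothetical sketch. The field names `sys_vendor` and `product_name` are assumed to correspond to dmidecode's "System Information" Manufacturer and Product Name. The table entry is a placeholder, not a real affected machine.

```python
from typing import Dict, List, Tuple

# Hypothetical quirk table: (vendor, product-name prefix) pairs whose TSCs
# should be treated as unsynchronized. The entry below is a placeholder.
TSC_UNSTABLE_QUIRKS: List[Tuple[str, str]] = [
    ("ExampleVendor", "ExampleBoard X1"),
]


def tsc_unstable_by_dmi(dmi: Dict[str, str],
                        quirks: List[Tuple[str, str]] = TSC_UNSTABLE_QUIRKS) -> bool:
    """Return True if the DMI strings match a known-bad system."""
    vendor = dmi.get("sys_vendor", "")
    product = dmi.get("product_name", "")
    return any(vendor == v and product.startswith(p) for v, p in quirks)
```

A match would trigger marking the TSC unstable at boot, so the system falls back to the PM timer without needing clock=pmtmr on the command line.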

thanks
-john
