Message-ID: <20101027182608.GA1580@arch.trippelsdorf.de>
Date:	Wed, 27 Oct 2010 20:26:08 +0200
From:	markus@...ppelsdorf.de
To:	john stultz <johnstul@...ibm.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Borislav Petkov <bp@...64.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"hpa@...ux.intel.com" <hpa@...ux.intel.com>,
	Ingo Molnar <mingo@...e.hu>,
	Andreas Herrmann <andreas.herrmann3@....com>,
	heiko.carstens@...ibm.com, a.p.zijlstra@...llo.nl, avi@...hat.com,
	mtosatti@...hat.com
Subject: Re: [bisected] Clocksource tsc unstable git

On Wed, Oct 27, 2010 at 04:26:22PM +0200, Markus Trippelsdorf wrote:
> On Tue, Oct 26, 2010 at 12:18:56PM -0700, john stultz wrote:
> > On Tue, 2010-10-26 at 17:48 +0200, Thomas Gleixner wrote:
> > > On Tue, 26 Oct 2010, Markus Trippelsdorf wrote:
> > > > On Tue, Oct 26, 2010 at 03:18:43PM +0200, Borislav Petkov wrote:
> > > > > otherwise, I don't see anything strange in your dmesg. Unless tglx has a
> > > > > better idea, I'd ask you to bisect it. 5618 changesets shouldn't be that
> > > > > much but I don't know, if the issue appears every several hours it could
> > > > > still be tedious.
> > > > 
> > > > That would be a several week long process, as the issue appears ~ every
> > > > 2 days here on this machine running 24/7.
> > > 
> > > There is only a single commit in that area post 2.6.36:
> > > 
> > >       8af3c153baf95374eff20a37f00c59a295b52756 
> > > 
> > > But I have a hard time how this should make this happen. John ?
> > 
> > Yea, that one doesn't look connected to me.
> > 
> > There have been a few cases that I've seen where we can get false
> > positives for bad TSCs due to the watchdog clocksource having problems
> > (or the clocksource watchdog thread getting delayed for such a long time
> > the watchdog clocksource wraps and we then can't validly compare the two
> > - although this would be hard to trigger with non-rt kernels).
> 
> I think the remark above points in the right direction, because the
> mysterious slowness happened again today (and this time on a vanilla git
> kernel). But this time no "Clocksource tsc unstable" message appeared in
> the kernel log. My system recovered by itself and after 2 minutes of
> slowness it was usable again. (The symptoms were the same as described in
> my first mail: second-long delays when typing, and switching from X to
> the console took many seconds.)
> 
> Is there anything I can do to pinpoint the cause of this slowdown should
> it happen again? (Maybe a perf timechart?)

While searching for a reliable test case, I found that kvm guests here also
show random hangs (which did not happen in v2.6.36). Each hang lasts a few
seconds, during which the mouse cannot be moved at all.

So I ran git-bisect with this test case; the result of the bisection is:

34f971f6f7988be4d014eec3e3526bee6d007ffa is the first bad commit
commit 34f971f6f7988be4d014eec3e3526bee6d007ffa
Author: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Date:   Wed Sep 22 13:53:15 2010 +0200

    sched: Create special class for stop/migrate work

    In order to separate the stop/migrate work thread from the SCHED_FIFO
    implementation, create a special class for it that is of higher priority than
    SCHED_FIFO itself.

    This currently solves a problem where cpu-hotplug consumes so much cpu-time
    that the SCHED_FIFO class gets throttled, but has the bandwidth replenishment
    timer pending on the now dead cpu.

    It is also required for when we add the planned deadline scheduling class above
    SCHED_FIFO, as the stop/migrate thread still needs to transcent those tasks.

    Tested-by: Heiko Carstens <heiko.carstens@...ibm.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
    LKML-Reference: <1285165776.2275.1022.camel@...top>
    Signed-off-by: Ingo Molnar <mingo@...e.hu>

Reverting the commit fixes the kvm hang issue.
(Whether this issue is related to my original tsc problem is of course open
to debate, but I have a strong hunch that it is.)
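
As a rough sanity check on the bisection cost discussed earlier in the thread
(5618 changesets, with the tsc issue showing up about every 2 days), here is a
minimal sketch of the arithmetic. It assumes a plain binary search over a
linear range, which only approximates what git-bisect does on real history;
the function names are hypothetical and exist only for this illustration.

```python
def bisect_steps(n_changesets: int) -> int:
    """Worst-case number of test runs for a binary search over n candidates.

    Equivalent to ceil(log2(n)) for n >= 1, computed with integers only.
    """
    if n_changesets < 1:
        raise ValueError("need at least one changeset")
    return (n_changesets - 1).bit_length()


def estimate_days(n_changesets: int, days_per_test: float) -> float:
    """Total wall-clock time if every bisection step takes days_per_test."""
    return bisect_steps(n_changesets) * days_per_test


if __name__ == "__main__":
    steps = bisect_steps(5618)        # changeset count quoted in the thread
    days = estimate_days(5618, 2.0)   # issue appears roughly every 2 days
    print(f"{steps} steps, about {days:.0f} days")
```

Under these assumptions, bisecting on the original tsc symptom would take
about 13 steps, or close to four weeks. That is why a test case that triggers
within seconds (the kvm guest hang) made the bisection practical.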
-- 
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
