Message-ID: <20101027195816.GA11248@amt.cnet>
Date: Wed, 27 Oct 2010 17:58:16 -0200
From: Marcelo Tosatti <mtosatti@...hat.com>
To: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc: markus@...ppelsdorf.de, john stultz <johnstul@...ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...64.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"hpa@...ux.intel.com" <hpa@...ux.intel.com>,
Ingo Molnar <mingo@...e.hu>,
Andreas Herrmann <andreas.herrmann3@....com>,
heiko.carstens@...ibm.com, avi@...hat.com
Subject: Re: [bisected] Clocksource tsc unstable git
On Wed, Oct 27, 2010 at 09:36:55PM +0200, Peter Zijlstra wrote:
> On Wed, 2010-10-27 at 20:26 +0200, markus@...ppelsdorf.de wrote:
> >
> > 34f971f6f7988be4d014eec3e3526bee6d007ffa is the first bad commit
> > commit 34f971f6f7988be4d014eec3e3526bee6d007ffa
> > Author: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> > Date: Wed Sep 22 13:53:15 2010 +0200
> >
> > sched: Create special class for stop/migrate work
> >
> > In order to separate the stop/migrate work thread from the SCHED_FIFO
> > implementation, create a special class for it that is of higher priority than
> > SCHED_FIFO itself.
> >
> > This currently solves a problem where cpu-hotplug consumes so much cpu-time
> > that the SCHED_FIFO class gets throttled, but has the bandwidth replenishment
> > timer pending on the now dead cpu.
> >
> > It is also required for when we add the planned deadline scheduling class above
> > SCHED_FIFO, as the stop/migrate thread still needs to transcent those tasks.
> >
> > Tested-by: Heiko Carstens <heiko.carstens@...ibm.com>
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> > LKML-Reference: <1285165776.2275.1022.camel@...top>
> > Signed-off-by: Ingo Molnar <mingo@...e.hu>
> >
> > Reverting the commit solves the kvm hang issue.
> > (If this issue is related to my original tsc problem is of course open for
> > debate, but I have a strong hunch it is.)
>
> Too weird,.. what does the hang look like?
>
> Can you generate a sysrq-t dump? The thing I'm looking for is the
> migration/# thread being runnable but not being current.
>
> How can I reproduce this?
I can reproduce it reliably here (it requires a kvm-autotest setup; it's
difficult to reproduce manually).
INFO: task qemu:1872 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu D ffff88007f44f1c0 0 1872 1 0x00000084
ffff88006522dbb8 0000000000000086 ffff8800ffffffff ffff88006522dfd8
0000000000013580 ffff880078dd0000 ffff880078dd0380 ffff880078dd0378
ffff88006522c000 ffff88006522dfd8 0000000000013580 0000000000013580
Call Trace:
[<ffffffff8143c919>] ? _raw_spin_unlock_irq+0x12/0x3c
[<ffffffff8143b111>] schedule_timeout+0x27/0xc0
[<ffffffff8143fc50>] ? sub_preempt_count+0xe/0xab
[<ffffffff810414e0>] ? get_parent_ip+0x11/0x41
[<ffffffff810414e0>] ? get_parent_ip+0x11/0x41
[<ffffffff8143fcd9>] ? sub_preempt_count+0x97/0xab
[<ffffffff8143a6e8>] wait_for_common+0xa3/0x110
[<ffffffff81046f02>] ? default_wake_function+0x0/0x14
[<ffffffff81039e92>] ? synchronize_sched_expedited_cpu_stop+0x0/0x10
[<ffffffff8143a80d>] wait_for_completion+0x1d/0x1f
[<ffffffff81091d90>] __stop_cpus+0xf5/0x112
[<ffffffff81091ded>] try_stop_cpus+0x40/0x59
[<ffffffff81039e92>] ? synchronize_sched_expedited_cpu_stop+0x0/0x10
[<ffffffff81040f50>] synchronize_sched_expedited+0x7e/0x98
[<ffffffff81040ed2>] ? synchronize_sched_expedited+0x0/0x98
[<ffffffff8106b707>] __synchronize_srcu+0x31/0x70
[<ffffffff8106b75b>] synchronize_srcu_expedited+0x15/0x17
[<ffffffffa0206b5c>] kvm_vm_ioctl_get_dirty_log+0x104/0x19b [kvm]
[<ffffffff8120c3e4>] ? rb_insert_color+0x68/0xe5
[<ffffffffa01f6e0b>] kvm_vm_ioctl+0x239/0x353 [kvm]
[<ffffffff8143c8fc>] ? _raw_spin_unlock_irqrestore+0x37/0x42
[<ffffffff8106affc>] ? __hrtimer_start_range_ns+0x329/0x33b
[<ffffffff8143c6ae>] ? _raw_spin_lock_irqsave+0x2a/0x46
[<ffffffff81117563>] do_vfs_ioctl+0x46a/0x4b9
[<ffffffff8143c8fc>] ? _raw_spin_unlock_irqrestore+0x37/0x42
[<ffffffff81117608>] sys_ioctl+0x56/0x79
[<ffffffff8143d48a>] ? do_device_not_available+0xe/0x10
[<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
cpu#0, 2660.055 MHz
runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
R    kworker/0:0     4   3424836.020475 183681301   120   3424836.020475   3423901.420588   2685682.072677
     migration/0     6       124.991395    297339     0       124.991395         0.001706         0.000000
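
The excerpt above appears to show the condition Peter asked about: the leading "R" marks the task that is current on the CPU, and migration/0 is listed as runnable without that marker. As a rough sketch (not part of the original report), a log in this format could be scanned for such rows as below. The function name stalled_stoppers and the SAMPLE text are hypothetical, and the row pattern assumes the eight-column "runnable tasks:" layout shown here; other kernel versions may print different columns.

```python
import re

# One row of the "runnable tasks:" table. An optional leading "R" marks the
# task that is current on that CPU. Assumed layout (from the excerpt above):
# [R] task PID tree-key switches prio exec-runtime sum-exec sum-sleep
ROW = re.compile(r"^\s*(?:(R)\s+)?(\S+)\s+(\d+)\s+[\d.]+\s+\d+\s+\d+\s")


def stalled_stoppers(text):
    """Return (name, pid) for migration/N rows that are runnable but not current."""
    hits = []
    in_table = False
    for line in text.splitlines():
        if line.strip() == "runnable tasks:":
            in_table = True  # rows only appear after this heading
            continue
        if not in_table:
            continue
        m = ROW.match(line)
        if not m:
            continue  # header, separator, or unrelated log line
        is_current, name, pid = m.group(1), m.group(2), int(m.group(3))
        if name.startswith("migration/") and not is_current:
            hits.append((name, pid))
    return hits


# Hypothetical input: the two rows quoted in this message.
SAMPLE = """\
cpu#0, 2660.055 MHz
runnable tasks:
task PID tree-key switches prio exec-runtime sum-exec sum-sleep
----------------------------------------------------------------
R kworker/0:0 4 3424836.020475 183681301 120 3424836.020475 3423901.420588 2685682.072677
migration/0 6 124.991395 297339 0 124.991395 0.001706 0.000000
"""

if __name__ == "__main__":
    for name, pid in stalled_stoppers(SAMPLE):
        print(f"{name} (pid {pid}) is runnable but not current")
```

Run against the rows quoted here, this flags migration/0 (pid 6) and skips kworker/0:0, which carries the "R" marker.
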
Full log attached.
View attachment "nehalem-dmesg.txt" of type "text/plain" (123800 bytes)