Message-ID: <20081219174401.GA2730@squirrel.roonstrasse.net>
Date: Fri, 19 Dec 2008 18:44:01 +0100
From: Max Kellermann <max@...mpel.org>
To: kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: KVM guest lockup
Hi,
I am running a KVM-enabled AMD64 host kernel, versions 2.6.26 and 2.6.27
(Debian kernels). The guest has been tested with kernels from 2.6.26 up
to 2.6.28-rc8.
Whenever I put high load on the guest (git-svn is quite good at
triggering the problem), it becomes unresponsive after a few minutes.
After a while, a "soft lockup" message appears on the serial console,
reporting a ridiculously large time span, and the machine remains
unresponsive. Unfortunately, I haven't captured the exact message yet,
because it didn't show up today.
I tried to debug it with KGDB, but the guest no longer responds to sysrq
(via the serial tty), so KGDB cannot be used.
By starting kvm with "-s", I was able to attach gdb to KVM. The
following backtrace was taken with the default kernel boot options:
#0 0xffffffff8023f3e0 in update_wall_time () at kernel/time/timekeeping.c:515
#1 0xffffffff80232411 in do_timer (ticks=18446744071567201664) at kernel/timer.c:1125
#2 0xffffffff802427a5 in tick_do_update_jiffies64 (now=<value optimized out>) at kernel/time/tick-sched.c:78
#3 0xffffffff80242916 in tick_sched_timer (timer=0xffffffff80509c00) at kernel/time/tick-sched.c:642
#4 0xffffffff8023d1db in __run_hrtimer (timer=0xffffffff80509c00) at kernel/hrtimer.c:1289
#5 0xffffffff8023da09 in hrtimer_interrupt (dev=<value optimized out>) at kernel/hrtimer.c:1373
#6 0xffffffff80219e4d in smp_apic_timer_interrupt (regs=<value optimized out>) at arch/x86/kernel/apic.c:792
#7 0xffffffff8020bcf8 in apic_timer_interrupt () at arch/x86/kernel/entry_64.S:858
#8 0xffffffff80477e60 in init_thread_union ()
#9 0x0000000000000000 in ?? ()
The next one was taken with the guest booted with "noapic nolapic acpi=off":
#0 0xffffffff8023f35a in update_wall_time () at kernel/time/timekeeping.c:501
#1 0xffffffff80232411 in do_timer (ticks=18446744071567201664) at kernel/timer.c:1125
#2 0xffffffff802427a5 in tick_do_update_jiffies64 (now=<value optimized out>) at kernel/time/tick-sched.c:78
#3 0xffffffff80242916 in tick_sched_timer (timer=0xffffffff80509c00) at kernel/time/tick-sched.c:642
#4 0xffffffff8023d1db in __run_hrtimer (timer=0xffffffff80509c00) at kernel/hrtimer.c:1289
#5 0xffffffff8023da09 in hrtimer_interrupt (dev=<value optimized out>) at kernel/hrtimer.c:1373
#6 0xffffffff8020e4df in timer_interrupt (irq=<value optimized out>, dev_id=0x3b9aca00) at arch/x86/kernel/time_64.c:56
#7 0xffffffff80248089 in handle_IRQ_event (irq=0, action=0xffffffff80441050) at kernel/irq/handle.c:142
#8 0xffffffff802492a7 in handle_level_irq (irq=0, desc=0xffffffff8044de00) at kernel/irq/chip.c:372
#9 0xffffffff8020d993 in do_IRQ (regs=<value optimized out>) at include/linux/irq.h:282
#10 0xffffffff8020b943 in common_interrupt () at arch/x86/kernel/entry_64.S:675
#11 0xffff88001f09b920 in ?? ()
#12 0x0000000000000000 in ?? ()
Both backtraces are from a vanilla 2.6.28-rc8 guest kernel; the .config
is attached.
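The ticks argument in frame #1 is identical in both backtraces. A quick way to look at its bit pattern is to reinterpret it as a 64-bit word, assuming gdb printed the raw unsigned long. It comes out as 0xffffffff804e5580, which lies in the same range as the kernel addresses elsewhere in the traces, and dev_id=0x3b9aca00 in the second trace is exactly 1000000000 (the value of NSEC_PER_SEC). Both might simply be stale registers shown for optimized-out arguments rather than real values; this check does not prove that. The helper names below are hypothetical.

```python
# Values copied verbatim from the backtraces above.
TICKS = 18446744071567201664  # do_timer(ticks=...), frame #1 of both traces
DEV_ID = 0x3b9aca00           # timer_interrupt(dev_id=...), frame #6 of the second trace

MASK64 = (1 << 64) - 1


def as_u64_hex(value: int) -> str:
    """Format an integer as a zero-padded 64-bit hex word (hypothetical helper)."""
    return f"0x{value & MASK64:016x}"


def as_s64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


if __name__ == "__main__":
    print("ticks  =", as_u64_hex(TICKS), "as signed:", as_s64(TICKS))
    print("dev_id =", DEV_ID)
```

A genuine time jump would be expected to produce different tick counts from run to run, which is one reason an identical, address-like value is worth a second look.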
Single-stepping and breakpoints do not seem to work, so my debugging
options are limited (any clue why?). From repeated "cont" attempts, I
could see that the kernel was actually running, i.e. it was not hanging
in a single loop.
Any hints as to what could be causing the problem? If you need any more
information, I will be happy to provide it; I can trigger the problem
at any time, it's just that gdb has limited capabilities.
Basic hardware information: Athlon 64 X2, 6 GB memory, ATI RS690/SB600
chipset.
Max
View attachment ".config" of type "text/plain" (22369 bytes)