linux-kernel - KVM guest lockup

Message-ID: <20081219174401.GA2730@squirrel.roonstrasse.net>
Date:	Fri, 19 Dec 2008 18:44:01 +0100
From:	Max Kellermann <max@...mpel.org>
To:	kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: KVM guest lockup

Hi,

I am running a KVM-enabled AMD64 kernel, versions 2.6.26 and 2.6.27
(Debian kernels).  The guest has been tested with kernels from 2.6.26
up to 2.6.28-rc8.

Whenever I put high load on the guest (git-svn is quite good at
triggering the problem), it becomes unresponsive after a few minutes.
After a while, a "soft lockup" message appears on the serial console,
reporting a ridiculously large time span, while the machine remains
unresponsive.  Unfortunately, I haven't captured the exact message
yet, because it didn't show up today.

I tried to debug it with KGDB, but the guest doesn't respond to sysrq
anymore (via serial tty), so KGDB cannot be used.

By starting kvm with "-s", I was able to attach gdb to KVM.  The
following backtrace is with default options:

 #0  0xffffffff8023f3e0 in update_wall_time () at kernel/time/timekeeping.c:515
 #1  0xffffffff80232411 in do_timer (ticks=18446744071567201664) at kernel/timer.c:1125
 #2  0xffffffff802427a5 in tick_do_update_jiffies64 (now=<value optimized out>) at kernel/time/tick-sched.c:78
 #3  0xffffffff80242916 in tick_sched_timer (timer=0xffffffff80509c00) at kernel/time/tick-sched.c:642
 #4  0xffffffff8023d1db in __run_hrtimer (timer=0xffffffff80509c00) at kernel/hrtimer.c:1289
 #5  0xffffffff8023da09 in hrtimer_interrupt (dev=<value optimized out>) at kernel/hrtimer.c:1373
 #6  0xffffffff80219e4d in smp_apic_timer_interrupt (regs=<value optimized out>) at arch/x86/kernel/apic.c:792
 #7  0xffffffff8020bcf8 in apic_timer_interrupt () at arch/x86/kernel/entry_64.S:858
 #8  0xffffffff80477e60 in init_thread_union ()
 #9  0x0000000000000000 in ?? ()

The next one is with "noapic nolapic acpi=off":

 #0  0xffffffff8023f35a in update_wall_time () at kernel/time/timekeeping.c:501
 #1  0xffffffff80232411 in do_timer (ticks=18446744071567201664) at kernel/timer.c:1125
 #2  0xffffffff802427a5 in tick_do_update_jiffies64 (now=<value optimized out>) at kernel/time/tick-sched.c:78
 #3  0xffffffff80242916 in tick_sched_timer (timer=0xffffffff80509c00) at kernel/time/tick-sched.c:642
 #4  0xffffffff8023d1db in __run_hrtimer (timer=0xffffffff80509c00) at kernel/hrtimer.c:1289
 #5  0xffffffff8023da09 in hrtimer_interrupt (dev=<value optimized out>) at kernel/hrtimer.c:1373
 #6  0xffffffff8020e4df in timer_interrupt (irq=<value optimized out>, dev_id=0x3b9aca00) at arch/x86/kernel/time_64.c:56
 #7  0xffffffff80248089 in handle_IRQ_event (irq=0, action=0xffffffff80441050) at kernel/irq/handle.c:142
 #8  0xffffffff802492a7 in handle_level_irq (irq=0, desc=0xffffffff8044de00) at kernel/irq/chip.c:372
 #9  0xffffffff8020d993 in do_IRQ (regs=<value optimized out>) at include/linux/irq.h:282
 #10 0xffffffff8020b943 in common_interrupt () at arch/x86/kernel/entry_64.S:675
 #11 0xffff88001f09b920 in ?? ()
 #12 0x0000000000000000 in ?? ()

Both backtraces are from vanilla 2.6.28-rc8, .config attached.
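
The "ticks" and "dev_id" arguments in the traces above are easier to
judge in another base.  The short Python sketch below only converts
the two values as printed by gdb.  The interpretation in its comments
(that gdb may be showing leftover register contents for arguments the
compiler optimized away) is an assumption and has not been verified on
this guest; "describe" is a hypothetical helper name.

```python
# Values copied verbatim from the backtraces above.
TICKS = 18446744071567201664   # do_timer (ticks=...)
DEV_ID = 0x3b9aca00            # timer_interrupt (..., dev_id=...)


def describe(value):
    """Return (unsigned hex string, value reinterpreted as signed 64-bit)."""
    signed = value - (1 << 64) if value >= (1 << 63) else value
    return hex(value), signed


# 0xffffffff804e5580 falls in the same 0xffffffff80...... range as the
# kernel addresses in the traces, so it looks more like a pointer than
# a tick count (assumption: a stale register value).
print(describe(TICKS))    # ('0xffffffff804e5580', -2142349952)

# 1000000000 is the number of nanoseconds per second, which is an
# unlikely value for a dev_id pointer (same assumption as above).
print(describe(DEV_ID))   # ('0x3b9aca00', 1000000000)
```

If that reading is right, neither argument value should be taken at
face value when looking for the cause of the lockup.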

Single-stepping and breakpoints do not seem to work, so my debugging
options are limited (any clue why?).  From repeated "cont" attempts,
I could see that the kernel was actually running, i.e. not stuck in a
single loop.
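
One way to make the "cont" attempts described above more systematic is
to sample backtraces in batch mode and count which function is
innermost each time.  The Python sketch below is an illustration under
several assumptions: that kvm's "-s" option exposes a gdb stub on TCP
port 1234 (the QEMU default), that an uncompressed vmlinux with debug
info is available at the hypothetical path "./vmlinux", and that the
guest keeps running between samples.  It uses only the standard gdb
options "-batch" and "-ex"; the helper names are hypothetical.

```python
import re
import subprocess
from collections import Counter

# Matches gdb frame lines such as:
#  #0  0xffffffff8023f3e0 in update_wall_time () at kernel/time/timekeeping.c:515
FRAME_RE = re.compile(r"^\s*#(\d+)\s+(?:0x[0-9a-f]+\s+in\s+)?(\S+)\s+\(", re.M)


def innermost_function(backtrace_text):
    """Return the function name of frame #0, or None if no frame is found."""
    for number, name in FRAME_RE.findall(backtrace_text):
        if number == "0":
            return name
    return None


def gdb_command(vmlinux="./vmlinux", target="localhost:1234"):
    """Build a gdb invocation that attaches, prints a backtrace and detaches."""
    return [
        "gdb", "-batch",
        "-ex", f"target remote {target}",
        "-ex", "bt",
        # Detach explicitly so the guest is resumed when gdb exits.
        "-ex", "detach",
        vmlinux,
    ]


def sample(count=20, **kwargs):
    """Take `count` backtraces and count the innermost function of each."""
    hits = Counter()
    for _ in range(count):
        result = subprocess.run(gdb_command(**kwargs),
                                capture_output=True, text=True)
        hits[innermost_function(result.stdout)] += 1
    return hits
```

Calling, for example, sample(count=50).most_common(5) would show
whether the guest is almost always inside update_wall_time() or
whether the samples are spread over many functions.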

Any hint what could be the cause of the problem?  If you need any more
information, I will be happy to provide it.  I can trigger the problem
at any time; it's just that gdb has limited capabilities here.

Basic hardware information: Athlon 64 X2, 6 GB memory, ATI RS690/SB600
chipset.

Max

View attachment ".config" of type "text/plain" (22369 bytes)