Message-Id: <1212454798.8211.17.camel@nimitz.home.sr71.net>
Date: Mon, 02 Jun 2008 17:59:58 -0700
From: Dave Hansen <dave@...ux.vnet.ibm.com>
To: Avi Kivity <avi@...ranet.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Anthony N. Liguori [imap]" <aliguori@...ibm.com>,
kvm@...r.kernel.org
Subject: Re: kvm causing memory corruption? now 2.6.26-rc4

On Mon, 2008-06-02 at 15:30 -0700, Dave Hansen wrote:
> On Thu, 2008-03-27 at 16:59 +0200, Avi Kivity wrote:
> > Dave Hansen wrote:
> > > On Thu, 2008-03-27 at 12:10 +0200, Avi Kivity wrote:
> > >> btw, is this with >= 4GB RAM on the host?
> > >
> > > Well, are you asking whether I have PAE on or not? :)
> >
> > No, I'm asking whether there is a possibility of address truncation :)
> >
> > PAE by itself doesn't affect kvm much, as it always runs the guest in
> > pae mode.
> >
> > Can you try running with mem=2000M or something?
>
> I have a few more data points on this. Sorry for the massive delay from
> the last report -- I'm being a crappy bug reporter. But, this is on my
> one and only laptop which makes it a serious pain to diagnose. I also
> didn't have a hardware serial console on it before, which I do now.
> This is all on 2.6.26-rc4-01549-g1beee8d.
>
> Adding the mem= does not help at all. But, it is all a bit more
> diagnosable now than a month or two ago. I turned on all of the kernel
> debugging that I could get my grubby little hands on. It now oopses
> quite consistently when kvm runs instead of after. Here's a collection
> of oopses that I captured after setting up a serial line:
>
> http://sr71.net/~dave/kvm-oops1.txt
>
> After collecting all those, I turned on CONFIG_DEBUG_HIGHMEM and the
> oopses miraculously stopped. But, the guest hung (for at least 5
> minutes or so) during windows bootup, pegging my host CPU. Most of the
> CPU was going to klogd, so I checked dmesg.
>
> I was seeing messages like this
>
> [ 428.918108] kvm_handle_exit: unexpected, valid vectoring info and exit reason is 0x9
>
> And quite a few of them, like 100,000/sec. That's why klogd was pegging
> the CPU. Any idea on a next debugging step?

I followed these steps and can now boot a VM:

http://kvm.qumranet.com/kvmwiki/Windows_ACPI_Workaround

But crashing the host is still a pretty bad bug. I would imagine turning
ACPI back on will let me reproduce it if necessary.

-- Dave