Message-ID: <20080826145338.GA8601@Krystal>
Date: Tue, 26 Aug 2008 10:53:38 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: Gerhard Brauer <gerhard.brauer@....de>
Cc: "H. Peter Anvin" <hpa@...or.com>,
"Luiz Fernando N. Capitulino" <lcapitulino@...driva.com.br>,
Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org
Subject: Re: 2.6.{26.2,27-rc} oops on virtualbox
* Gerhard Brauer (gerhard.brauer@....de) wrote:
> On Fri, Aug 22, 2008 at 02:08:13PM -0700, H. Peter Anvin wrote:
> > Luiz Fernando N. Capitulino wrote:
> >>
> >> I have asked Mandriva and Ubuntu users to test this and all of
> >> them so far are saying that noreplace-paravirt works.
> >>
> >> It makes the system slower, but it works.
> >>
> >
> > Yes, the big issue is exactly what VirtualBox screws up in this matter,
> > how to detect it, and how to work around it.
> >
> > It's pretty clear it's a VirtualBox f*ckup at this point, but the failure
> > mechanism isn't at all obvious and so far the workaround is elusive.
> >
> > I'm strongly suspect this is a VirtualBox tcache management failure, but
> > that doesn't help the situation without knowing how it happens.
>
> On Archlinux we have the same problem. We have a bugreport here:
> http://bugs.archlinux.org/task/11141
>
> Myself test it with a LiveCD/Install-ISO which has 2.6.26 as install
> kernel. We have the guest oops on virtualbox-ose, virtualbox-sun and both on
> i686 or x86_64 hosts.
>
> Some things i noticed:
> - The system boots always when i either enable VT-x in guest settings or
> disable acpi and run the guest with acpi=off.
> - The oops occurs always on (disk)-io, no matter which file system i
> use.
> - When the oops has occured and the guest has to close and restart then,
> if i don't use VT-x or acpi=off, i always get an oops directly when
> initrd/kernel is starting. Last screen message before the oops then is
> "Freeing SMP alternatives".
>
> Here is also an archive with guest dmesg and messages.log from such an
> oops when heavy disk io leads to the oops:
> http://bugs.archlinux.org/task/11141?getfile=2445
>
Hrm, can you try this ?
1 - Make sure your kernel is not built with CONFIG_DEBUG_RODATA
2 - Change the whole text_poke implementation in
arch/x86/kernel/alternative.c to this :
    void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
    {
            return text_poke_early(addr, opcode, len);
    }
If this works, I suspect that the problem is in vmap/vunmap. If it still
fails, the problem more likely comes from a race with interrupt
disabling, probably due to a missing data/instruction cache flush.
Then, after having tested (2), try this on top of it :
In arch/x86/kernel/alternative.c, alternatives_smp_switch()
    Add
        unsigned long flags;
    Change
        spin_lock(&smp_alt);   -> spin_lock_irqsave(&smp_alt, flags);
        spin_unlock(&smp_alt); -> spin_unlock_irqrestore(&smp_alt, flags);
This will help test whether there is a problem with interrupts coming
shortly after the modification. If it fixes the problem, my guess is
that we should flush the instruction cache (and maybe the data cache ?)
in text_poke and text_poke_early while interrupts are off.
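For illustration only, here is a minimal userspace sketch of the locking
pattern in the alternatives_smp_switch() change described above. It is not
kernel code: every mock_* name is a hypothetical stand-in that only models
the interrupt-enable flag, and the real spin_lock_irqsave() is a macro that
takes flags by name rather than by pointer. The point is simply that the
patching runs with the lock held and interrupts off, and that the previous
interrupt state is restored afterwards.

```c
/* Userspace mock, NOT kernel code. The mock_* helpers are hypothetical
 * stand-ins that only track a lock bit and an "interrupts enabled" flag. */
#include <assert.h>
#include <stdbool.h>

typedef struct { bool locked; } mock_spinlock_t;

static bool mock_irqs_enabled = true;   /* models the CPU interrupt flag */
static mock_spinlock_t smp_alt;         /* same name as the kernel lock */
static int patched_sites;               /* counts simulated patch sites */

/* Save the current interrupt state, disable interrupts, take the lock. */
static void mock_spin_lock_irqsave(mock_spinlock_t *lock, unsigned long *flags)
{
    *flags = mock_irqs_enabled ? 1UL : 0UL;
    mock_irqs_enabled = false;
    lock->locked = true;
}

/* Release the lock, then restore the saved interrupt state. */
static void mock_spin_unlock_irqrestore(mock_spinlock_t *lock, unsigned long flags)
{
    lock->locked = false;
    mock_irqs_enabled = (flags != 0);
}

/* Stand-in for rewriting one instruction site; it checks that it runs
 * with the lock held and interrupts disabled. */
static void mock_patch_site(void)
{
    assert(smp_alt.locked);
    assert(!mock_irqs_enabled);
    patched_sites++;
}

/* Shape of the modified function: the whole patch loop sits inside the
 * irqsave/irqrestore pair, so no interrupt can arrive mid-modification. */
void mock_alternatives_smp_switch(int nsites)
{
    unsigned long flags;

    mock_spin_lock_irqsave(&smp_alt, &flags);
    for (int i = 0; i < nsites; i++)
        mock_patch_site();
    mock_spin_unlock_irqrestore(&smp_alt, flags);
}
```

In the real function only the two lock calls and the flags declaration
change; the rest of alternatives_smp_switch() stays as it is.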
Mathieu
>
> > -hpa
>
> Gerhard
>
> --
> Standards sind eine tolle Sache.
> Ich finde, jeder sollte einen haben.
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68