Message-ID: <20080826130915.6fd85e34@doriath.conectiva>
Date: Tue, 26 Aug 2008 13:09:15 -0300
From: "Luiz Fernando N. Capitulino" <lcapitulino@...driva.com.br>
To: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
Cc: Gerhard Brauer <gerhard.brauer@....de>,
"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...e.hu>,
linux-kernel@...r.kernel.org
Subject: Re: 2.6.{26.2,27-rc} oops on virtualbox
On Tue, 26 Aug 2008 10:53:38 -0400
Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca> wrote:
| * Gerhard Brauer (gerhard.brauer@....de) wrote:
| > On Fri, Aug 22, 2008 at 02:08:13PM -0700, H. Peter Anvin wrote:
| > > Luiz Fernando N. Capitulino wrote:
| > >>
| > >> I have asked Mandriva and Ubuntu users to test this and all of
| > >> them so far are saying that noreplace-paravirt works.
| > >>
| > >> It makes the system slower, but it works.
| > >>
| > >
| > > Yes, the big issue is exactly what VirtualBox screws up in this matter,
| > > how to detect it, and how to work around it.
| > >
| > > It's pretty clear it's a VirtualBox f*ckup at this point, but the failure
| > > mechanism isn't at all obvious and so far the workaround is elusive.
| > >
| > > I strongly suspect this is a VirtualBox tcache management failure, but
| > > that doesn't help the situation without knowing how it happens.
| >
| > On Archlinux we have the same problem. We have a bugreport here:
| > http://bugs.archlinux.org/task/11141
| >
| > I tested it myself with a LiveCD/install ISO which has 2.6.26 as the
| > install kernel. We get the guest oops on both virtualbox-ose and
| > virtualbox-sun, and on both i686 and x86_64 hosts.
| >
| > Some things I noticed:
| > - The system always boots when I either enable VT-x in the guest
| > settings or disable ACPI and run the guest with acpi=off.
| > - The oops always occurs on (disk) I/O, no matter which file system I
| > use.
| > - Once the oops has occurred and the guest has to be closed and
| > restarted, I always get an oops right when the initrd/kernel starts,
| > unless I use VT-x or acpi=off. The last screen message before the oops
| > is then "Freeing SMP alternatives".
| >
| > Here is also an archive with guest dmesg and messages.log from such an
| > oops when heavy disk io leads to the oops:
| > http://bugs.archlinux.org/task/11141?getfile=2445
| >
|
| Hrm, can you try this ?
|
| 1 - Make sure your kernel is not CONFIG_DEBUG_RODATA
"""
$ grep CONFIG_DEBUG_RODATA .config
# CONFIG_DEBUG_RODATA is not set
$
"""
| 2 - Change the whole text_poke implementation in
| arch/x86/kernel/alternative.c to this :
|
| void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
| {
| return text_poke_early(addr, opcode, len);
| }
|
| If this works, I suspect that the problem comes from a vmap/vunmap
| problem. If it still fails, the problem would likely come from a race
| with interrupt disabling probably due to missing data/instruction cache
| flush.
I still get the oops with this change. :((
| Then, after having tested (2), try this on top of it :
|
| In arch/x86/kernel/alternative.c, alternatives_smp_switch()
|
| Add unsigned long flags;
| Change
| spin_lock(&smp_alt); -> spin_lock_irqsave(&smp_alt, flags);
| spin_unlock(&smp_alt); -> spin_unlock_irqrestore(&smp_alt, flags);
|
| This will help testing if there is a problem with interrupts coming
| shortly after the modification. If it fixes the problem, my guess is
| that we should flush the instruction cache (and maybe the data cache ?)
| in text_poke and text_poke_early when interrupts are off.
By 'on top of it' you mean I should make these changes with the
text_poke() version above, right?
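For reference, the locking change described above would look roughly like
the sketch below. It is a userspace mock, not the kernel code: the
mock_spinlock_t type, the mock_spin_lock_irqsave() and
mock_spin_unlock_irqrestore() helpers, the irqs_enabled variable and
patch_alternatives() are all hypothetical stand-ins so the example builds
outside the kernel. In the kernel, spin_lock_irqsave() is a macro that takes
flags by name rather than by pointer, and the body of
alternatives_smp_switch() does considerably more than shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's spinlock_t (hypothetical, userspace only). */
typedef struct { bool locked; } mock_spinlock_t;

static bool irqs_enabled = true;          /* models the CPU interrupt flag */
static mock_spinlock_t smp_alt = { false };
static int smp_mode = 1;

/* Mock of spin_lock_irqsave(): save IRQ state, disable IRQs, take the lock. */
static void mock_spin_lock_irqsave(mock_spinlock_t *lock, unsigned long *flags)
{
	*flags = irqs_enabled ? 1UL : 0UL;
	irqs_enabled = false;
	assert(!lock->locked);
	lock->locked = true;
}

/* Mock of spin_unlock_irqrestore(): drop the lock, restore saved IRQ state. */
static void mock_spin_unlock_irqrestore(mock_spinlock_t *lock, unsigned long flags)
{
	assert(lock->locked);
	lock->locked = false;
	irqs_enabled = (flags != 0);
}

/* Placeholder for the code patching that runs under the lock. */
static void patch_alternatives(int smp)
{
	/* With the proposed change, no interrupt can arrive during patching. */
	assert(smp_alt.locked && !irqs_enabled);
	smp_mode = smp;
}

void alternatives_smp_switch(int smp)
{
	unsigned long flags;                              /* added */

	mock_spin_lock_irqsave(&smp_alt, &flags);         /* was: spin_lock(&smp_alt); */
	patch_alternatives(smp);
	mock_spin_unlock_irqrestore(&smp_alt, flags);     /* was: spin_unlock(&smp_alt); */
}
```

The point of the change is only that interrupts stay off for the whole
patching window and come back in whatever state they were in before.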
By the way, I have added a comment in the virtualbox's bugzilla
pointing out this thread but no feedback from them so far.
--
Luiz Fernando N. Capitulino