Message-ID: <20070613204948.GA14710@dreamland.darkstar.lan>
Date: Wed, 13 Jun 2007 22:49:48 +0200
From: Luca Tettamanti <kronos.it@...il.com>
To: Avi Kivity <avi@...ranet.com>
Cc: kvm-devel@...ts.sf.net, linux-kernel@...r.kernel.org
Subject: Re: [kvm-devel] [BUG] Oops with KVM-27
On Wed, Jun 13, 2007 at 11:59:25AM +0300, Avi Kivity wrote:
> Luca Tettamanti wrote:
> >On Mon, Jun 11, 2007 at 10:44:45AM +0300, Avi Kivity wrote:
> >
> >>Luca wrote:
> >>
> >>>>I've managed to reproduce this on kvm-21 (it takes many boots for this
> >>>>to happen, but it does eventually).
> >>>>
> >>>Hum, any clue on the cause?
> >>>
> >>From what I've seen, it's the new Linux clocksource code.
> >>
> >>
> >>>Should I test older versions?
> >>>
> >>They're unlikely to be better. Instead, it would be best to see what
> >>the guest is doing.
> >>
> >
> >RCU is not working. Network initialization hangs because it happens to
> >be the first RCU user.
> >The guest is stuck waiting for RCU synchronization:
> >
> >[ 4.992207] [<c04321c5>] synchronize_rcu+0x4e/0x80
> >[ 4.994379] [<c0431db5>] wakeme_after_rcu+0x0/0x8
> >[ 4.996521] [<c0599ad1>] synchronize_net+0x64/0x8c
> >[ 4.998678] [<c05d70e0>] inet_register_protosw+0xef/0x151
> >[ 5.000984] [<c072d79e>] inet_init+0x1cd/0x498
> >
> >wait_for_completion() in synchronize_rcu() calls schedule() and the
> >completion is never signaled (wakeme_after_rcu is never called).
> >The completion AFAICS would be signaled via rcu_process_callbacks(),
> >which is called in tasklet context.
> >The scheduler and completions are working fine, since they're used in
> >other parts of the kernel without problems.
> >
> >To recap:
> >
> >i686 F7 kernel: always works.
> >
> >i586 F7 kernel: sometimes hangs due to RCU problems. When it does work
> >it's because the LAPIC timer is disabled on boot:
> >
> >Using local APIC timer interrupts.
> >calibrating APIC timer ...
> >... lapic delta = 25745109
> >... PM timer delta = 0
> >..... delta 25745109
> >..... mult: 1105912110
> >..... calibration result: 4119217
> >..... CPU clock speed is 8794.0417 MHz.
> >..... host bus clock speed is 4119.0217 MHz.
> >... verify APIC timer
> >... jiffies delta = 103
> >APIC timer disabled due to verification failure.
> >
> >When it doesn't work, the LAPIC timer passes the test:
> >
> >[ 1.304717] Using local APIC timer interrupts.
> >[ 1.304719] calibrating APIC timer ...
> >[ 1.718823] ... lapic delta = 25251444
> >[ 1.720582] ... PM timer delta = 0
> >[ 1.722219] ..... delta 25251444
> >[ 1.723827] ..... mult: 1084706136
> >[ 1.725470] ..... calibration result: 4040231
> >[ 1.727374] ..... CPU clock speed is 8625.0780 MHz.
> >[ 1.729396] ..... host bus clock speed is 4040.0231 MHz.
> >[ 1.731540] ... verify APIC timer
> >[ 2.158342] ... jiffies delta = 102
> >[ 2.160035] ... jiffies result ok
> >
> >i586 F7 kernel, with 'nolapic': always works.
> >
>
> Can you check which .config option causes it (a special type of
> bisecting...)?
>
> This looks likely based on your findings:
>
> -CONFIG_X86_ALIGNMENT_16=y
> +CONFIG_X86_GOOD_APIC=y
> CONFIG_X86_INTEL_USERCOPY=y
> +CONFIG_X86_USE_PPRO_CHECKSUM=y
> +CONFIG_X86_TSC=y
>
> I expect it's not directly related to i586 vs i686.

And the winner is... CONFIG_X86_GOOD_APIC ;-)

I think that !GOOD_APIC is a workaround for the "11AP erratum in Pentium
Processor Specification Update", aka the read-before-write bug.

The config symbol is used in include/asm-i386/apic.h:

#ifdef CONFIG_X86_GOOD_APIC
# define FORCE_READ_AROUND_WRITE 0
# define apic_read_around(x)
# define apic_write_around(x,y) apic_write((x),(y))
#else
# define FORCE_READ_AROUND_WRITE 1
# define apic_read_around(x) apic_read(x)
# define apic_write_around(x,y) apic_write_atomic((x),(y))
#endif

With GOOD_APIC apic_read_around is a nop, while apic_write_around is a
normal write. With !GOOD_APIC apic_read_around does a real apic_read, and
apic_write_around writes to the APIC register using xchg.

With !GOOD_APIC and this patch:

--- include/asm-i386/apic.h~	2007-04-26 05:08:32.000000000 +0200
+++ include/asm-i386/apic.h	2007-06-13 22:35:00.000000000 +0200
@@ -56,7 +56,8 @@
 static __inline fastcall void native_apic_write_atomic(unsigned long reg,
 	unsigned long v)
 {
-	xchg((volatile unsigned long *)(APIC_BASE+reg), v);
+//	xchg((volatile unsigned long *)(APIC_BASE+reg), v);
+	*((volatile unsigned long *)(APIC_BASE+reg)) = v;
 }
 
 static __inline fastcall unsigned long native_apic_read(unsigned long reg)

The kernel boots fine.
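
For anyone who wants to see the two write styles side by side outside the
kernel, here is a minimal user-space sketch. It is not kernel code:
fake_apic_reg, reg_write_plain and reg_write_xchg are hypothetical names
made up for illustration, and __atomic_exchange_n is the GCC/Clang builtin
standing in for the kernel's xchg(). It only shows that the exchange form
reads the old register value as part of the write, while the plain form
is a single store.

```c
#include <stdint.h>

/* Hypothetical stand-in for one memory-mapped APIC register. */
static volatile uint32_t fake_apic_reg;

/*
 * Plain store, in the spirit of apic_write(): the register is only
 * written, its previous contents are never read.
 */
static inline void reg_write_plain(volatile uint32_t *reg, uint32_t v)
{
	*reg = v;
}

/*
 * Exchange, in the spirit of apic_write_atomic(): the old value is read
 * and the new one written as one atomic read-modify-write. The old value
 * is returned here only to make the implied read visible.
 * Assumption: GCC or Clang, which provide __atomic_exchange_n.
 */
static inline uint32_t reg_write_xchg(volatile uint32_t *reg, uint32_t v)
{
	return __atomic_exchange_n(reg, v, __ATOMIC_SEQ_CST);
}
```

Calling reg_write_plain(&fake_apic_reg, 0x20) and then
reg_write_xchg(&fake_apic_reg, 0x30) leaves 0x30 in the register in both
styles; the second call additionally hands back the previous value, which
is the extra access that the plain store in the patch above avoids.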
Luca
--
"You are the only woman in my life".
(Adam)