Message-ID: <505766fa0910120123l51b57b8aja30a5d0fd1225c0d@mail.gmail.com>
Date: Mon, 12 Oct 2009 16:23:47 +0800
From: hyl <heyongli@...il.com>
To: David Miller <davem@...emloft.net>
Cc: sparclinux@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Linux guest domain with two vnets bound to the same vswitch
experiences hung in bootup (sun_netraT5220)
2009/10/10 David Miller <davem@...emloft.net>:
> From: David Miller <davem@...emloft.net>
> Date: Fri, 09 Oct 2009 15:08:29 -0700 (PDT)
>
>> Thank you for this bug report and patch, I am looking at
>> it now.
>
> I'm trying to figure out how the deadlock can even occur,
> and I've failed so far, please help me :-)
>
> See, we always take the VIO and LDC locks in the same order
> (VIO then LDC) and always with interrupts disabled, so it is
> not possible to deadlock.
>
> The only way we could deadlock is if:
>
> 1) There is some path that takes the LDC lock before the VIO one.
>
> 2) There is some path that takes either lock with interrupts
> enabled.
>
> And I cannot find any such case.
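For reference, my reading of the ordering you describe, as a rough sketch
(the names below are made up for illustration, they are not the real
vio/ldc code):

#include <linux/spinlock.h>

/* made-up types, only to illustrate the VIO-then-LDC ordering */
struct example_ldc_chan {
	spinlock_t lock;
};

struct example_vio_drv {
	spinlock_t lock;
	struct example_ldc_chan *lp;
};

static void example_tx(struct example_vio_drv *vio)
{
	unsigned long flags;

	spin_lock_irqsave(&vio->lock, flags);	/* VIO lock first, irqs off */
	spin_lock(&vio->lp->lock);		/* then the LDC lock */
	/* ... build and queue the packet ... */
	spin_unlock(&vio->lp->lock);
	spin_unlock_irqrestore(&vio->lock, flags);
}

Assuming that picture is right, an AB-BA deadlock indeed looks impossible.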
David
Thank you. I tried to figure out the path that leads to the system hang.
Here is the output log I got (the same trace repeats 18350079 times and
keeps going forever):
ctl: 17, data:0 err:0, abr:0 runcc:18350079 CPUId:0, event: 0x00000004
Call Trace:
[00000000004acb30] _handle_IRQ_event+0x50/0x120
[00000000004acc70] handle_IRQ_event+0x70/0x120
[00000000004af14c] handle_fasteoi_irq+0xcc/0x180
[000000000042ee54] handler_irq+0x134/0x160
[00000000004208b4] tl0_irq5+0x14/0x20
[00000000004acbac] _handle_IRQ_event+0xcc/0x120
[00000000004acc70] handle_IRQ_event+0x70/0x120
[00000000004af14c] handle_fasteoi_irq+0xcc/0x180
[000000000042ee54] handler_irq+0x134/0x160
[00000000004208b4] tl0_irq5+0x14/0x20
[00000000004acbac] _handle_IRQ_event+0xcc/0x120
[00000000004acc70] handle_IRQ_event+0x70/0x120
[00000000004af14c] handle_fasteoi_irq+0xcc/0x180
[000000000042ee54] handler_irq+0x134/0x160
[00000000004208b4] tl0_irq5+0x14/0x20
[00000000007e7ffc] _spin_unlock_irqrestore+0x3c/0x60
runcc:18350079 CPUId:0, event: 0x00000004
runcc is the number of times ldc_rx has run. The instrumentation I added:
static atomic64_t runcc = ATOMIC64_INIT(0);

static irqreturn_t ldc_rx(int irq, void *dev_id)
{
	....
	/* debug counter: how many times ldc_rx has run */
	atomic64_inc(&runcc);
	....
	printk(KERN_INFO "runcc:%lld CPUId:%d, event: 0x%08x\n",
	       (long long)atomic64_read(&runcc),
	       smp_processor_id(), event_mask);
	dump_stack();
	....
}
Looking at the console output, the system seems to hang in a livelock:
tl0_irq5 is triggered again just after the irq is re-enabled in the
handler of irq5, which is ldc_rx.
I have no idea what exactly is behind tl0_irq5; my guess is that this
particular configuration (two vnets bound to the same vswitch) causes
it to be triggered continuously, and the trigger condition is never
cleared.
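To make that guess concrete, the pattern I suspect looks roughly like
this (only a sketch; the names are made up and this is not the real
ldc.c code):

#include <linux/interrupt.h>
#include <linux/spinlock.h>

/* made-up channel state, for illustration only */
struct example_chan {
	spinlock_t lock;
};

static irqreturn_t suspected_livelock(int irq, void *dev_id)
{
	struct example_chan *chan = dev_id;
	unsigned long flags;

	spin_lock_irqsave(&chan->lock, flags);
	/* ... event 0x00000004 is seen, but whatever raised it is
	 * never acknowledged back to the source ... */
	spin_unlock_irqrestore(&chan->lock, flags);

	/* the line is re-enabled on return; the still-pending
	 * condition raises irq5 again right away */
	return IRQ_HANDLED;
}

That would match the trace above, where tl0_irq5 fires again from inside
_spin_unlock_irqrestore(), before the previous invocation has finished.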
Pauli He
>
> It might help if you run your test case with lockdep enabled. It will
> find such deadlocks and report them precisely to the kernel logs.
>
> Thank you!
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/