Message-ID: <505766fa0910120123l51b57b8aja30a5d0fd1225c0d@mail.gmail.com>
Date:	Mon, 12 Oct 2009 16:23:47 +0800
From:	hyl <heyongli@...il.com>
To:	David Miller <davem@...emloft.net>
Cc:	sparclinux@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Linux guest domain with two vnets bound to the same vswitch
	hangs during bootup (Sun Netra T5220)

2009/10/10 David Miller <davem@...emloft.net>:
> From: David Miller <davem@...emloft.net>
> Date: Fri, 09 Oct 2009 15:08:29 -0700 (PDT)
>
>> Thank you for this bug report and patch, I am looking at
>> it now.
>
> I'm trying to figure out how the deadlock can even occur,
> and I've failed so far, please help me :-)
>
> See, we always take the VIO and LDC locks in the same order
> (VIO then LDC) and always with interrupts disabled, so it is
> not possible to deadlock.
>
> The only way we could deadlock is if:
>
> 1) There is some path that takes the LDC lock before the VIO one.
>
> 2) There is some path that takes either lock with interrupts
>   enabled.
>
> And I cannot find any such case.
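For reference, the locking pattern described above looks roughly like
this -- a minimal sketch with made-up names (demo_vio, demo_ldc,
demo_event), not the actual sunvnet/LDC structures:

#include <linux/spinlock.h>

struct demo_vio { spinlock_t lock; };	/* stand-in for the VIO state */
struct demo_ldc { spinlock_t lock; };	/* stand-in for the LDC channel */

static void demo_event(struct demo_vio *vio, struct demo_ldc *lp)
{
	unsigned long flags;

	spin_lock_irqsave(&vio->lock, flags);	/* VIO first, IRQs off */
	spin_lock(&lp->lock);			/* then LDC, IRQs already off */

	/* ... process the event ... */

	spin_unlock(&lp->lock);
	spin_unlock_irqrestore(&vio->lock, flags);
}

As long as every path follows this order with interrupts disabled, an
AB-BA deadlock between the two locks is impossible.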
David,

Thank you. I tried to figure out the path that leads to the system
hang, and got the following output log (the same output has repeated
18350079 times and keeps going forever ...):

ctl: 17, data:0 err:0, abr:0   runcc:18350079 CPUId:0, event: 0x00000004
Call Trace:
 [00000000004acb30] _handle_IRQ_event+0x50/0x120
 [00000000004acc70] handle_IRQ_event+0x70/0x120
 [00000000004af14c] handle_fasteoi_irq+0xcc/0x180
 [000000000042ee54] handler_irq+0x134/0x160
 [00000000004208b4] tl0_irq5+0x14/0x20
 [00000000004acbac] _handle_IRQ_event+0xcc/0x120
 [00000000004acc70] handle_IRQ_event+0x70/0x120
 [00000000004af14c] handle_fasteoi_irq+0xcc/0x180
 [000000000042ee54] handler_irq+0x134/0x160
 [00000000004208b4] tl0_irq5+0x14/0x20
 [00000000004acbac] _handle_IRQ_event+0xcc/0x120
 [00000000004acc70] handle_IRQ_event+0x70/0x120
 [00000000004af14c] handle_fasteoi_irq+0xcc/0x180
 [000000000042ee54] handler_irq+0x134/0x160
 [00000000004208b4] tl0_irq5+0x14/0x20
 [00000000007e7ffc] _spin_unlock_irqrestore+0x3c/0x60



>runcc:18350079 CPUId:0, event: 0x00000004
runcc is the count of times ldc_rx has been run. The instrumentation code:

static atomic64_t runcc = ATOMIC64_INIT(0);

static irqreturn_t ldc_rx(int irq, void *dev_id)
{
	....
	/* count how many times this handler has run */
	atomic64_inc(&runcc);
	....
	printk(KERN_INFO "runcc:%lld CPUId:%d, event: 0x%08x\n",
	       (long long)atomic64_read(&runcc),
	       smp_processor_id(), event_mask);
	dump_stack();
	....
}

Looking at the console output, the system seems to hang in a livelock:
tl0_irq5 is triggered just after the IRQ is re-enabled in the handler
of irq 5, ldc_rx.

I have no idea about tl0_irq5; my guess is that this particular
configuration causes tl0_irq5 to be triggered continuously, and that
the trigger condition cannot be cleared.
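Roughly, the suspected failure mode would look like the sketch below --
hypothetical code, not the actual ldc_rx; read_pending_events,
handle_events and ack_pending_events are made-up helpers standing in
for the real LDC register accesses:

#include <linux/interrupt.h>

/* hypothetical device accessors */
extern u64  read_pending_events(void *dev_id);
extern void handle_events(void *dev_id, u64 mask);
extern void ack_pending_events(void *dev_id, u64 mask);

static irqreturn_t demo_rx(int irq, void *dev_id)
{
	u64 event_mask = read_pending_events(dev_id);

	handle_events(dev_id, event_mask);

	/*
	 * Suspected bug: the pending condition is never acked, so the
	 * level-triggered line stays asserted.  The moment interrupts
	 * are re-enabled on return, tl0_irq5 fires again and the CPU
	 * loops in the handler forever -- a livelock.
	 */
	/* ack_pending_events(dev_id, event_mask); */

	return IRQ_HANDLED;
}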


Pauli He



>
> It might help if you run your test case with lockdep enabled.  It will
> find such deadlocks and report them precisely to the kernel logs.
>
> Thank you!
>
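For reference, lockdep is enabled with a .config fragment along these
lines (CONFIG_PROVE_LOCKING selects the deadlock detector; the exact
set of dependencies varies by kernel version):

CONFIG_DEBUG_KERNEL=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_LOCK_ALLOC=y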
