linux-kernel - 2.6.33rc4 RCU hang mm spin_lock deadlock(?) after running libvirtd

Message-id: <4B4E1461.4010806@majjas.com>
Date:	Wed, 13 Jan 2010 13:43:45 -0500
From:	Michael Breuer <mbreuer@...jas.com>
To:	paulmck@...ux.vnet.ibm.com
Cc:	linux-kernel@...r.kernel.org
Subject: 2.6.33rc4 RCU hang mm spin_lock deadlock(?) after running libvirtd - reproducible.

[Originally posted as: "Re: 2.6.33RC3 libvirtd ->sky2 & rcu oops (was
Sky2 oops - Driver tries to sync DMA memory it has not allocated)"]

On 1/11/2010 8:49 PM, Paul E. McKenney wrote:
> On Sun, Jan 10, 2010 at 03:10:03PM -0500, Michael Breuer wrote:
>    
>> On 1/9/2010 5:21 PM, Michael Breuer wrote:
>>      
>>> Hi,
>>>
>>> Attempting to move back to mainline after my recent 2.6.32 issues...
>>> Config is make oldconfig from working 2.6.32 config. Patch for af_packet.c
>>> (for skb issue found in 2.6.32) included. Attaching .config and NMI
>>> backtraces.
>>>
>>> System becomes unusable after bringing up the network:
>>>
>>> ...
> RCU stall warnings are usually due to an infinite loop somewhere in the
> kernel.  If you are running !CONFIG_PREEMPT, then any infinite loop not
> containing some call to schedule will get you a stall warning.  If you
> are running CONFIG_PREEMPT, then the infinite loop is in some section of
> code with preemption disabled (or irqs disabled).
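To make the mechanism Paul describes concrete, here is a toy Python model, not kernel code. A grace period can only end once every CPU has passed through a quiescent state (such as a context switch), so a CPU looping without calling schedule() never reports one and is eventually flagged. The function name `stalled_cpus`, the timestamp-list input, and the single `timeout` parameter are illustrative assumptions; the kernel's real RCU stall detector tracks this quite differently.

```python
from typing import Dict, List


def stalled_cpus(qs_times: Dict[int, List[float]], gp_start: float,
                 now: float, timeout: float) -> List[int]:
    """Toy model: return CPUs that have not passed through a quiescent
    state since the current grace period began.

    qs_times maps a CPU number to the times at which it was observed in a
    quiescent state (hypothetical input format, for illustration only).
    """
    if now - gp_start <= timeout:
        # The grace period is not yet overdue, so nothing is reported.
        return []
    # A CPU stuck in a loop that never schedules has no quiescent state
    # at or after gp_start, which is what blocks the grace period.
    return sorted(cpu for cpu, times in qs_times.items()
                  if not any(t >= gp_start for t in times))
```

For example, if CPU0 context-switched after the grace period started but CPU1 did not, only CPU1 is reported once the timeout has passed, which mirrors how the real warning "fingers" one CPU.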
>
> The stall-warning dump will normally finger one or more of the CPUs.
> Since you are getting repeated warnings, look at the stacks and see
> which of the most-recently-called functions stays the same in successive
> stack traces.  This information should help you finger the infinite (or
> longer than average) loop.
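One way to apply this advice mechanically is sketched below in Python. It assumes the 2.6-era x86 frame format ` [<address>] function+0xoff/0xlen`, drops frames marked `?` (which the kernel prints for entries it considers unreliable), and reports the innermost function that appears in every trace. The helper names are hypothetical, and the regular expression is an assumption about the trace format, not a parser for every architecture.

```python
import re
from typing import List, Optional

# Assumed frame format: " [<ffffffff81234567>] func+0x1e/0x30".
# Group 2 captures an optional "? " marker for unreliable frames.
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x")


def parse_trace(text: str) -> List[str]:
    """Return function names from one stack dump, innermost first,
    skipping frames the kernel marked as unreliable with '?'."""
    return [m.group(3) for m in FRAME_RE.finditer(text) if m.group(2) is None]


def innermost_common_frame(traces: List[List[str]]) -> Optional[str]:
    """Return the most recently called function present in every trace."""
    if not traces:
        return None
    common = set(traces[0]).intersection(*traces[1:])
    for func in traces[0]:  # walk from the innermost frame outward
        if func in common:
            return func
    return None
```

Feeding it successive dumps of the stuck CPU narrows the search to the frame that never changes; in a spinlock deadlock that is typically the lock-acquisition function or its caller.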
> ...
>    
I can now recreate this simply by running "service libvirtd start" on an
F12 box. My earlier report suggesting that this had something to do with
the sky2 driver was incorrect. Interestingly, the stall is always on CPU1
whenever I start libvirtd.
Attaching two of the traces (I've got about ten, but they're all pretty
much the same). They look consistent: libvirtd on CPU1 hangs while
forking. I'm not sure why yet; perhaps someone who knows this code better
than I do can jump in.
In summary, the hang appears to occur when libvirtd forks: two threads
showing the same PID are deadlocked on a spin_lock.
> Then if looking at the stack traces doesn't locate the offending loop,
> bisection might help.
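Bisection is a binary search over revisions between a known-good and a known-bad kernel. A minimal Python sketch of the idea, assuming a linear list of revisions and a hypothetical `is_bad` test that builds and boots a revision, might look like this; `git bisect` itself works over the commit graph rather than a list.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(revisions: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first revision for which is_bad() is True.

    Assumes revisions[0] is known good, revisions[-1] is known bad, and
    that once a revision is bad every later revision is bad too.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the first bad revision is at or before mid
        else:
            lo = mid  # the first bad revision is after mid
    return revisions[hi]
```

Each test halves the remaining range, so about 1,000 candidate commits need roughly ten boots. Revisions that cannot be tested at all, as with rc1 and rc2 in this report, are what `git bisect skip` exists for; this sketch does not handle them.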
>    
It would; however, it's going to be really difficult, as I wasn't able to
get this far with rc1 and rc2 :(
> 							Thanx, Paul


Attachment: "stall1" (text/plain, 34802 bytes)

Attachment: "stall2" (text/plain, 35630 bytes)