lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4DE94EDD.8090707@free.fr>
Date:	Fri, 03 Jun 2011 23:15:09 +0200
From:	Laurent Riffard <laurent.riffard@...e.fr>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
CC:	netdev@...r.kernel.org
Subject: Re: [BUG] netconsole broken by scheduler updates

Le 26/05/2011 18:17, Peter Zijlstra a écrit :
> On Thu, 2011-05-26 at 18:12 +0200, Laurent Riffard wrote:
>> Le 26/05/2011 12:35, Laurent Riffard a écrit :
>>> On Thu, 26 May 2011, Peter Zijlstra wrote:
>>>> On Thu, 2011-05-26 at 11:21 +0200, Laurent Riffard wrote:
>>>>> Hi,
>>>>>
>>>>> Recently, netconsole was broken by some scheduler updates. Kernel hangs
>>>>> on boot near the network card initialization. I noticed that it does
>>>>> hang just where a "inconsistent lock state" message normally appears.
>>>>>
>>>>> I did a bisection : e4a52bcb9a18142d79e231b6733cabdbf2e67c1f is the
>>>>> first bad commit.
>>>>> commit e4a52bcb9a18142d79e231b6733cabdbf2e67c1f
>>>>> Author: Peter Zijlstra <a.p.zijlstra@...llo.nl>
>>>>> Date:   Tue Apr 5 17:23:54 2011 +0200
>>>>>
>>>>>     sched: Remove rq->lock from the first half of ttwu()
>>>>>
>>>>>
>>>>> Before this commit, kernel was booting succesfully despite the
>>>>> "inconsistent lock state" message. After this commit, the kernel does
>>>>> hang on boot, I have to push the reset button.
>>>>
>>>> Do you have CONFIG_HARDLOCKUP_DETECTOR=y, and do you get an NMI splat
>>>> after some 10 seconds? It would be interesting to see where its stuck.
>>>
>>> CONFIG_HARDLOCKUP_DETECTOR is not set for now. I'll give it a try.
>>
>> I activated CONFIG_HARDLOCKUP_DETECTOR, but failed to obtain more 
>> information. During the boot, the kernel hangs just after bringing up 
>> the NIC. I waited 2 minutes, without getting any more messages.
> 
> OK, thanks for trying, I shall try and reproduce locally.
> 

Hello,

I tried some newer kernel. It boots successfully now, but 
I've got interesting (?) logs :

====================================================
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.0.0-rc1-00049-g1fa7b6a (laurent@...imero) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4) ) #148 SMP PREEMPT Thu Jun 2 18:56:41 CEST 2011
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.0.0-rc1-00049-g1fa7b6a root=/dev/mapper/vglinux1-lv_ubuntu_64bits ro debug netconsole=4444@....168.0.9/eth0,6666@....168.0.10/6C:62:6D:48:4B:C7 splash vt.handoff=7
[...]
[    0.685768] Console: switching to colour frame buffer device 160x64
[    0.707611] fb0: VESA VGA frame buffer device
[    0.714029] Linux agpgart interface v0.103
[    0.714206] forcedeth: Reverse Engineered nForce ethernet driver. Version 0.64.
[    0.714837] ACPI: PCI Interrupt Link [APCH] enabled at IRQ 23
[    0.715054] forcedeth 0000:00:0a.0: PCI INT A -> Link[APCH] -> GSI 23 (level, low) -> IRQ 23
[    0.715283] forcedeth 0000:00:0a.0: setting latency timer to 64
[    1.241552] forcedeth 0000:00:0a.0: ifname eth0, PHY OUI 0x732 @ 1, addr 00:1f:d0:53:49:ae
[    1.241784] forcedeth 0000:00:0a.0: highdma pwrctl mgmt gbit lnktim msi desc-v3
[    1.242101] netconsole: local port 4444
[    1.242207] netconsole: local IP 192.168.0.9
[    1.242320] netconsole: interface 'eth0'
[    1.242426] netconsole: remote port 6666
[    1.242531] netconsole: remote IP 192.168.0.10
[    1.242650] netconsole: remote ethernet address 6c:62:6d:48:4b:c7
[    1.242822] netconsole: device eth0 not up yet, forcing it
[    1.244612] forcedeth 0000:00:0a.0: irq 40 for MSI/MSI-X
[    1.245057] forcedeth 0000:00:0a.0: eth0: no link during initialization
[    2.374451] forcedeth 0000:00:0a.0: eth0: link up
[    2.381178]
[    2.381179] =================================
[    2.381182] [ INFO: inconsistent lock state ]
[    2.381184] 3.0.0-rc1-00049-g1fa7b6a #148
[    2.381186] ---------------------------------
[    2.381188] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[    2.381191] swapper/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
[    2.381193]  (&(&napi->poll_lock)->rlock){+.?...}, at: [<ffffffff8123cd08>] netpoll_poll_dev+0x9d/0x4ed
[    2.381202] {IN-SOFTIRQ-W} state was registered at:
[    2.381204]   [<ffffffff81068020>] __lock_acquire+0x319/0xdd8
[    2.381208]   [<ffffffff81068f9f>] lock_acquire+0xd8/0x103
[    2.381211]   [<ffffffff8129c019>] _raw_spin_lock+0x36/0x45
[    2.381215]   [<ffffffff8122b886>] net_rx_action+0x8d/0x1e7
[    2.381219]   [<ffffffff81040f73>] __do_softirq+0xe1/0x1c0
[    2.381223]   [<ffffffff8129e39c>] call_softirq+0x1c/0x30
[    2.381226]   [<ffffffff81004296>] do_softirq+0x46/0x9c
[    2.381229]   [<ffffffff810412c5>] irq_exit+0x4d/0xa2
[    2.381232]   [<ffffffff81003f3a>] do_IRQ+0x88/0x9f
[    2.381235]   [<ffffffff8129cb93>] ret_from_intr+0x0/0x1a
[    2.381239]   [<ffffffff8100972c>] default_idle+0x40/0x74
[    2.381242]   [<ffffffff81009844>] amd_e400_idle+0xe4/0xe6
[    2.381246]   [<ffffffff81001270>] cpu_idle+0x5e/0x94
[    2.381249]   [<ffffffff81296383>] start_secondary+0x18a/0x18f
[    2.381252] irq event stamp: 1000341
[    2.381254] hardirqs last  enabled at (1000341): [<ffffffff8129cc58>] restore_args+0x0/0x30
[    2.381257] hardirqs last disabled at (1000339): [<ffffffff8104100a>] __do_softirq+0x178/0x1c0
[    2.381261] softirqs last  enabled at (1000340): [<ffffffff8104103c>] __do_softirq+0x1aa/0x1c0
[    2.381264] softirqs last disabled at (1000251): [<ffffffff8129e39c>] call_softirq+0x1c/0x30
[    2.381267]
[    2.381268] other info that might help us debug this:
[    2.381269]  Possible unsafe locking scenario:
[    2.381270]
[    2.381271]        CPU0
[    2.381272]        ----
[    2.381273]   lock(&(&napi->poll_lock)->rlock);
[    2.381275]   <Interrupt>
[    2.381276]     lock(&(&napi->poll_lock)->rlock);
[    2.381279]
[    2.381279]  *** DEADLOCK ***
[    2.381280]
[    2.381282] 1 lock held by swapper/1:
[    2.381283]  #0:  (target_list_lock){+.+...}, at: [<ffffffff811e5877>] write_msg+0x34/0xcd
[    2.381288]
[    2.381289] stack backtrace:
[    2.381291] Pid: 1, comm: swapper Not tainted 3.0.0-rc1-00049-g1fa7b6a #148
[    2.381293] Call Trace:
[    2.381297]  [<ffffffff81067b1d>] valid_state+0x217/0x229
[    2.381300]  [<ffffffff81067396>] ? check_usage_forwards+0x85/0x85
[    2.381303]  [<ffffffff81067c11>] mark_lock+0xe2/0x1d8
[    2.381306]  [<ffffffff810693bb>] ? trace_hardirqs_on_caller+0x10b/0x12f
[    2.381309]  [<ffffffff8106809e>] __lock_acquire+0x397/0xdd8
[    2.381314]  [<ffffffff8106e882>] ? arch_local_irq_restore+0x6/0xd
[    2.381317]  [<ffffffff8123cd08>] ? netpoll_poll_dev+0x9d/0x4ed
[    2.381320]  [<ffffffff81068f9f>] lock_acquire+0xd8/0x103
[    2.381324]  [<ffffffff8123cd08>] ? netpoll_poll_dev+0x9d/0x4ed
[    2.381327]  [<ffffffff8129c557>] _raw_spin_trylock+0x4a/0x7f
[    2.381331]  [<ffffffff8123cd08>] ? netpoll_poll_dev+0x9d/0x4ed
[    2.381334]  [<ffffffff8123cd08>] netpoll_poll_dev+0x9d/0x4ed
[    2.381338]  [<ffffffff8123d2ee>] netpoll_send_skb_on_dev+0x10b/0x20f
[    2.381342]  [<ffffffff8123d5dd>] netpoll_send_udp+0x1eb/0x1fa
[    2.381345]  [<ffffffff811e5877>] ? write_msg+0x34/0xcd
[    2.381348]  [<ffffffff811e58cf>] write_msg+0x8c/0xcd
[    2.381352]  [<ffffffff8103b78d>] __call_console_drivers+0x72/0x84
[    2.381355]  [<ffffffff8103b7f8>] _call_console_drivers+0x59/0x5e
[    2.381359]  [<ffffffff8103c033>] console_unlock+0xf3/0x1ac
[    2.381362]  [<ffffffff8103c891>] register_console+0x228/0x2b3
[    2.381366]  [<ffffffff816a76b0>] init_netconsole+0x15d/0x1e4
[    2.381371]  [<ffffffff81166a64>] ? __pci_register_driver+0x9b/0xcd
[    2.381374]  [<ffffffff816a7553>] ? option_setup+0x1f/0x1f
[    2.381377]  [<ffffffff8100030f>] do_one_initcall+0x7a/0x137
[    2.381381]  [<ffffffff81685bcd>] kernel_init+0xb5/0x135
[    2.381386]  [<ffffffff81157d1e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[    2.381389]  [<ffffffff8129e2a4>] kernel_thread_helper+0x4/0x10
[    2.381392]  [<ffffffff8129cc58>] ? retint_restore_args+0x13/0x13
[    2.381396]  [<ffffffff81685b18>] ? start_kernel+0x360/0x360
[    2.381399]  [<ffffffff8129e2a0>] ? gs_change+0x13/0x13
[    2.381464] ------------[ cut here ]------------
[    2.381468] WARNING: at net/core/netpoll.c:346 netpoll_send_skb_on_dev+0x158/0x20f()
[    2.381470] Hardware name: M68SM-S2L
[    2.381473] netpoll_send_skb(): eth0 enabled interrupts in poll (nv_start_xmit_optimized+0x0/0x484)
[    2.381474] Modules linked in:
[    2.381477] Pid: 1, comm: swapper Not tainted 3.0.0-rc1-00049-g1fa7b6a #148
[    2.381479] Call Trace:
[    2.381482]  [<ffffffff8103b381>] warn_slowpath_common+0x80/0x98
[    2.381485]  [<ffffffff8103b42d>] warn_slowpath_fmt+0x41/0x43
[    2.381489]  [<ffffffff811e526c>] ? nv_start_xmit+0x3df/0x3df
[    2.381492]  [<ffffffff8123d33b>] netpoll_send_skb_on_dev+0x158/0x20f
[    2.381496]  [<ffffffff8123d5dd>] netpoll_send_udp+0x1eb/0x1fa
[    2.381499]  [<ffffffff811e5877>] ? write_msg+0x34/0xcd
[    2.381502]  [<ffffffff811e58cf>] write_msg+0x8c/0xcd
[    2.381505]  [<ffffffff8103b78d>] __call_console_drivers+0x72/0x84
[    2.381508]  [<ffffffff8103b7f8>] _call_console_drivers+0x59/0x5e
[    2.381512]  [<ffffffff8103c033>] console_unlock+0xf3/0x1ac
[    2.381515]  [<ffffffff8103c891>] register_console+0x228/0x2b3
[    2.381518]  [<ffffffff816a76b0>] init_netconsole+0x15d/0x1e4
[    2.381522]  [<ffffffff81166a64>] ? __pci_register_driver+0x9b/0xcd
[    2.381525]  [<ffffffff816a7553>] ? option_setup+0x1f/0x1f
[    2.381528]  [<ffffffff8100030f>] do_one_initcall+0x7a/0x137
[    2.381531]  [<ffffffff81685bcd>] kernel_init+0xb5/0x135
[    2.381535]  [<ffffffff81157d1e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[    2.381538]  [<ffffffff8129e2a4>] kernel_thread_helper+0x4/0x10
[    2.381541]  [<ffffffff8129cc58>] ? retint_restore_args+0x13/0x13
[    2.381545]  [<ffffffff81685b18>] ? start_kernel+0x360/0x360
[    2.381548]  [<ffffffff8129e2a0>] ? gs_change+0x13/0x13
[    2.381555] ---[ end trace f00f2127e552114f ]---
[    2.383979] console [netcon0] enabled
[    2.384117] netconsole: network logging started
[    2.384492] i8042: PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
[    2.384679] i8042: PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
[    2.385208] serio: i8042 KBD port at 0x60,0x64 irq 1
[...]
==================================================================


hope this help
~~
laurent

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ