lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0809151849490.14420@vixen.sonytel.be>
Date:	Mon, 15 Sep 2008 18:57:09 +0200 (CEST)
From:	Geert Uytterhoeven <Geert.Uytterhoeven@...ycom.com>
To:	Linux Kernel Development <linux-kernel@...r.kernel.org>
cc:	Cell Broadband Engine OSS Development <cbe-oss-dev@...abs.org>,
	Bob McGwier <rwmcgwier@...il.com>
Subject: Re: [Cbe-oss-dev] ps3: BUG: spinlock lockup on CPU#1, udevd/505 or
 modprobe/633

On Fri, 12 Sep 2008, Geert Uytterhoeven wrote:
> >From time to time, I get a lockup in udevd or modprobe when booting a recent
> kernel on a PS3. Anyone with an idea what's going on, or how to investigate
> this further?

Some more results of my investigation...

> <7>usb-storage: device scan complete
> <7>eth0: no IPv6 routers present
> <0>BUG: spinlock lockup on CPU#1, udevd/505, c00000000062d880
> Call Trace:
> [c000000006b6f4a0] [c00000000000f334] .show_stack+0x70/0x184 (unreliable)
> [c000000006b6f550] [c0000000001fa6c8] ._raw_spin_lock+0x138/0x174
> [c000000006b6f5f0] [c00000000039b87c] .lock_kernel+0x38/0x5c

The common part of all logs is that some code calls lock_kernel() to acquire
the BKL....

> [c000000006b6f670] [c00000000011e084] .proc_lookup_de+0x34/0x144
> [c000000006b6f710] [c000000000118064] .proc_root_lookup+0x28/0x70
> [c000000006b6f7b0] [c0000000000c93ac] .do_lookup+0x110/0x260
> [c000000006b6f870] [c0000000000cbd40] .__link_path_walk+0xa34/0xf70
> [c000000006b6f960] [c0000000000cc2d4] .path_walk+0x58/0xc4
> [c000000006b6fa10] [c0000000000cc4d0] .do_path_lookup+0x130/0x174
> [c000000006b6fac0] [c0000000000cd284] .__path_lookup_intent_open+0x70/0xd0
> [c000000006b6fb70] [c0000000000cd548] .do_filp_open+0xc4/0x7ec
> [c000000006b6fd00] [c0000000000bd044] .do_sys_open+0x88/0x158
> [c000000006b6fdb0] [c0000000000fe0f0] .compat_sys_open+0x24/0x38
> [c000000006b6fe30] [c0000000000074dc] syscall_exit+0x0/0x40
> <3>BUG: soft lockup - CPU#0 stuck for 61s! [vol_id:661]
> Modules linked in: joydev evdev
> irq event stamp: 141818
> hardirqs last  enabled at (141817): [<c000000000007a1c>] restore+0x1c/0xe4
> hardirqs last disabled at (141818): [<c000000000003600>] decrementer_common+0x100/0x180
> softirqs last  enabled at (141814): [<c0000000000202bc>] .call_do_softirq+0x14/0x24
> softirqs last disabled at (141809): [<c0000000000202bc>] .call_do_softirq+0x14/0x24
> NIP: c000000000082270 LR: c0000000000825d8 CTR: 0000000000324f98
> REGS: c00000001cbe7410 TRAP: 0901   Not tainted  (2.6.27-rc6-00036-ga551b98-dirty)
> MSR: 8000000000008032 <EE,IR,DR>  CR: 42004422  XER: 20000000
> TASK = c00000001cbe1440[661] 'vol_id' THREAD: c00000001cbe4000 CPU: 0
> <6>GPR00: 0000000000000001 c00000001cbe7690 c000000000602c20 c00000001cbe7700 
> <6>GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
> <6>GPR08: 0000000000000000 0000000000000001 0000000000000000 c00000000003add8 
> <6>GPR12: 8000000000008032 c000000000635300 
> NIP [c000000000082270] .csd_flag_wait+0x4/0x1c
> LR [c0000000000825d8] .smp_call_function_single+0x13c/0x164
> Call Trace:
> [c00000001cbe7690] [c0000000000825d8] .smp_call_function_single+0x13c/0x164 (unreliable)
> [c00000001cbe7750] [c0000000000826e4] .smp_call_function_mask+0xe4/0x240
> [c00000001cbe78b0] [c000000000056780] .on_each_cpu+0x24/0x94
> [c00000001cbe7950] [c0000000000ebc1c] .invalidate_bh_lrus+0x24/0x3c
> [c00000001cbe79d0] [c0000000000f2ca4] .kill_bdev+0x2c/0x58
> [c00000001cbe7a50] [c0000000000f36b4] .__blkdev_put+0x94/0x1d4

... while it's already taken by __blkdev_put().

> [c00000001cbe7af0] [c0000000000c0684] .__fput+0x114/0x1fc
> [c00000001cbe7ba0] [c0000000000bcf04] .filp_close+0xac/0xd4
> [c00000001cbe7c30] [c000000000052e24] .put_files_struct+0xb4/0x13c
> [c00000001cbe7cd0] [c0000000000549b4] .do_exit+0x244/0x7c0
> [c00000001cbe7da0] [c000000000054fc4] .do_group_exit+0x94/0xc8
> [c00000001cbe7e30] [c0000000000074dc] syscall_exit+0x0/0x40
> Instruction dump:
> 7d005828 7c00592d 40a2fff8 4c00012c 2f880000 3860fff0 409e000c 38600000 
> f88b0008 ebc1fff0 4e800020 7c0004ac <80030020> 780907e1 4d820020 7c210b78 

The BKL is never released by __blkdev_put(), as the IPI initiated by
smp_call_function_single() never seems to finish. So it keeps on spinning in
csd_flag_wait().

Adding debug code to the PS3-specific IPI message_pass() routine
ps3_smp_message_pass() seems to cause the problem to go away.

I have lockdep enabled (as most other locking debug stuff), but CONFIG_PREEMPT
is not set.

As Bob McGwier sees it on PS3, but not on the QS21 or Mercury Cell blades, it
looks like the problem is really PS3-specific.

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Techsoft Centre Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone:    +32 (0)2 700 8453
Fax:      +32 (0)2 700 8622
E-mail:   Geert.Uytterhoeven@...ycom.com
Internet: http://www.sony-europe.com/

A division of Sony Europe (Belgium) N.V.
VAT BE 0413.825.160 · RPR Brussels
Fortis · BIC GEBABEBB · IBAN BE41293037680010

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ