Message-ID: <20110112075229.GZ24920@pengutronix.de>
Date:	Wed, 12 Jan 2011 08:52:29 +0100
From:	Uwe Kleine-König 
	<u.kleine-koenig@...gutronix.de>
To:	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Cc:	kernel@...gutronix.de, Nick Piggin <npiggin@...nel.dk>,
	Soren Sandmann <ssp@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Arjan van de Ven <arjan@...radead.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Arnaldo Carvalho de Melo <acme@...hat.com>
Subject: Re: BUG: spinlock recursion (sys_chdir, user_path_at,
 do_path_lookup ...)

Hello,

On Tue, Jan 11, 2011 at 12:05:39PM +0100, Uwe Kleine-König wrote:
> when testing yesterday's Linus' master branch
> (a08948812b30653eb2c536ae613b635a989feb6f + some arch support including
> Trond's latest nfsfix[1]) I hit the following reproducibly:
> 
> [    5.580000] BUG: spinlock recursion on CPU#0, init/1
> [    5.580000]  lock: c7487e10, .magic: dead4ead, .owner: init/1, .owner_cpu: 0
> [    5.590000] Backtrace: 
> [    5.590000] [<c0037c2c>] (dump_backtrace+0x0/0x110) from [<c028240c>] (dump_stack+0x1c/0x20)
> [    5.600000]  r7:c7487e10 r6:c0321368 r5:c7487e10 r4:c7848000
> [    5.610000] [<c02823f0>] (dump_stack+0x0/0x20) from [<c01b516c>] (spin_bug+0x90/0xa4)
> [    5.620000] [<c01b50dc>] (spin_bug+0x0/0xa4) from [<c01b52d4>] (do_raw_spin_lock+0x50/0x154)
> [    5.620000]  r6:c7487e10 r5:c7487e10 r4:00000000
> [    5.630000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>] (_raw_spin_lock_nested+0x40/0x48)
> [    5.640000] [<c028520c>] (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>] (nameidata_dentry_drop_rcu+0x90/0x1a4)
> [    5.650000]  r5:c7843efc r4:c7487dc0
> [    5.650000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from [<c00f44c0>] (d_revalidate+0x40/0x68)
> [    5.660000] [<c00f4480>] (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0)
> [    5.670000]  r6:c7843efc r5:c7843efc r4:00000000
> [    5.680000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>] (do_path_lookup+0x48/0xd4)
> [    5.680000] [<c00f711c>] (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c)
> [    5.690000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>] (sys_chdir+0x2c/0x78)
> [    5.700000]  r8:c0034108 r7:0000000c r6:be961ee4 r5:c7843f88 r4:00063015
> [    5.710000] [<c00e95e8>] (sys_chdir+0x0/0x78) from [<c0033e80>] (ret_fast_syscall+0x0/0x44)
> [    5.720000]  r5:be961ee4 r4:00063015
> [   11.720000] BUG: spinlock lockup on CPU#0, init/1, c7487e10
> [   11.730000] Backtrace: 
> [   11.730000] [<c0037c2c>] (dump_backtrace+0x0/0x110) from [<c028240c>] (dump_stack+0x1c/0x20)
> [   11.740000]  r7:c7842000 r6:c7487e10 r5:00000000 r4:00000000
> [   11.740000] [<c02823f0>] (dump_stack+0x0/0x20) from [<c01b539c>] (do_raw_spin_lock+0x118/0x154)
> [   11.750000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>] (_raw_spin_lock_nested+0x40/0x48)
> [   11.760000] [<c028520c>] (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>] (nameidata_dentry_drop_rcu+0x90/0x1a4)
> [   11.770000]  r5:c7843efc r4:c7487dc0
> [   11.780000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from [<c00f44c0>] (d_revalidate+0x40/0x68)
> [   11.790000] [<c00f4480>] (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0)
> [   11.790000]  r6:c7843efc r5:c7843efc r4:00000000
> [   11.800000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>] (do_path_lookup+0x48/0xd4)
> [   11.810000] [<c00f711c>] (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c)
> [   11.820000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>] (sys_chdir+0x2c/0x78)
> [   11.820000]  r8:c0034108 r7:0000000c r6:be961ee4 r5:c7843f88 r4:00063015
> [   11.830000] [<c00e95e8>] (sys_chdir+0x0/0x78) from [<c0033e80>] (ret_fast_syscall+0x0/0x44)
> [   11.840000]  r5:be961ee4 r4:00063015
> [   75.280000] BUG: soft lockup - CPU#0 stuck for 64s! [init:1]
> [   75.280000] Modules linked in:
> [   75.280000] irq event stamp: 113662
> [   75.280000] hardirqs last  enabled at (113662): [<c0285a7c>] _raw_spin_unlock_irqrestore+0x48/0x50
> [   75.280000] hardirqs last disabled at (113661): [<c0285398>] _raw_spin_lock_irqsave+0x30/0x64
> [   75.280000] softirqs last  enabled at (113509): [<c026447c>] rpc_wake_up_next+0x1b0/0x1c4
> [   75.280000] softirqs last disabled at (113507): [<c02854f0>] _raw_spin_lock_bh+0x20/0x58
> [   75.280000] 
> [   75.280000] Pid: 1, comm:                 init
> [   75.280000] CPU: 0    Not tainted  (2.6.37-04021-gb8b018c-dirty #41)
> [   75.280000] PC is at do_raw_spin_lock+0xac/0x154
> [   75.280000] LR is at do_raw_spin_lock+0xc0/0x154
> [   75.280000] pc : [<c01b5330>]    lr : [<c01b5344>]    psr: 20000013
> [   75.280000] sp : c7843dd0  ip : c7843cd4  fp : c7843e04
> [   75.280000] r10: 06bd0000  r9 : 00000000  r8 : 00000000
> [   75.280000] r7 : c7842000  r6 : c7487e10  r5 : 00000000  r4 : 03dd5aca
> [   75.280000] r3 : 00000000  r2 : 00000001  r1 : c0285a74  r0 : 00000001
> [   75.280000] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> [   75.280000] Control: 0005317f  Table: 479a8000  DAC: 00000015
> [   75.280000] [<c00356c4>] (show_regs+0x0/0x54) from [<c0089dac>] (watchdog_timer_fn+0x13c/0x1a4)
> [   75.280000]  r4:c7842000
> [   75.280000] [<c0089c70>] (watchdog_timer_fn+0x0/0x1a4) from [<c006cb58>] (__run_hrtimer+0x114/0x1f0)
> [   75.280000] [<c006ca44>] (__run_hrtimer+0x0/0x1f0) from [<c006ced8>] (hrtimer_interrupt+0x154/0x338)
> [   75.280000] [<c006cd84>] (hrtimer_interrupt+0x0/0x338) from [<c003e36c>] (mxs_timer_interrupt+0x28/0x34)
> [   75.280000] [<c003e344>] (mxs_timer_interrupt+0x0/0x34) from [<c008a408>] (handle_IRQ_event+0x7c/0x1a8)
> [   75.280000] [<c008a38c>] (handle_IRQ_event+0x0/0x1a8) from [<c008c948>] (handle_level_irq+0xc8/0x148)
> [   75.280000] [<c008c880>] (handle_level_irq+0x0/0x148) from [<c002d320>] (asm_do_IRQ+0x80/0xa4)
> [   75.280000]  r7:c7842000 r6:c7487e10 r5:00000000 r4:00000030
> [   75.280000] [<c002d2a0>] (asm_do_IRQ+0x0/0xa4) from [<c0033ab8>] (__irq_svc+0x38/0x80)
> [   75.280000] Exception stack(0xc7843d88 to 0xc7843dd0)
> [   75.280000] 3d80:                   00000001 c0285a74 00000001 00000000 03dd5aca 00000000
> [   75.280000] 3da0: c7487e10 c7842000 00000000 00000000 06bd0000 c7843e04 c7843cd4 c7843dd0
> [   75.280000] 3dc0: c01b5344 c01b5330 20000013 ffffffff
> [   75.280000]  r5:f5000000 r4:ffffffff
> [   75.280000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>] (_raw_spin_lock_nested+0x40/0x48)
> [   75.280000] [<c028520c>] (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>] (nameidata_dentry_drop_rcu+0x90/0x1a4)
> [   75.280000]  r5:c7843efc r4:c7487dc0
> [   75.280000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from [<c00f44c0>] (d_revalidate+0x40/0x68)
> [   75.280000] [<c00f4480>] (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0)
> [   75.280000]  r6:c7843efc r5:c7843efc r4:00000000
> [   75.280000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>] (do_path_lookup+0x48/0xd4)
> [   75.280000] [<c00f711c>] (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c)
> [   75.280000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>] (sys_chdir+0x2c/0x78)
> [   75.280000]  r8:c0034108 r7:0000000c r6:be961ee4 r5:c7843f88 r4:00063015
> [   75.280000] [<c00e95e8>] (sys_chdir+0x0/0x78) from [<c0033e80>] (ret_fast_syscall+0x0/0x44)
> [   75.280000]  r5:be961ee4 r4:00063015
> 
> I started to bisect, but already the first test case showed a different
> error (my getty dying every few seconds).
I have now bisected this second error (the dying getty); the first bad
commit is

	9c0729d (x86: Eliminate bp argument from the stack tracing routines)

It made an x86-specific change to include/linux/stacktrace.h.

According to tglx, the lockup above "is related to nicks scalability
stuff".  I haven't yet tracked down the offending commit for it.  Is that
necessary?

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |