linux-kernel - Re: [2.6.30-rc1] RCU detected CPU 1 stall

Message-ID: <Pine.LNX.4.64.0904141801300.24679@blonde.anvils>
Date:	Tue, 14 Apr 2009 18:11:54 +0100 (BST)
From:	Hugh Dickins <hugh@...itas.com>
To:	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
cc:	viro@...IV.linux.org.uk, linux-kernel@...r.kernel.org,
	jmorris@...ei.org, akpm@...ux-foundation.org,
	paulmck@...ux.vnet.ibm.com
Subject: Re: [2.6.30-rc1] RCU detected CPU 1 stall

On Mon, 13 Apr 2009, Tetsuo Handa wrote:
> Paul E. McKenney wrote:
> > Is this reproducible?
> Not always, but it is reproducible.
> 
> Al Viro wrote:
> > I'd really love to see results of repeated alt-sysrq-p/alt-sysrq-l, just
> > to see where was it actually spinning.
> Below is sysrq message.
> Maybe something related to khelper's current->mm == NULL warning problem.

Maybe, up to a point, and I'll post separately on those warnings.

> Full log is at http://I-love.SAKURA.ne.jp/tmp/dmesg-2.6.30-rc1-200904130930.txt .
> 
> [   47.412519] SysRq : Show Regs
> [   47.413986] 
> [   47.414584] Pid: 3655, comm: khelper Tainted: G        W  (2.6.30-rc1 #1) VMware Virtual Platform
> [   47.415804] EIP: 0060:[<c0379c3d>] EFLAGS: 00000293 CPU: 0
> [   47.415804] EIP is at __get_user_4+0x11/0x17
> [   47.415804] EAX: f7150003 EBX: f7150000 ECX: 00000000 EDX: f6744000
> [   47.415804] ESI: 000007b8 EDI: 7fffffff EBP: f6744f20 ESP: f6744f10
> [   47.415804]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [   47.415804] CR0: 8005003b CR2: f7150000 CR3: 3599a000 CR4: 000006d0
> [   47.415804] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [   47.415804] DR6: ffff0ff0 DR7: 00000400
> [   47.415804] Call Trace:
> [   47.415804]  [<c0225f4e>] ? count+0x3e/0xb0
> [   47.415804]  [<c0228581>] do_execve+0x621/0x890
> [   47.415804]  [<c022bd8b>] ? getname+0x6b/0xa0
> [   47.415804]  [<c010237e>] sys_execve+0x5e/0xb0
> [   47.415804]  [<c0103d19>] syscall_call+0x7/0xb
> [   47.415804]  [<c010aee4>] ? kernel_execve+0x24/0x30
> [   47.415804]  [<c0172b6f>] ? ____call_usermodehelper+0xff/0x170
> [   47.415804]  [<c0172a70>] ? ____call_usermodehelper+0x0/0x170
> [   47.415804]  [<c0104707>] ? kernel_thread_helper+0x7/0x10

I'm now thinking that this really is hitting in count(), despite
the ? on that in the backtrace, and is entirely unrelated to the
recent check_unsafe_exec() changes.  It looks stuck in a loop scanning
the kernelspace exec args without an mm.
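
To make the suspected loop concrete, here is a minimal user-space sketch of an argv-counting loop of the kind count() performs. It is an illustration under assumptions, not the code from fs/exec.c: the name count_sketch() is hypothetical, and a plain pointer dereference stands in for the kernel's get_user() access. The point is only that such a loop ends when it reads a NULL entry or when the limit is exceeded, so with a very large limit it can spin for a long time if the terminator is never seen.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of an argv-counting loop.
 * Assumption: a plain dereference replaces the kernel's get_user(),
 * so there is no fault handling here at all.
 *
 * Returns the number of entries before the NULL terminator,
 * or -E2BIG once more than 'max' entries have been seen.
 */
int count_sketch(const char *const *argv, int max)
{
    int i = 0;

    if (argv != NULL) {
        for (;;) {
            const char *p = *argv;  /* kernel code would fetch this via get_user() */

            if (p == NULL)          /* terminator found: stop counting */
                break;
            argv++;
            if (i++ >= max)         /* the only other way out of the loop */
                return -E2BIG;
        }
    }
    return i;
}
```

With a three-entry array { "a", "b", NULL } and a limit of 10 this returns 2; with a limit smaller than the number of entries it returns -E2BIG instead.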

But my compiler on your config gives quite different function
sizes: please would you post to the list or send me privately
the output of "objdump -trd fs/exec.o", so we can check that.

Thanks,
Hugh