linux-kernel - Re: frequent lockups in 3.18rc4

Message-ID: <54923F1F.7040301@oracle.com>
Date:	Wed, 17 Dec 2014 21:42:39 -0500
From:	Sasha Levin <sasha.levin@...cle.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Dave Jones <davej@...hat.com>, Chris Mason <clm@...com>,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Dâniel Fraga <fragabr@...il.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Suresh Siddha <sbsiddha@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Peter Anvin <hpa@...ux.intel.com>
Subject: Re: frequent lockups in 3.18rc4

On 12/15/2014 06:46 PM, Linus Torvalds wrote:
> I cleaned up the patch a bit, split it up into two to clarify it, and
> have committed it to my tree. I'm not marking the patches for stable,
> because while I'm convinced it's a bug, I'm also not sure why even if
> it triggers it doesn't eventually recover when the IO completes. So
> I'd mark them for stable only if they are actually confirmed to fix
> anything in the wild, and after they've gotten some testing in
> general. The patches *look* straightforward, they remove more lines
> than they add, and I think the code is more understandable too, but
> maybe I just screwed up. Whatever. Some care is warranted, but this is
> the first time I feel like I actually fixed something that matched at
> least one of your lockup symptoms.
> 
> Anyway, it's there as
> 
>   26178ec11ef3 ("x86: mm: consolidate VM_FAULT_RETRY handling")
>   7fb08eca4527 ("x86: mm: move mmap_sem unlock from mm_fault_error() to caller")

I guess you did "just screwed up"...

I've started seeing this:

[  240.190061] BUG: unable to handle kernel paging request at 00007f341768b000
[  240.190061] IP: [<00007f341baf61fb>] 0x7f341baf61fb
[  240.190061] PGD 12b3e4067 PUD 12b3e5067 PMD 29a700067 PTE 0
[  240.190061] Oops: 0004 [#10] PREEMPT SMP
[  240.190061] Dumping ftrace buffer:
[  240.190061]    (ftrace buffer empty)
[  240.190061] Modules linked in:
[  240.190061] CPU: 6 PID: 9691 Comm: trinity-c619 Tainted: G      D        3.18.0-sasha-08443-g2b40f4a #1618
[  240.190061] task: ffff88012b346000 ti: ffff88012b3d4000 task.ti: ffff88012b3d4000
[  240.190061] RIP: 0033:[<00007f341baf61fb>]  [<00007f341baf61fb>] 0x7f341baf61fb
[  240.190061] RSP: 002b:00007fff39f045f8  EFLAGS: 00010206
[  240.190061] RAX: 00007fff39f04600 RBX: 0000000000000363 RCX: 0000000000000200
[  240.190061] RDX: 0000000000001000 RSI: 00007f341768b000 RDI: 00007fff39f04600
[  240.190061] RBP: 00007fff39f05640 R08: 00007f341bdf20a8 R09: 00007f341bdf2100
[  240.190061] R10: 0000000000000000 R11: 0000000000001000 R12: 0000000000001000
[  240.190061] R13: 0000000000001000 R14: 0000000000362000 R15: 00007fff39f04600
[  240.190061] FS:  00007f341bffb700(0000) GS:ffff8802da400000(0000) knlGS:0000000000000000
[  240.190061] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  240.190061] CR2: 00007f341894801c CR3: 000000012b364000 CR4: 00000000000006a0
[  240.190061] DR0: ffffffff81000000 DR1: 0000000000000000 DR2: 0000000000000000
[  240.190061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000b0602
[  240.190061]
[  240.190061] RIP  [<00007f341baf61fb>] 0x7f341baf61fb
[  240.190061]  RSP <00007fff39f045f8>
[  240.190061] CR2: 00007f341768b000

Which was bisected down to:

	26178ec11ef3 ("x86: mm: consolidate VM_FAULT_RETRY handling")
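
For anyone reading the trace above: the value in `Oops: 0004` is the x86 page-fault error code. Below is a minimal sketch of how it decodes, assuming the standard x86 bit layout (bit 0 present/protection, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch). The helper name `decode_pf_error` is hypothetical and not part of any kernel tooling.

```python
# Decode an x86 page-fault error code such as the "0004" in "Oops: 0004".
# The bit meanings follow the architectural page-fault error code layout;
# decode_pf_error is an illustrative helper, not a kernel or tool API.

# (mask, meaning when the bit is set, meaning when it is clear or None)
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code):
    """Return a list of human-readable properties for the error code."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


if __name__ == "__main__":
    # The oops line prints the code in hexadecimal.
    print(decode_pf_error(int("0004", 16)))
```

For `0004` this yields a user-mode read of a not-present page, which matches the `PTE 0` entry and the user-space `RIP: 0033:` address in the trace.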


Thanks,
Sasha