linux-kernel - Re: Processes hang in an unkillable state

Message-ID: <BANLkTinMAGmdEDviRwZ1-mxrAAe_WK5jvg@mail.gmail.com>
Date:	Tue, 12 Apr 2011 14:46:48 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Robert Święcki <robert@...ecki.net>
Cc:	Oleg Nesterov <oleg@...hat.com>,
	Américo Wang <xiyou.wangcong@...il.com>,
	linux-kernel@...r.kernel.org, Hugh Dickins <hughd@...gle.com>,
	Miklos Szeredi <mszeredi@...e.cz>
Subject: Re: Processes hang in an unkillable state

On Tue, Apr 12, 2011 at 1:56 PM, Robert Święcki <robert@...ecki.net> wrote:
>
> Ok, just to update you with what I'm currently doing:
>
> I'm testing now with 2.6.39-rc3 - according to
> http://www.kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.39-rc3
> it has vma_to_resize patch included
> (982134ba62618c2d69fbbbd166d0a11ee3b7e3d8) - I applied the latest
> Linus' patch for sys_mlock (the one patching memory.c and mlock.c),
> disabled the sys_madvise in the fuzzer, and now I got the following
> (full kdb dump attached)

Ok, that's different from the apparent livelock.

Except it once again is one of the BUG_ON's in vma_prio_tree_add() -
and again, your kgdb thing has corrupted the bug information.

Can you make a bug-report to the kgdb people? It's annoying as hell
that all the *critical* bug information that the kernel prints out
apparently gets totally lost when you attach with the debugger. It's
not an Oops, so it should have that nice "kernel BUG at" line together
with the filename and line number.

> <d>Pid: 18598, comm: iknowthis Not tainted 2.6.39-rc3 #1<c> Dell Inc.
>               Precision WorkStation 390    <c>/0GH911<c>
> <d>RIP: 0010:[<ffffffff8116c842>]  [<ffffffff8116c842>] vma_prio_tree_add+0xc2/0xd0

Code disassembly shows:

   0:	58                   	pop    %rax
   1:	48 89 7e 68          	mov    %rdi,0x68(%rsi)
   5:	c9                   	leaveq
   6:	c3                   	retq
   7:	66 90                	xchg   %ax,%ax
   9:	48 8b 56 50          	mov    0x50(%rsi),%rdx
   d:	48 8d 47 50          	lea    0x50(%rdi),%rax
  11:	48 89 42 08          	mov    %rax,0x8(%rdx)
  15:	48 89 57 50          	mov    %rdx,0x50(%rdi)
  19:	48 8d 56 50          	lea    0x50(%rsi),%rdx
  1d:	48 89 57 58          	mov    %rdx,0x58(%rdi)
  21:	48 89 46 50          	mov    %rax,0x50(%rsi)
  25:	c9                   	leaveq
  26:	c3                   	retq
  27:*	0f 0b                	ud2         <-- trapping instruction
  29:	eb fe                	jmp    0x29
  2b:*	0f 0b                	ud2         <-- trapping instruction
  2d:	eb fe                	jmp    0x2d
  2f:	eb 08                	jmp    0x39

but scripts/decodecode is wrong here: it's the _second_ of the two
ud2's that traps, as shown by the Code: line.

But whether that is the first or the second in the source code, who
knows? Gcc may have re-ordered things completely, and kdb has thrown
away the information that the kernel should have printed out.

Anyway, it looks _very_ much like the old mremap() issue. But
if you are running -rc3, then you already have commit 42933bac11e8 in
your tree, so maybe there is some other way to trigger a vm_pgoff
overflow.
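
For readers following along, here is a minimal userspace sketch of why a
vm_pgoff overflow would upset the prio tree. It assumes the tree keys
each vma by its first page index (vm_pgoff) and its last page index
(vm_pgoff plus the size in pages, minus one). The struct, the helper
names and the 4 KiB page size are hypothetical stand-ins for
illustration, not the kernel's own definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the few vma fields that matter here. */
struct vma_sketch {
	unsigned long vm_start;	/* first byte of the mapping */
	unsigned long vm_end;	/* one past the last byte */
	unsigned long vm_pgoff;	/* offset into the file, in pages */
};

/* Size of the mapping in pages. */
static unsigned long vma_pages_sketch(const struct vma_sketch *vma)
{
	return (vma->vm_end - vma->vm_start) >> SKETCH_PAGE_SHIFT;
}

/* Index of the last page covered; unsigned math wraps silently. */
static unsigned long heap_index_sketch(const struct vma_sketch *vma)
{
	assert(vma->vm_end > vma->vm_start);
	return vma->vm_pgoff + (vma_pages_sketch(vma) - 1);
}

/* True when the last page index wrapped below the first one. */
static bool pgoff_wraps(const struct vma_sketch *vma)
{
	return heap_index_sketch(vma) < vma->vm_pgoff;
}
```

With a two-page mapping at vm_pgoff 5 the last index is 6, as expected.
With vm_pgoff at ULONG_MAX the last index wraps to 0, so two vmas that
ought to compare equal in the tree may no longer do so.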

You've lost Hugh's patch that did the vma dump instead of having the
BUG_ON(). Can you try that one? And once more, I think that if you had
CONFIG_CC_OPTIMIZE_FOR_SIZE on, gcc wouldn't re-order the basic
blocks, and the BUG_ON() info would be easier to track.

> Call Trace:
>  [<ffffffff8116c9a1>] vma_prio_tree_insert+0x41/0x60
>  [<ffffffff8117cb8c>] __vma_link_file+0x4c/0x90
>  [<ffffffff8117d568>] vma_adjust+0xe8/0x570
>  [<ffffffff8117db31>] __split_vma+0x141/0x280
>  [<ffffffff8117dc95>] split_vma+0x25/0x30
>  [<ffffffff8117c1a1>] mlock_fixup+0x171/0x1c0
>  [<ffffffff8117c529>] do_mlock+0xc9/0x100
>  [<ffffffff8117c6d7>] sys_mlock+0xe7/0x130
>  [<ffffffff82284e03>] ia32_do_call+0x13/0x13

Hmm. mlock() itself should not be causing any pgoff expansion.

I wonder if this is related to that whole stack expansion thing (you
clearly are hitting the stack vma judging by the other bug you found),
and we have a pgoff underflow when expanding the stack?
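
To make the suspected underflow concrete, here is a rough userspace
sketch of a downward expansion with a guard. It assumes that growing a
stack vma downward lowers vm_start and subtracts the same number of
pages from vm_pgoff. The struct and function names are hypothetical, and
this is not the attached patch, which is not reproduced in this archive.

```c
#include <stdbool.h>

#define STACK_SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for a downward-growing stack vma. */
struct stack_vma_sketch {
	unsigned long vm_start;	/* lowest mapped address */
	unsigned long vm_end;	/* one past the highest mapped address */
	unsigned long vm_pgoff;	/* page offset associated with vm_start */
};

/*
 * Grow the mapping downward so that it starts at new_start.
 * Returns false and leaves the vma untouched if there is nothing
 * to grow or if vm_pgoff would wrap below zero.
 */
static bool expand_down_sketch(struct stack_vma_sketch *vma,
			       unsigned long new_start)
{
	unsigned long grow;

	if (new_start >= vma->vm_start)
		return false;

	grow = (vma->vm_start - new_start) >> STACK_SKETCH_PAGE_SHIFT;

	/* Without this check, vm_pgoff -= grow wraps to a huge value. */
	if (grow > vma->vm_pgoff)
		return false;

	vma->vm_start = new_start;
	vma->vm_pgoff -= grow;
	return true;
}
```

Refusing the expansion is just one possible policy for the sketch; the
point is only that the subtraction needs a bounds check somewhere.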

Attached patch for your enjoyment. COMPLETELY UNTESTED, as usual.

Guys, can you think of anything else that might expand a mapping, so
we don't have to find them one-by-one as Robert plays with his fuzzer?

                      Linus

View attachment "patch.diff" of type "text/x-patch" (748 bytes)