linux-kernel - Re: /proc/<pid>/status & task struct locking

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <20160415182301.GA929@codemonkey.org.uk>
Date:	Fri, 15 Apr 2016 14:23:01 -0400
From:	Dave Jones <davej@...emonkey.org.uk>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Linux Kernel <linux-kernel@...r.kernel.org>,
	Al Viro <viro@...iv.linux.org.uk>
Subject: Re: /proc/<pid>/status & task struct locking

On Fri, Apr 15, 2016 at 11:07:16AM -0700, Linus Torvalds wrote:
 > On Fri, Apr 15, 2016 at 9:49 AM, Dave Jones <davej@...emonkey.org.uk> wrote:
 > >  [<ffffffff811d7b39>] ? seq_vprintf+0x39/0x70
 > >  [<ffffffff811d7b35>] seq_vprintf+0x35/0x70
 > > Code: 89 cd 49 01 fc 0f 82 18 03 00 00 48 89 7d b0 41 0f b6 07 0f 1f 84 00 00 00 00 00 84 c0 74 43 48 8d 75 c8 4c 89 ff e8 30 d4 ff ff <0f> b6 55 c8 48 63 c8 4d 8d 34 0f 80 fa 07 0f 87 4c 02 00 00 ff
 > 
 > The code disassembles to
 > 
 >    0: 48 89 7d b0           mov    %rdi,-0x50(%rbp)
 >    4: 41 0f b6 07           movzbl (%r15),%eax
 >    8: 0f 1f 84 00 00 00 00 nopl
 >   10: 84 c0                 test   %al,%al
 >   12: 74 43                 je     0x57
 >   14: 48 8d 75 c8           lea    -0x38(%rbp),%rsi
 >   18: 4c 89 ff             mov    %r15,%rdi
 >   1b: e8 30 d4 ff ff       callq  0xffffffffffffd450
 >   20:* 0f b6 55 c8           movzbl -0x38(%rbp),%edx <-- trapping instruction
 >   24: 48 63 c8             movslq %eax,%rcx
 > 
 > which is interesting. That "-0x38(%rbp)" was passed (by reference) to
 > some subroutine, and now that we try to read the value, we take a
 > fault.
 > 
 > And it makes even less sense because %rbp really seems to be not a
 > random register, but the frame pointer:
 > 
 >   RBP: ffff8801ac52fc78
 >   RSP: ffff8801ac52fc08
 > 
 > So why the *hell* do we get
 > 
 >   BUG: unable to handle kernel NULL pointer dereference at 0000000000000019
 > 
 > for that? That makes no sense.

That's a really good question.
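The numbers quoted above can be cross-checked with a minimal Python sketch. The helper names are illustrative and do not come from any kernel script. The sketch pulls the marked trapping byte out of the oops `Code:` line, sign-extends the disp8 of `movzbl -0x38(%rbp),%edx`, and confirms that the address being read lies inside the current stack frame (between RSP and RBP), nowhere near the reported fault address 0x19. It also re-derives the callq target from its rel32, using the 0x20 trap offset from the quoted disassembly listing.

```python
# Values copied from the oops and register dump quoted above.
RBP = 0xffff8801ac52fc78
RSP = 0xffff8801ac52fc08
CR2 = 0x19  # the fault address the oops reported
CODE = ("89 cd 49 01 fc 0f 82 18 03 00 00 48 89 7d b0 41 0f b6 07 0f 1f 84 00 00 "
        "00 00 00 84 c0 74 43 48 8d 75 c8 4c 89 ff e8 30 d4 ff ff <0f> b6 55 c8 "
        "48 63 c8 4d 8d 34 0f 80 fa 07 0f 87 4c 02 00 00 ff")
MASK64 = (1 << 64) - 1


def parse_code_line(text):
    """Return (raw bytes, index of the <..>-marked byte) from an oops 'Code:' line."""
    tokens = text.split()
    trap = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens), trap


def sign_extend(value, bits):
    """Interpret an unsigned field of the given width as two's complement."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


code, trap = parse_code_line(CODE)

# Trapping insn is 0f b6 55 c8: movzbl disp8(%rbp),%edx, with disp8 in the last byte.
disp = sign_extend(code[trap + 3], 8)
slot = (RBP + disp) & MASK64

# The 5-byte callq (e8 + rel32) ends exactly where the trapping insn starts.
rel32 = sign_extend(int.from_bytes(code[trap - 4:trap], "little"), 32)
callee = (0x20 + rel32) & MASK64  # 0x20 = trap offset in the quoted listing

print(trap, hex(code[trap + 3]), disp)            # 43 0xc8 -56
print(hex(slot), RSP <= slot < RBP, slot == CR2)  # 0xffff8801ac52fc40 True False
print(hex(callee))                                # 0xffffffffffffd450
```

The load targets an ordinary slot in the live frame, so a CR2 of 0x19 cannot be explained by the instruction's own operand.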

 > Quite frankly, I would not attribute this to /proc/pid/status with
 > this kind of insane oops.
 > 
 > Maybe I misread your oops, but that just all looks completely bogus.
 > Even if the stack got corrupted and/or unmapped, how did %cr2 get that
 > odd "0000000000000019" fault address? None of this makes any sense at
 > all to me.
 > 
 > What CPU is this on? There was the crazy AMD microcode bug. This looks
 > even more random, because now the registers look fine, and the oops
 > just looks bad.

It's a Xeon E5-2680 v4.

 > Do you have other versions of the oops for this same problem?

I've seen this a few times, so I'll see if I can dig up some more next week.

I'm now wondering if it's just a hardware bug. As mentioned, it's an outlier,
but it pops up often enough that it's been nagging at me, in case it's also
responsible for the similar weird /proc traces I've been seeing (more
frequently). Those have a different signature from this one, though.

To put my mind at rest, though: am I wrong about that absent task_lock() stuff?
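Some background on the locking question. In the kernel, task_lock() takes the task's alloc_lock spinlock, and get_task_comm() copies ->comm under it so that a concurrent rename cannot be observed half-written. The sketch below is a minimal userspace analogue in Python. It assumes a hypothetical Task class, with a threading.Lock standing in for task_lock()/task_unlock(). Which /proc/<pid>/status fields actually need this treatment is the open question, so ->comm serves only as an example.

```python
import threading

TASK_COMM_LEN = 16  # size of task_struct->comm, including the trailing NUL


class Task:
    """Hypothetical userspace stand-in for task_struct (illustrative only)."""

    def __init__(self, name):
        self.alloc_lock = threading.Lock()  # plays the role of task_lock()/task_unlock()
        self.comm = bytearray(TASK_COMM_LEN)
        set_comm(self, name)


def set_comm(task, name):
    # Writer: rewrite the buffer byte by byte while holding the lock, so no
    # reader can see a half-old, half-new name.
    data = name.encode()[:TASK_COMM_LEN - 1]
    with task.alloc_lock:
        for i in range(TASK_COMM_LEN):
            task.comm[i] = data[i] if i < len(data) else 0


def get_comm(task):
    # Reader: snapshot the buffer under the same lock, then decode the private copy.
    with task.alloc_lock:
        snapshot = bytes(task.comm)
    return snapshot.split(b"\0", 1)[0].decode()


def status_name_line(task):
    # Format from the snapshot, outside the lock, like a /proc/<pid>/status "Name:" line.
    return "Name:\t" + get_comm(task) + "\n"


t = Task("example-task")
print(repr(status_name_line(t)))  # 'Name:\texample-task\n'
set_comm(t, "a-much-longer-process-name")
print(get_comm(t))                # a-much-longer-p
```

Copying under the lock and formatting outside it keeps the critical section short. That matters far more for a real spinlock than for this Python stand-in.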

	Dave