Message-ID: <20160415182301.GA929@codemonkey.org.uk>
Date:	Fri, 15 Apr 2016 14:23:01 -0400
From:	Dave Jones <davej@...emonkey.org.uk>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Linux Kernel <linux-kernel@...r.kernel.org>,
	Al Viro <viro@...iv.linux.org.uk>
Subject: Re: /proc/<pid>/status & task struct locking

On Fri, Apr 15, 2016 at 11:07:16AM -0700, Linus Torvalds wrote:
 > On Fri, Apr 15, 2016 at 9:49 AM, Dave Jones <davej@...emonkey.org.uk> wrote:
 > >  [<ffffffff811d7b39>] ? seq_vprintf+0x39/0x70
 > >  [<ffffffff811d7b35>] seq_vprintf+0x35/0x70
 > > Code: 89 cd 49 01 fc 0f 82 18 03 00 00 48 89 7d b0 41 0f b6 07 0f 1f 84 00 00 00 00 00 84 c0 74 43 48 8d 75 c8 4c 89 ff e8 30 d4 ff ff <0f> b6 55 c8 48 63 c8 4d 8d 34 0f 80 fa 07 0f 87 4c 02 00 00 ff
 > 
 > The code disassembles to
 > 
 >    0: 48 89 7d b0           mov    %rdi,-0x50(%rbp)
 >    4: 41 0f b6 07           movzbl (%r15),%eax
 >    8: 0f 1f 84 00 00 00 00 nopl
 >   10: 84 c0                 test   %al,%al
 >   12: 74 43                 je     0x57
 >   14: 48 8d 75 c8           lea    -0x38(%rbp),%rsi
 >   18: 4c 89 ff             mov    %r15,%rdi
 >   1b: e8 30 d4 ff ff       callq  0xffffffffffffd450
 >   20:* 0f b6 55 c8           movzbl -0x38(%rbp),%edx <-- trapping instruction
 >   24: 48 63 c8             movslq %eax,%rcx
 > 
 > which is interesting. That "-0x38(%rbp)" was passed (by reference) to
 > some subroutine, and now that we try to read the value, we take a
 > fault.
 > 
 > And it makes even less sense because %rbp really seems to be not a
 > random register, but the frame pointer:
 > 
 >   RBP: ffff8801ac52fc78
 >   RSP: ffff8801ac52fc08
 > 
 > So why the *hell* do we get
 > 
 >   BUG: unable to handle kernel NULL pointer dereference at 0000000000000019
 > 
 > for that? That makes no sense.

That's a really good question.
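
For anyone following along, the arithmetic behind that puzzlement can be checked with a rough sketch. It uses only the values quoted above (the Code: bytes, RBP, RSP and the reported fault address); the helper names are made up for illustration, and treating RSP..RBP as the bounds of the current frame is a simplifying assumption.

```python
# Values copied from the quoted oops; nothing here comes from a live system.
CODE = ("89 cd 49 01 fc 0f 82 18 03 00 00 48 89 7d b0 41 0f b6 07 0f 1f 84 "
        "00 00 00 00 00 84 c0 74 43 48 8d 75 c8 4c 89 ff e8 30 d4 ff ff <0f> "
        "b6 55 c8 48 63 c8 4d 8d 34 0f 80 fa 07 0f 87 4c 02 00 00 ff")
RBP = 0xffff8801ac52fc78
RSP = 0xffff8801ac52fc08
FAULT_ADDR = 0x19          # from the "NULL pointer dereference at ..." line

MASK64 = 0xFFFFFFFFFFFFFFFF


def trap_offset(code_line):
    """Index of the byte wrapped in <>, i.e. the start of the trapping insn."""
    for index, token in enumerate(code_line.split()):
        if token.startswith("<"):
            return index
    raise ValueError("no <..> marker in Code: line")


def operand_address(base, displacement):
    """Effective address of disp(%reg), wrapped to 64 bits."""
    return (base + displacement) & MASK64


# The quoted disassembly starts at "48 89 7d b0", which is byte 11 of Code:.
DISASM_START = 11

trap = trap_offset(CODE)
addr = operand_address(RBP, -0x38)          # operand of movzbl -0x38(%rbp),%edx
in_frame = RSP <= addr < RBP                # hypothetical frame-bounds check

print(f"trapping insn at Code: byte {trap} (disasm offset {trap - DISASM_START:#x})")
print(f"-0x38(%rbp) = {addr:#x}, inside RSP..RBP: {in_frame}")
print(f"matches reported fault address: {addr == FAULT_ADDR}")
```

The operand works out to ffff8801ac52fc40, which sits between RSP and RBP, so a read from it has no obvious way of faulting at 0x19.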

 > Quite frankly, I would not attribute this to /proc/pid/status with
 > this kind of insane oops.
 > 
 > Maybe I misread your oops, but that just all looks completely bogus.
 > Even if the stack got corrupted and/or unmapped, how did %cr2 get that
 > odd "0000000000000019" fault address? None of this makes any sense at
 > all to me.
 > 
 > What CPU is this on? There was the crazy AMD microcode bug. This looks
 > even more random, because now the registers look fine, and the oops
 > just looks bad.

It's a Xeon E5-2680 v4.

 > Do you have other versions of the oops for this same problem?

I've seen this a few times, so I'll see if I can dig up some more next week.

I'm now wondering if it's just some hardware bug, though. As mentioned, it's
an outlier, but it pops up often enough that it's been nagging at me, in case
it's also responsible for the similar weird /proc traces I've been seeing
(more frequently). Those have a different signature from this one, though.

To put my mind at rest, though: am I wrong about that absent task_lock() stuff?

	Dave
