linux-kernel - Re: [x86] BUG: unable to handle kernel paging request at 00740060

Message-ID: <20131008075151.GA15689@localhost>
Date:	Tue, 8 Oct 2013 15:51:51 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Oleg Nesterov <oleg@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [x86] BUG: unable to handle kernel paging request at 00740060

Hi Linus,

On Mon, Oct 07, 2013 at 11:47:39AM -0700, Linus Torvalds wrote:
> On Sat, Oct 5, 2013 at 4:44 PM, Fengguang Wu <fengguang.wu@...el.com> wrote:
> >
> > I got the below dmesg and the first bad commit is
> >
> > commit 0c44c2d0f459 ("x86: Use asm goto to implement better modify_and_test() functions")
> 
> Hmm. I'm looking at the final version of that patch, and I'm not
> seeing anything wrong. It may trigger a compiler bug - there aren't
> that many "asm goto" users, and using them for the bitops adds a lot
> of new cases.
> 
> Your oops makes very little sense, it looks like task_work_run() just
> called out to random crap, probably because the work was already
> released, so "work->func()" ends up being bad. I'm adding Oleg to the
> participants anyway, just in case there is some race. The comment says
> that it can race with task_work_cancel() playing with *work. Oleg,
> comments?
> 
> However, I don't see any actual bit-op code in task_work_run() itself,
> so it's something else that got miscompiled and corrupted memory. In
> that respect, the oops you have looks more like the oopses you got
> with DEBUG_KOBJECT_RELEASE. Are you sure that wasn't set?

The option was set:

DEBUG_KOBJECT_RELEASE=y
 
I tried disabling it, and the error still remains:

[    9.719060] Write protecting the kernel text: 6116k
[    9.720356] Write protecting the kernel read-only data: 2616k
[    9.721586] NX-protecting the kernel data: 6172k
[    9.750420] BUG: unable to handle kernel NULL pointer dereference at   (null)
[    9.750870] IP: [<  (null)>]   (null)
[    9.750870] *pdpt = 00000000072be001 *pde = 0000000000000000
[    9.750870] Oops: 0010 [#1] DEBUG_PAGEALLOC
[    9.750870] CPU: 0 PID: 84 Comm: rc.local Not tainted 3.12.0-rc1-00081-g6bfa687 #4
[    9.750870] task: 872c4000 ti: 872c6000 task.ti: 872c6000
[    9.750870] EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
[    9.750870] EIP is at 0x0
[    9.750870] EAX: 82076134 EBX: 872b2780 ECX: 00000000 EDX: 82076134
[    9.750870] ESI: 872c4000 EDI: 872c4388 EBP: 872c7f9c ESP: 872c7f8c
[    9.750870]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[    9.750870] CR0: 8005003b CR2: 00000000 CR3: 072bd000 CR4: 000006b0
[    9.750870] Stack:
[    9.750870]  810545b9 00000001 789ecf58 7767dff4 872c7fac 81002358 00000000 78a03903
[    9.750870]  872c6000 815f6bd0 00000000 00000000 00000000 00000000 00000000 00000000
[    9.750870]  00000000 0000007b 0000007b 00000000 00000000 0000000b 777d81d0 00000073
[    9.750870] Call Trace:
[    9.750870]  [<810545b9>] ? task_work_run+0x79/0xb0
[    9.750870]  [<81002358>] do_notify_resume+0x58/0x70
[    9.750870]  [<815f6bd0>] work_notifysig+0x2b/0x3b
[    9.750870] Code:  Bad EIP value.
[    9.750870] EIP: [<00000000>] 0x0 SS:ESP 0068:872c7f8c
[    9.750870] CR2: 0000000000000000
[    9.769399] ---[ end trace da54692b95c91495 ]---
[    9.777566] BUG: unable to handle kernel paging request at 05140060
[    9.778845] IP: [<81054594>] task_work_run+0x54/0xb0
[    9.779774] *pdpt = 0000000000000000 *pde = f000ff53f000ff53
[    9.780708] Oops: 0000 [#2] DEBUG_PAGEALLOC
[    9.781431] CPU: 0 PID: 85 Comm: hostname Tainted: G      D      3.12.0-rc1-00081-g6bfa687 #4
[    9.781721] task: 872c8000 ti: 872ca000 task.ti: 872ca000
[    9.781721] EIP: 0060:[<81054594>] EFLAGS: 00010206 CPU: 0
[    9.781721] EIP is at task_work_run+0x54/0xb0
[    9.781721] EAX: 05140060 EBX: 8729b900 ECX: 00000000 EDX: 05140060
[    9.781721] ESI: 872c8000 EDI: 872c8388 EBP: 872cbf3c ESP: 872cbf30
[    9.781721]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[    9.781721] CR0: 8005003b CR2: 05140060 CR3: 072cc000 CR4: 000006b0
[    9.781721] Stack:
[    9.781721]  ffffffff 872af400 872c8000 872cbf8c 8103a02a 00000014 776cefb8 8105b49b
[    9.781721]  00000000 872cbfac 00000001 00000015 61636f6c 736f686c 6f6c2e74 872af458
[    9.781721]  69616d6f 872af46e 872af458 00000000 00000000 872ae980 872c8000 872cbfa4
[    9.781721] Call Trace:
[    9.781721]  [<8103a02a>] do_exit+0x2aa/0x920
[    9.781721]  [<8105b49b>] ? up_write+0x1b/0x30
[    9.781721]  [<8103a732>] do_group_exit+0x52/0xb0
[    9.781721]  [<8103a7a8>] SyS_exit_group+0x18/0x20
[    9.781721]  [<815f7130>] sysenter_do_call+0x12/0x3c
[    9.781721] Code: 36 31 c9 89 d0 0f b1 0f 39 c2 75 eb 85 d2 74 67 8d b4 26 00 00 00 00 f3 90 8b 86 c0 03 00 00 85 c0 74 f4 31 db eb 04 89 d3 89 c2 <8b> 02 89 1a 85 c0 75 f4 eb 16 66 90 f6 46 0c 04 74 c4 b9 04 d0
[    9.781721] EIP: [<81054594>] task_work_run+0x54/0xb0 SS:ESP 0068:872cbf30
[    9.781721] CR2: 0000000005140060
[    9.802246] ---[ end trace da54692b95c91496 ]---
[    9.802881] Fixing recursive fault but reboot is needed!
[    9.811986] BUG: unable to handle kernel paging request at 0805a000
[    9.812911] IP: [<81054594>] task_work_run+0x54/0xb0
[    9.813683] *pdpt = 00000000072e2001 *pde = 00000000072cf067 *pte = 0000000000000000
[    9.815024] Oops: 0000 [#3] DEBUG_PAGEALLOC
[    9.815623] CPU: 0 PID: 86 Comm: plymouthd Tainted: G      D      3.12.0-rc1-00081-g6bfa687 #4
[    9.816819] task: 872da000 ti: 872dc000 task.ti: 872dc000
[    9.817617] EIP: 0060:[<81054594>] EFLAGS: 00010206 CPU: 0
[    9.818394] EIP is at task_work_run+0x54/0xb0
[    9.819000] EAX: 0805a000 EBX: 872d3060 ECX: 00000000 EDX: 0805a000
[    9.819864] ESI: 872da000 EDI: 872da388 EBP: 872ddf3c ESP: 872ddf30
[    9.820769]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[    9.820908] CR0: 8005003b CR2: 0805a000 CR3: 072b8000 CR4: 000006b0
[    9.820908] Stack:
[    9.820908]  00000001 00000405 00000000 872ddf4c 8104738c 872da000 00000001 872ddf94
[    9.820908]  810fb04b 00000002 00000001 00000000 810faf3a 872b92d8 872b9280 00000056
[    9.820908]  00000001 872d3408 00000056 085c82a8 00000000 872da214 00000000 872d2000
[    9.820908] Call Trace:
[    9.820908]  [<8104738c>] ptrace_notify+0x5c/0xa0
[    9.820908]  [<810fb04b>] do_execve+0x5fb/0x6f0
[    9.820908]  [<810faf3a>] ? do_execve+0x4ea/0x6f0
[    9.820908]  [<810fb37c>] SyS_execve+0x5c/0x70
[    9.820908]  [<815f7130>] sysenter_do_call+0x12/0x3c
[    9.820908] Code: 36 31 c9 89 d0 0f b1 0f 39 c2 75 eb 85 d2 74 67 8d b4 26 00 00 00 00 f3 90 8b 86 c0 03 00 00 85 c0 74 f4 31 db eb 04 89 d3 89 c2 <8b> 02 89 1a 85 c0 75 f4 eb 16 66 90 f6 46 0c 04 74 c4 b9 04 d0
[    9.820908] EIP: [<81054594>] task_work_run+0x54/0xb0 SS:ESP 0068:872ddf30
[    9.820908] CR2: 000000000805a000
[    9.836265] ---[ end trace da54692b95c91497 ]---
[    9.842439] BUG: unable to handle kernel paging request at 02c00060
[    9.843426] IP: [<81054594>] task_work_run+0x54/0xb0
[    9.844709] *pdpt = 00000000072c1001 *pde = 0000000000000000
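
A note on reading oopses #2 and #3: the marked bytes in the Code: line (<8b> 02) decode to a load through EDX, and EDX equals CR2 in both traces, so the fault looks like the first "next = work->next" load of a linked-list walk over a garbage pointer. Oops #1 (EIP 0x0) is consistent with jumping through a NULL ->func. The following is only a userspace sketch of that kind of walk, not the kernel source; the struct layout and the names reverse_works(), run_works(), sketch_cb() and sketch_runs are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a queued work item (layout is an assumption). */
struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *);
};

/* Reverse a singly linked list in place so the oldest entry comes first. */
static struct callback_head *reverse_works(struct callback_head *work)
{
	struct callback_head *head = NULL, *next;

	while (work) {
		next = work->next;	/* faults here if 'work' is a stale pointer */
		work->next = head;
		head = work;
		work = next;
	}
	return head;
}

/* Hypothetical callback that just counts how often it ran. */
static int sketch_runs;
static void sketch_cb(struct callback_head *cb)
{
	(void)cb;
	sketch_runs++;
}

/* Run every queued callback in FIFO order; returns how many were run. */
static int run_works(struct callback_head *work)
{
	struct callback_head *next;
	int n = 0;

	for (work = reverse_works(work); work; work = next) {
		next = work->next;		/* read before func may free 'work' */
		assert(work->func != NULL);	/* a NULL func would be a jump to 0x0 */
		work->func(work);
		n++;
	}
	return n;
}
```

If an entry were freed or overwritten while still on the list, either the ->next load or the ->func call would go wrong, which is the pattern the three oopses show.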

> That said, Fengguang, can you try two things just to check:
> 
>  - add "cc" to the clobbers list for the asm goto (technically it
> should be on the non-asm-goto as well, but we never had that, and
> maybe the fact that gcc always ends up testing a register afterwards
> hides the need for the clobber).
> 
> So it would look like this in arch/x86/include/asm/rmwcc.h
> 
>   #define __GEN_RMWcc(fullop, var, cc, ...) \
>   do { \
>       asm volatile goto (fullop "; j" cc " %l[cc_label]" \
>           : : "m" (var), ## __VA_ARGS__ \
>           : "memory", "cc" : cc_label); \
>       return 0; \
>   cc_label: \
>       return 1; \
>   } while (0)
> 
> (where that "cc" thing is new). I'm not sure if "cc" really matters on
> x86 at all (it didn't use to, long long ago), but maybe it does these
> days..

Tests show that adding "cc" this way makes no difference:

-                       : "memory" : cc_label);                         \
+                       : "memory", "cc" : cc_label);                           \
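
For reference, here is roughly what the tested pattern looks like outside the kernel: a locked bit-test-and-set whose carry flag is consumed by a jump inside the same asm goto statement, with "cc" in the clobber list. This is a minimal sketch under assumptions, not the kernel macro: sketch_test_and_set_bit() is a hypothetical name, it needs a compiler with asm goto support on x86, and other targets take a plain atomic builtin fallback.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Atomically set bit 'nr' in *addr and report whether it was already set.
 * Hypothetical userspace helper; 'nr' must stay within one unsigned long.
 */
static bool sketch_test_and_set_bit(long nr, unsigned long *addr)
{
	assert(nr >= 0 && nr < (long)(8 * sizeof(*addr)));
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	/* The jump consumes CF inside the asm, so no flag-to-register copy. */
	__asm__ volatile goto ("lock bts %1, %0; jc %l[was_set]"
			       : /* no outputs */
			       : "m" (*addr), "r" (nr)
			       : "memory", "cc"
			       : was_set);
	return false;
was_set:
	return true;
#else
	/* Portable fallback so the sketch also builds on non-x86 targets. */
	unsigned long mask = 1UL << nr;
	unsigned long old = __atomic_fetch_or(addr, mask, __ATOMIC_SEQ_CST);
	return (old & mask) != 0;
#endif
}
```

Because the asm has no outputs, the modified word is covered only by the "memory" clobber, which is why that clobber cannot be dropped here.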
 
> If that makes no difference, please just verify that the non-asm-goto
> version works fine, by changing the
> 
>   #ifdef CC_HAVE_ASM_GOTO
> 
> into a simple "#if 0" to disable the asm-goto version.

Yeah, this quiets the oops messages:

-#ifdef CC_HAVE_ASM_GOTO
+#if 0
 
 #define __GEN_RMWcc(fullop, var, cc, ...)                              \
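
With "#if 0" the build takes the non-asm-goto path, which copies the condition flag into a byte with setcc and lets the compiler test that byte afterwards. A minimal userspace sketch of that style follows, assuming x86 and a GNU-compatible compiler; sketch_test_and_set_bit_setc() is a hypothetical name and this is not the kernel macro itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Same operation as a locked bts, but the carry flag is materialized
 * with setc instead of being consumed by a jump inside the asm.
 * Hypothetical userspace helper; 'nr' must stay within one unsigned long.
 */
static bool sketch_test_and_set_bit_setc(long nr, unsigned long *addr)
{
	assert(nr >= 0 && nr < (long)(8 * sizeof(*addr)));
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	unsigned char c;

	__asm__ volatile ("lock bts %2, %0; setc %1"
			  : "+m" (*addr), "=qm" (c)
			  : "r" (nr)
			  : "memory", "cc");
	return c != 0;
#else
	/* Portable fallback so the sketch also builds on non-x86 targets. */
	unsigned long mask = 1UL << nr;
	unsigned long old = __atomic_fetch_or(addr, mask, __ATOMIC_SEQ_CST);
	return (old & mask) != 0;
#endif
}
```

This form has ordinary outputs and no jump out of the asm, so it avoids the asm goto code paths in the compiler entirely, at the cost of an extra setc and test.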

Thanks,
Fengguang