linux-kernel - Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

Date:	Wed, 18 Mar 2015 17:57:46 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Stefan Seyfried <stefan.seyfried@...glemail.com>
Cc:	Jiri Kosina <jkosina@...e.cz>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Takashi Iwai <tiwai@...e.de>, X86 ML <x86@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>, Tejun Heo <tj@...nel.org>
Subject: Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

On Wed, Mar 18, 2015 at 5:23 PM, Stefan Seyfried
<stefan.seyfried@...glemail.com> wrote:
> Am 19.03.2015 um 00:22 schrieb Andy Lutomirski:
>> On Wed, Mar 18, 2015 at 3:40 PM, Andy Lutomirski <luto@...capital.net> wrote:
>>> Yes, it's userspace.  Thanks for checking, though.
>>
>> One more stupid hunch:
>>
>> Can you do:
>> x/21xg ffff8801013d4f58
>>
>> If I counted right, that'll dump task_pt_regs(current).
>
> That's all zeroes:
> crash> x /21xg 0xffff8801013d4f58
> 0xffff8801013d4f58:     0x0000000000000000      0x0000000000000000
> 0xffff8801013d4f68:     0x0000000000000000      0x0000000000000000
> 0xffff8801013d4f78:     0x0000000000000000      0x0000000000000000
> 0xffff8801013d4f88:     0x0000000000000000      0x0000000000000000
> 0xffff8801013d4f98:     0x0000000000000000      0x0000000000000000
> 0xffff8801013d4fa8:     0x0000000000000000      0x0000000000000000
> 0xffff8801013d4fb8:     0x0000000000000000      0x0000000000000000
> 0xffff8801013d4fc8:     0x0000000000000000      0x0000000000000000
> 0xffff8801013d4fd8:     0x0000000000000000      0x0000000000000000
> 0xffff8801013d4fe8:     0x0000000000000000      0x0000000000000000
> 0xffff8801013d4ff8:     0x0000000000000000
>
> But maybe you counted wrong (or I'm reading arch/x86/include/asm/processor.h wrong, which is at least as likely...).
>
> #define task_pt_regs(tsk)  ((struct pt_regs *)(tsk)->thread.sp0 - 1)
>
> => I have the task_struct readily available decoded in the crash utility.
>
> crash> task, search for thread, in thread:
>      sp0 = 18446612136629993472
> crash> eval 18446612136629993472
> hexadecimal: ffff8801013d8000  (18014269664677728KB)

I did indeed count wrong -- THREAD_SIZE != 0x1000.  Whoops.

> ....
> crash> print *(struct pt_regs *)(18446612136629993472 - sizeof(struct pt_regs))

Looks like we last entered via an io_submit syscall.

> $20 = {
>   r15 = 18446744071585666077,
>   r14 = 16,
>   r13 = 582,
>   r12 = 18446612136629993352,
>   bp = 24,
>   bx = 18446744071585666061,
>   r11 = 582,

r11 == flags, which is consistent with a syscall.  However, Denys' big
cleanup isn't in play here, so we probably did FIXUP_TOP_OF_STACK,
maybe even in the syscall in question.

>   r10 = 10760856,
>   r9 = 140712613762160,
>   r8 = 140735967861216,
>   ax = 1,

Entirely reasonable if we're trying to exit from io_submit.

>   cx = 140712476030103,

0x7ffa2d263497

>   dx = 140712613782304,
>   si = 1,
>   di = 140712589295616,
>   orig_ax = 209,

__NR_io_submit

>   ip = 140712571864823,

0x7ffa32dc86f7, which is not equal to cx (oddly, given that this seems
to have been a syscall) and is canonical.  To me, this suggests that
FIXUP_TOP_OF_STACK last executed on a different syscall, in which case
all this opportunistic sysret stuff is a red herring - we never
executed FIXUP_TOP_OF_STACK for this syscall.

>   cs = 51,

__USER_CS

>   flags = 582,

0x246 (i.e. totally normal for userspace, I think)

>   sp = 140735967860552,

0x7fffa55f1748

Note that the double fault happened with rsp == 0x00007fffa55eafb8,
which is the saved rsp here - 0x6790.  That difference is kind of large
to make sense if this is a sysret problem.  Not that I have a better
explanation...

OTOH, if it's a syscall problem, then these regs are from the previous
syscall, so 0x6790 bytes of additional user stack usage is entirely
sensible.  Alternatively, we could have taken a whole pile of nested
page faults until we crossed into the land of unwritable user stack
pages.

>   ss = 43

__USER_DS

> }
>
> =>
> r15 = ffffffff8168141d
> r12 = ffff8801013d7f88
> bx  = ffffffff8168140d
> r9  = 7ffa355bd470
> ip  = 7ffa32dc86f7
> sp  = 7fffa55f1748
>
> looks somehow legit, to my totally untrained eye (ip and sp actually).

One potentially interesting thing that changed is that we now return
from KVM to userspace (to the next scheduled task, not necessarily to
the run ioctl) via sysret *even if the user return notifier runs*.
This was part of the point of the opportunistic sysret code, and KVM
seems to be involved here.

>
> I'm off to bed now (01:20 around here ;), will be back in about 7 hours.

Thanks for the evening debugging help :)

FWIW, I just noticed that stub_execveat incorrectly calls
RESTORE_TOP_OF_STACK before jumping to int_ret_from_sys_call.
Actually, there seems to be an impressive number of bugs like that
(the syscall slow path totally screws this up, but it seems harmless
to me).  I'm really glad that Denys is removing that code...

Stefan, do you happen to know whether your disassembly of page_fault
came from the instructions in memory or if they came from the vmlinux
file?  Not that I have any relevant ideas there.

--Andy