linux-kernel - Re: [xen] double fault: 0000 [#1] PREEMPT SMP DEBUG

Message-ID: <1df5b870-d4bb-46c2-8e6e-af7b63ba21cc@default>
Date:	Sun, 6 Oct 2013 15:14:23 -0700 (PDT)
From:	Boris Ostrovsky <boris.ostrovsky@...cle.com>
To:	<torvalds@...ux-foundation.org>
Cc:	<xen-devel@...ts.xenproject.org>, <fengguang.wu@...el.com>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [xen] double fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC


----- torvalds@...ux-foundation.org wrote:

> On Sun, Oct 6, 2013 at 1:23 AM, Fengguang Wu <fengguang.wu@...el.com> wrote:
> >
> > I got the below dmesg and the first bad commit is commit cf39c8e5352b:
> >     Merge tag 'stable/for-linus-3.12-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
> 
> Ugh. How reliable is the double fault? Because bisecting it to the
> merge that didn't even have any conflicts in it as far as I can
> remember means that there's something really subtle going on wrt some
> semantic conflict or other. Or, alternatively, it means that the
> bisect failed because the double fault isn't 100% reliable..
> 
> Anyway, the stack is crap when the original fault happens at
> "boot_tvec_bases+0x1fe", and that causes the double fault debug code
> to take *another* fault, which means that it doesn't even show the
> right code sequence. Too bad. So ignore the latter part of the oops,
> but the top part looks valid:
> 
> > [    4.136137] double fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > [    4.137521] CPU: 0 PID: 132 Comm: bootlogd Not tainted 3.12.0-rc2-00153-g14951f2 #129
> > [    4.139156] task: ffff88000c9a6580 ti: ffff88000c9ba000 task.ti: ffff88000c9ba000
> > [    4.140042] RIP: 0010:[<ffffffff81f31c7e>]  [<ffffffff81f31c7e>] boot_tvec_bases+0x1fe/0x2080
> > [    4.140042] RSP: 0018:0000000088000cd8  EFLAGS: 00010212
> > [    4.140042] RAX: 000000000000004f RBX: 0000000000000100 RCX: 0000000000000000
> > [    4.140042] RDX: 0000000000000f1e RSI: ffffffff81f746a8 RDI: ffffffff81f31c48
> > [    4.140042] RBP: ffff88000f003ee0 R08: 0000000000000000 R09: 0000000000000000
> > [    4.140042] R10: 0000000000000001 R11: ffff88000f00a000 R12: ffff88000c9bbfd8
> > [    4.140042] R13: ffffffff81f31c48 R14: ffffffff81f31c48 R15: ffffffff81f31c48
> > [    4.140042] FS:  00007fb1f9662700(0000) GS:ffff88000f000000(0000) knlGS:0000000000000000
> > [    4.140042] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    4.140042] CR2: 0000000088000cc8 CR3: 000000000c9cd000 CR4: 00000000000006b0
> > [    4.140042] Stack:
> <boom, it crashes again here>
> 
> but it has jumped into a data section and is executing random data as
> code, and there is no sign of where it jumped *from*, since the random
> code clearly corrupted the stack - resulting in the double fault in
> the first place.
> 
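The numbers in the register dump are consistent with that reading. The Python sketch below only redoes the address arithmetic with values copied from the oops (the variable names are ours): RIP is 0x1fe bytes into the 0x2080-byte boot_tvec_bases object, and RDI (also R13, R14, R15) points 0x1c8 bytes into the same object. RSP lies in the lower, user half of the address space, not on the task's kernel stack. Assuming the x86-64 rule that the CPU aligns RSP down to a 16-byte boundary before pushing an exception frame, the first 8-byte push lands exactly at the CR2 value, which would fit an exception that could not be delivered on the corrupted stack and so escalated into a double fault. That last step is an inference from the dump, not something stated in the thread.

```python
# Register values copied from the oops quoted above.
RIP = 0xFFFFFFFF81F31C7E          # boot_tvec_bases+0x1fe/0x2080
SYM_OFFSET, SYM_SIZE = 0x1FE, 0x2080
RSP = 0x0000000088000CD8
CR2 = 0x0000000088000CC8          # last page-fault address
RDI = 0xFFFFFFFF81F31C48          # same value in R13, R14 and R15

# With 48-bit virtual addresses (4-level paging), the user half ends here.
USER_HALF_END = 0x0000800000000000

# Start of the data object that RIP is "executing".
sym_base = RIP - SYM_OFFSET

# Assumption: on exception delivery in 64-bit mode the CPU aligns RSP down
# to 16 bytes, then pushes SS (8 bytes) just below the aligned value.
first_push = (RSP & ~0xF) - 8

print(f"boot_tvec_bases base:     {sym_base:#x}")
print(f"RIP offset / object size: {RIP - sym_base:#x} / {SYM_SIZE:#x}")
print(f"RDI offset into object:   {RDI - sym_base:#x}")
print(f"RSP in user half?         {RSP < USER_HALF_END}")
print(f"first frame push at:      {first_push:#x} (CR2 = {CR2:#x})")
```

None of this recovers where the bad jump came from; it only confirms that RIP, RDI and the stack pointer all point somewhere they should not.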
> So the oops is almost entirely useless as a debug aid in this
> situation. I'm almost hoping that your bisect was wrong, and you could
> try to see if you could do that again..
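Linus's alternative explanation, that a crash which does not reproduce every time can mislead the bisection, can be put in numbers with a back-of-the-envelope model. The Python sketch below assumes, hypothetically, that each boot of a buggy kernel hits the double fault independently with probability p; the report gives no such figure, and the helper names are ours. Under that assumption, marking a commit "good" after n clean boots is wrong with probability (1 - p)^n, and a single wrong "good" verdict on a commit that already contains the bug makes git bisect exclude the real culprit.

```python
def miss_probability(p: float, n: int) -> float:
    """Chance that a kernel which really has the bug boots cleanly n times
    in a row, assuming each boot crashes independently with probability p."""
    return (1.0 - p) ** n


def boots_needed(p: float, max_miss: float) -> int:
    """Smallest number of clean boots n with miss_probability(p, n) <= max_miss."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if max_miss <= 0.0:
        raise ValueError("max_miss must be positive")
    n = 0
    while miss_probability(p, n) > max_miss:
        n += 1
    return n


# Hypothetical reproduction rates; the thread does not say how often the crash hit.
for p in (0.8, 0.5, 0.2):
    print(f"p={p}: one clean boot wrongly says 'good' {miss_probability(p, 1):.0%} "
          f"of the time; {boots_needed(p, 0.01)} clean boots keep that under 1%")
```

With p = 0.5, for example, a single clean boot is a coin flip, while seven clean boots in a row bring the chance of a wrong "good" down to about 0.8%.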


For what it's worth, the commit in question touches almost exclusively
Xen files, the only exception being lib/swiotlb.c (with what appear
to be fairly trivial changes). And CONFIG_XEN is not set in the config
file for this report.
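The configuration point is easy to check mechanically. A Kconfig-generated .config records an enabled option as CONFIG_FOO=y (or =m, or a value) and a disabled one as the comment line "# CONFIG_FOO is not set"; options whose dependencies are not met may be missing from the file altogether. The Python sketch below parses that format and treats both the comment form and absence as "disabled". The function names and the sample excerpt are made up for illustration: the options echo the oops banner and the note above, but this is not the reporter's actual config file.

```python
import re
from typing import Dict, Optional

# The two line forms a Kconfig-generated .config uses for an option:
#   CONFIG_FOO=y            (also =m, =123, ="string"; quotes are kept as-is)
#   # CONFIG_FOO is not set
_SET = re.compile(r'^(CONFIG_[A-Za-z0-9_]+)=(.*)$')
_UNSET = re.compile(r'^# (CONFIG_[A-Za-z0-9_]+) is not set$')


def parse_kconfig(text: str) -> Dict[str, Optional[str]]:
    """Map each option to its raw value; explicitly unset options map to None."""
    options: Dict[str, Optional[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = None
    return options


def is_enabled(options: Dict[str, Optional[str]], name: str) -> bool:
    """True for built-in (=y) or modular (=m); False if unset or absent."""
    return options.get(name) in ("y", "m")


# Hypothetical excerpt, not the reporter's file.
sample = """\
CONFIG_PREEMPT=y
CONFIG_DEBUG_PAGEALLOC=y
# CONFIG_XEN is not set
"""
opts = parse_kconfig(sample)
print(is_enabled(opts, "CONFIG_XEN"))
print(is_enabled(opts, "CONFIG_PREEMPT"))
```

With CONFIG_XEN off, the Xen-specific files in the merge are presumably not built at all, which is why lib/swiotlb.c stands out as the one change that could plausibly affect this kernel.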


-boris