linux-kernel - Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

Message-ID: <s5hmw33lfl8.wl-tiwai@suse.de>
Date:	Mon, 23 Mar 2015 19:43:31 +0100
From:	Takashi Iwai <tiwai@...e.de>
To:	Denys Vlasenko <dvlasenk@...hat.com>
Cc:	Andy Lutomirski <luto@...capital.net>,
	Denys Vlasenko <vda.linux@...glemail.com>,
	Jiri Kosina <jkosina@...e.cz>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Stefan Seyfried <stefan.seyfried@...glemail.com>,
	X86 ML <x86@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
	Tejun Heo <tj@...nel.org>
Subject: Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

At Mon, 23 Mar 2015 18:46:45 +0100,
Denys Vlasenko wrote:
> 
> On 03/23/2015 06:18 PM, Takashi Iwai wrote:
> > At Mon, 23 Mar 2015 17:07:15 +0100, Denys Vlasenko wrote:
> >>>> I pulled tip tree on top of 4.0-rc5, built with your patch and now
> >>>> succeeded to get a better message:
> >>>>
> >>>>  kvm: zapping shadow pages for mmio generation wraparound
> >>>>  kvm [5126]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xffff
> >>>>  Exception on user stack 00007ffd22c23ef0: RSP: 0018:00007ffd22c23f28  EFLAGS: 00010006
> >>>>  RIP: 0010:[<ffffffff8162681d>]  [<ffffffff8162681d>] netlink_attachskb+0x1d/0x1d0
> >>>>  PANIC: double fault, error_code: 0x0
> >>>>  CPU: 1 PID: 10819 Comm: cc1 Tainted: G        W       4.0.0-rc5-debug1+ #2
> >>>>  Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
> >>>>  task: ffff8800d1b34b10 ti: ffff8800d1b30000 task.ti: ffff8800d1b30000
> >>>>  RIP: 0010:[<ffffffff8162681d>]  [<ffffffff8162681d>] netlink_attachskb+0x1d/0x1d0
> >>>>  RSP: 0018:00007ffd22c23f28  EFLAGS: 00010006
> >>>>  RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00000000c0000101
> >>>>  RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00007ffd22c23ef0
> 
> >> FYI: the disassembly of netlink_attachskb (from "Code:" line) is:
> >>
> >>    0:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> >>    5:   55                      push   %rbp
> >>    6:   48 89 e5                mov    %rsp,%rbp
> >>    9:   41 56                   push   %r14
> >>    b:   41 55                   push   %r13
> >>    d:   49 89 d5                mov    %rdx,%r13
> >>   10:   41 54                   push   %r12
> >>   12:   49 89 f4                mov    %rsi,%r12
> >>   15:   53                      push   %rbx
> >>   16:   48 89 fb                mov    %rdi,%rbx
> >>   19:   48 83 ec 30             sub    $0x30,%rsp
> >>   1d:   8b 87 68 01 00 00       mov    0x168(%rdi),%eax
> >> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>   23:   39 87 9c 01 00 00       cmp    %eax,0x19c(%rdi)
> >>   29:   7c 25                   jl     50 <_start+0x50>
> >>   2b:   48 8b 87 88 04 00 00    mov    0x488(%rdi),%rax
> >>
> >> The ^^^^^ instruction is the one which faults. Since you said it
> >> consistently happens here, this should be a page fault, not an external
> >> hardware interrupt.
> >>
> >> The code corresponds to the comparison in if():
> >>
> >> int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
> >>                       long *timeo, struct sock *ssk)
> >> {
> >>         struct netlink_sock *nlk;
> >>
> >>         nlk = nlk_sk(sk);
> >>
> >>         if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
> 
> >>> - Another piece is that the bug happens only when a KVM is running.
> >>>   The kernel ran without problem over days with similar tasks
> >>>   (compiling kernel, etc) when no KVM was used.
> >>
> >> Conceivably, virtualization support in CPUs can have nasty errata.
> >> However, you and the other reporter have different CPUs: yours
> >> is an Ivy Bridge, his is a Penryn.
> >>
> >> I don't see the path how KVM helps to trigger this.
> >>
> >>> - And now I get the trace as above, pointing netlink_attachskb().
> >>>
> >>> I have difficulty imagining how all these pieces fit into a single
> >>> picture.  Is something already screwed up before that?
> >>
> >> Well, a tiny bit more info will be visible if you change %rdi
> >> to, say, %r15 in these two lines in my patch:
> >>
> >>        /* Save bogus RSP value */
> >>        movq    %rsp,%rdi
> >> ...
> >>        push    %rdi            /* pt_regs->sp */
> >>
> >> Then original %rdi will be visible in the crash message.
> > 
> > OK, here we go.
> > 
> >  kvm: zapping shadow pages for mmio generation wraparound
> >  kvm [5490]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xffff
> >  Exception on user stack 00007fff1d7e5ec0: RSP: 0018:00007fff1d7e5ef8  EFLAGS: 00010002
> >  RIP: 0010:[<ffffffff8162681d>]  [<ffffffff8162681d>] netlink_attachskb+0x1d/0x1d0
> >  PANIC: double fault, error_code: 0x0
> >  CPU: 5 PID: 14285 Comm: fixdep Tainted: G        W       4.0.0-rc5-debug1+ #3
> >  Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
> >  task: ffff88020ba1c690 ti: ffff880206ba4000 task.ti: ffff880206ba4000
> >  RIP: 0010:[<ffffffff8162681d>]  [<ffffffff8162681d>] netlink_attachskb+0x1d/0x1d0
> >  RSP: 0018:00007fff1d7e5ef8  EFLAGS: 00010002
> >  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000c0000101
> >  RDX: 0000000000000000 RSI: 0000000000001ebb RDI: 0000000000000000
> 
> Thanks for your testing. So the %rdi was NULL... not very informative.
> 
> Notice that every one of your crashes is preceded by
> 
>     kvm: zapping shadow pages for mmio generation wraparound
>     kvm [5490]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xffff
> 
> This hints that kvm _is_ somehow responsible.

That is likely irrelevant: these messages appear when a VM starts, not
at the time of the crash.  I see them all the time.  Sorry for the
confusion.
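
As a rough illustration of what the "Exception on user stack" line in
the traces above is flagging, here is a minimal sketch that classifies
the reported RIP and RSP values.  It assumes 48-bit canonical addressing
(4-level paging), where the lower half of the address space belongs to
user space and the upper half to the kernel.  The helper names
(classify, privilege_level) are hypothetical and exist only for this
example; the numeric values are copied from the first trace.

```python
# Sketch only: assumes x86-64 with 48-bit canonical virtual addresses.
USER_MAX = 0x00007FFFFFFFFFFF      # top of the lower (user) half
KERNEL_MIN = 0xFFFF800000000000    # bottom of the upper (kernel) half


def classify(addr: int) -> str:
    """Return which half of the canonical address space addr falls in."""
    if addr <= USER_MAX:
        return "user"
    if addr >= KERNEL_MIN:
        return "kernel"
    return "non-canonical"


def privilege_level(selector: int) -> int:
    """The low two bits of a segment selector hold the requested privilege level."""
    return selector & 0x3


if __name__ == "__main__":
    # Values from "RIP: 0010:[<ffffffff8162681d>]" and "RSP: 0018:00007ffd22c23f28".
    cs, rip = 0x0010, 0xFFFFFFFF8162681D
    ss, rsp = 0x0018, 0x00007FFD22C23F28

    print(f"RIP {rip:#018x}: {classify(rip)} address, CS RPL {privilege_level(cs)}")
    print(f"RSP {rsp:#018x}: {classify(rsp)} address, SS RPL {privilege_level(ss)}")
```

Under these assumptions the instruction pointer is a kernel address with
ring-0 selectors, while the stack pointer lies in the user half, which
is the combination the debug patch reports.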


Takashi