linux-kernel - Re: Xen PV seems to be broken on Linus' tree

Date:   Wed, 22 Nov 2017 07:23:23 -0800
From:   Andy Lutomirski <luto@...nel.org>
To:     Juergen Gross <jgross@...e.com>
Cc:     Andy Lutomirski <luto@...nel.org>,
        "xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        X86 ML <x86@...nel.org>
Subject: Re: Xen PV seems to be broken on Linus' tree

On Wed, Nov 22, 2017 at 4:50 AM, Juergen Gross <jgross@...e.com> wrote:
> On 22/11/17 05:46, Andy Lutomirski wrote:
>> On Tue, Nov 21, 2017 at 8:11 PM, Andy Lutomirski <luto@...nel.org> wrote:
>>> On Tue, Nov 21, 2017 at 7:33 PM, Andy Lutomirski <luto@...nel.org> wrote:
>>>> I'm doing:
>>>>
>>>> /usr/bin/qemu-system-x86_64 -machine accel=kvm:tcg -cpu host -net none
>>>> -nographic -kernel xen-4.8.2 -initrd './arch/x86/boot/bzImage' -m 2G
>>>> -smp 2 -append console=com1
>>>>
>>>> With Linus' commit c8a0739b185d11d6e2ca7ad9f5835841d1cfc765 and the
>>>> attached config.
>>>>
>>>> It dies with a bunch of sensible log lines and then:
>>>>
>>>> (XEN) d0v0 Unhandled invalid opcode fault/trap [#6, ec=0000]
>>>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08023961a entry.o#create_bounce_frame+0x137/0x146
>>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>>>> (XEN) ----[ Xen-4.8.2  x86_64  debug=n   Not tainted ]----
>>>> (XEN) CPU:    0
>>>> (XEN) RIP:    e033:[<ffffffff811226eb>]
>>>> (XEN) RFLAGS: 0000000000000296   EM: 1   CONTEXT: pv guest (d0v0)
>>>> (XEN) rax: 000000000000002f   rbx: ffffffff81e65a48   rcx: ffffffff81e71288
>>>> (XEN) rdx: ffffffff81e27500   rsi: 0000000000000001   rdi: ffffffff81133f88
>>>> (XEN) rbp: 0000000000000000   rsp: ffffffff81e03e78   r8:  0000000000000000
>>>> (XEN) r9:  0000000000000001   r10: 0000000000000000   r11: 0000000000000000
>>>> (XEN) r12: 0000000000000000   r13: 0000000000000001   r14: 0000000000000001
>>>> (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000003506e0
>>>> (XEN) cr3: 000000007b0b3000   cr2: 0000000000000000
>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
>>>> (XEN) Guest stack trace from rsp=ffffffff81e03e78:
>>>> (XEN)    ffffffff81e71288 0000000000000000 ffffffff811226eb 000000010000e030
>>>> (XEN)    0000000000010096 ffffffff81e03eb8 000000000000e02b ffffffff811226eb
>>>> (XEN)    ffffffff81122c2e 0000000000000200 0000000000000000 0000000000000000
>>>> (XEN)    0000000000000030 ffffffff81c69cf5 ffffffff81080b20 ffffffff81080560
>>>> (XEN)    0000000000000000 ffffffff810d3741 ffffffff8107b420 ffffffff81094660
>>>>
>>>> Is this familiar?
>>>>
>>>> I'll feel really dumb if it ends up being my fault.
>>>
>>> Nah, it's broken at least back to v4.13, and I suspect it's config
>>> related.  objdump gives me this:
>>>
>>> ffffffff8112b0e1:       e9 e8 fe ff ff          jmpq   ffffffff8112afce <check_flags.part.42+0x4e>
>>> ffffffff8112b0e6:       48 c7 c6 2d f8 c8 81    mov    $0xffffffff81c8f82d,%rsi
>>> ffffffff8112b0ed:       48 c7 c7 58 b9 c8 81    mov    $0xffffffff81c8b958,%rdi
>>> ffffffff8112b0f4:       e8 13 2d 01 00          callq  ffffffff8113de0c <printk>
>>> ffffffff8112b0f9:       0f ff                   (bad)   <-- crash here
>>>
>>> That's "ud0", which is used by WARN.  So we're probably hitting an
>>> early warning and Xen probably has something busted with early
>>> exception handling.
>>>
>>> Anyone want to debug it and fix it?
>>
>> Well, I think I debugged it.  x86_64 has a shiny function
>> idt_setup_early_handler(), and Xen doesn't call it.  Fixing the
>> problem may be as simple as calling it at an appropriate time and
>> doing whatever asm magic is needed to deal with Xen's weird IDT
>> calling convention.
>
> Hmm, yes, this should work. I'll have a try.
>
> BTW: I don't think this ever worked.
>

The ud0 trick itself is fairly recent, so sufficiently old kernels (4.10?
I don't really remember) wouldn't die just because of an early warning.
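
For anyone reading the objdump output quoted above: the "(bad)" at the
crash site is just the two bytes 0f ff, which objdump did not decode but
which are the ud0 encoding. Below is a minimal sketch for spotting the
undefined-instruction opcodes in a byte dump. The table and the helper
name classify_ud() are made up for illustration and are not kernel code;
only the byte values come from the x86 opcode map.

```python
# Two-byte x86 encodings that always raise #UD (invalid opcode).
# 0f ff is the sequence shown as "(bad)" in the quoted disassembly.
# NOTE: this table and classify_ud() are illustrative assumptions,
# not something taken from the kernel or Xen sources.
UD_OPCODES = {
    bytes([0x0F, 0xFF]): "ud0",
    bytes([0x0F, 0xB9]): "ud1",
    bytes([0x0F, 0x0B]): "ud2",
}


def classify_ud(insn_bytes):
    """Return the UD mnemonic for the first two bytes, or None."""
    return UD_OPCODES.get(bytes(insn_bytes[:2]))


# Bytes at ffffffff8112b0f9 in the quoted objdump output.
print(classify_ud(bytes.fromhex("0fff")))
```

With no handler installed for #UD early in boot, a guest that executes
any of these gets the "Unhandled invalid opcode fault/trap" from the
Xen log, whether or not the kernel meant it as a mere warning.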