linux-kernel - Re: [Xen-devel] [PATCH 11/13] x86/paravirt: Add paravirt alternatives infrastructure

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ed30216f-4b9e-2bc3-b1d2-36135b29d746@citrix.com>
Date:   Thu, 12 Oct 2017 20:27:37 +0100
From:   Andrew Cooper <andrew.cooper3@...rix.com>
To:     Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>
CC:     Juergen Gross <jgross@...e.com>,
        Rusty Russell <rusty@...tcorp.com.au>,
        Mike Galbraith <efault@....de>,
        <xen-devel@...ts.xenproject.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Jiri Slaby <jslaby@...e.cz>, <x86@...nel.org>,
        <linux-kernel@...r.kernel.org>,
        Sasha Levin <alexander.levin@...izon.com>,
        Chris Wright <chrisw@...s-sol.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andy Lutomirski <luto@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>, Borislav Petkov <bp@...en8.de>,
        <live-patching@...r.kernel.org>,
        Alok Kataria <akataria@...are.com>,
        <virtualization@...ts.linux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: [Xen-devel] [PATCH 11/13] x86/paravirt: Add paravirt alternatives
 infrastructure

On 12/10/17 20:11, Boris Ostrovsky wrote:
> On 10/06/2017 10:32 AM, Josh Poimboeuf wrote:
>> On Thu, Oct 05, 2017 at 04:35:03PM -0400, Boris Ostrovsky wrote:
>>>>  #ifdef CONFIG_PARAVIRT
>>>> +/*
>>>> + * Paravirt alternatives are applied much earlier than normal alternatives.
>>>> + * They are only applied when running on a hypervisor.  They replace some
>>>> + * native instructions with calls to pv ops.
>>>> + */
>>>> +void __init apply_pv_alternatives(void)
>>>> +{
>>>> +	setup_force_cpu_cap(X86_FEATURE_PV_OPS);
>>> Not for Xen HVM guests.
>> From what I can tell, HVM guests still use pv_time_ops and
>> pv_mmu_ops.exit_mmap, right?
>>
>>>> +	apply_alternatives(__pv_alt_instructions, __pv_alt_instructions_end);
>>>> +}
>>> This is a problem (at least for Xen PV guests):
>>> apply_alternatives()->text_poke_early()->local_irq_save()->...'cli'->death.
>> Ah, right.
>>
>>> It might be possible not to turn off/on the interrupts in this
>>> particular case since the guest probably won't be able to handle an
>>> interrupt at this point anyway.
>> Yeah, that should work.  For Xen and for the other hypervisors, this is
>> called well before irq init, so interrupts can't be handled yet anyway.
> There is also another problem:
>
> [    1.312425] general protection fault: 0000 [#1] SMP
> [    1.312901] Modules linked in:
> [    1.313389] CPU: 0 PID: 1 Comm: init Not tainted 4.14.0-rc4+ #6
> [    1.313878] task: ffff88003e2c0000 task.stack: ffffc9000038c000
> [    1.314360] RIP: 10000e030:entry_SYSCALL_64_fastpath+0x1/0xa5
> [    1.314854] RSP: e02b:ffffc9000038ff50 EFLAGS: 00010046
> [    1.315336] RAX: 000000000000000c RBX: 000055f550168040 RCX:
> 00007fcfc959f59a
> [    1.315827] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> 0000000000000000
> [    1.316315] RBP: 000000000000000a R08: 000000000000037f R09:
> 0000000000000064
> [    1.316805] R10: 000000001f89cbf5 R11: ffff88003e2c0000 R12:
> 00007fcfc958ad60
> [    1.317300] R13: 0000000000000000 R14: 000055f550185954 R15:
> 0000000000001000
> [    1.317801] FS:  0000000000000000(0000) GS:ffff88003f800000(0000)
> knlGS:0000000000000000
> [    1.318267] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.318750] CR2: 00007fcfc97ab218 CR3: 000000003c88e000 CR4:
> 0000000000042660
> [    1.319235] Call Trace:
> [    1.319700] Code: 51 50 57 56 52 51 6a da 41 50 41 51 41 52 41 53 48
> 83 ec 30 65 4c 8b 1c 25 c0 d2 00 00 41 f7 03 df 39 08 90 0f 85 a5 00 00
> 00 50 <ff> 15 9c 95 d0 ff 58 48 3d 4c 01 00 00 77 0f 4c 89 d1 ff 14 c5
> [    1.321161] RIP: entry_SYSCALL_64_fastpath+0x1/0xa5 RSP: ffffc9000038ff50
> [    1.344255] ---[ end trace d7cb8cd6cd7c294c ]---
> [    1.345009] Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b
>
>
> All code
> ========
>    0:    51                       push   %rcx
>    1:    50                       push   %rax
>    2:    57                       push   %rdi
>    3:    56                       push   %rsi
>    4:    52                       push   %rdx
>    5:    51                       push   %rcx
>    6:    6a da                    pushq  $0xffffffffffffffda
>    8:    41 50                    push   %r8
>    a:    41 51                    push   %r9
>    c:    41 52                    push   %r10
>    e:    41 53                    push   %r11
>   10:    48 83 ec 30              sub    $0x30,%rsp
>   14:    65 4c 8b 1c 25 c0 d2     mov    %gs:0xd2c0,%r11
>   1b:    00 00
>   1d:    41 f7 03 df 39 08 90     testl  $0x900839df,(%r11)
>   24:    0f 85 a5 00 00 00        jne    0xcf
>   2a:    50                       push   %rax
>   2b:*    ff 15 9c 95 d0 ff        callq  *-0x2f6a64(%rip)        #
> 0xffffffffffd095cd        <-- trapping instruction
>   31:    58                       pop    %rax
>   32:    48 3d 4c 01 00 00        cmp    $0x14c,%rax
>   38:    77 0f                    ja     0x49
>   3a:    4c 89 d1                 mov    %r10,%rcx
>   3d:    ff                       .byte 0xff
>   3e:    14 c5                    adc    $0xc5,%al
>
>
> so the original 'cli' was replaced with the pv call but to me the offset
> looks a bit off, no? Shouldn't it always be positive?

callq takes a 32bit signed displacement, so jumping back by up to 2G is
perfectly legitimate.

The #GP[0] however means that whatever 8 byte value was found at
-0x2f6a64(%rip) was a non-canonical address.

One option is that the pvops structure hasn't been initialised properly,
but an alternative is that the relocation wasn't processed correctly,
and the code is trying to reference something which isn't a function
pointer.

~Andrew