linux-kernel - Re: [Xen-devel] [PATCH 11/13] x86/paravirt: Add paravirt alternatives infrastructure

Message-ID: <de621b0b-f767-222a-e1de-aabd0e9a0bf9@oracle.com>
Date:   Tue, 17 Oct 2017 09:58:59 -0400
From:   Boris Ostrovsky <boris.ostrovsky@...cle.com>
To:     Josh Poimboeuf <jpoimboe@...hat.com>
Cc:     Andrew Cooper <andrew.cooper3@...rix.com>,
        Juergen Gross <jgross@...e.com>,
        Rusty Russell <rusty@...tcorp.com.au>,
        Mike Galbraith <efault@....de>, xen-devel@...ts.xenproject.org,
        Peter Zijlstra <peterz@...radead.org>,
        Jiri Slaby <jslaby@...e.cz>, x86@...nel.org,
        linux-kernel@...r.kernel.org,
        Sasha Levin <alexander.levin@...izon.com>,
        Chris Wright <chrisw@...s-sol.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andy Lutomirski <luto@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>, Borislav Petkov <bp@...en8.de>,
        live-patching@...r.kernel.org, Alok Kataria <akataria@...are.com>,
        virtualization@...ts.linux-foundation.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: [Xen-devel] [PATCH 11/13] x86/paravirt: Add paravirt alternatives
 infrastructure

On 10/17/2017 01:24 AM, Josh Poimboeuf wrote:
> On Mon, Oct 16, 2017 at 02:18:48PM -0400, Boris Ostrovsky wrote:
>> On 10/12/2017 03:53 PM, Boris Ostrovsky wrote:
>>> On 10/12/2017 03:27 PM, Andrew Cooper wrote:
>>>> On 12/10/17 20:11, Boris Ostrovsky wrote:
>>>>> There is also another problem:
>>>>>
>>>>> [    1.312425] general protection fault: 0000 [#1] SMP
>>>>> [    1.312901] Modules linked in:
>>>>> [    1.313389] CPU: 0 PID: 1 Comm: init Not tainted 4.14.0-rc4+ #6
>>>>> [    1.313878] task: ffff88003e2c0000 task.stack: ffffc9000038c000
>>>>> [    1.314360] RIP: 10000e030:entry_SYSCALL_64_fastpath+0x1/0xa5
>>>>> [    1.314854] RSP: e02b:ffffc9000038ff50 EFLAGS: 00010046
>>>>> [    1.315336] RAX: 000000000000000c RBX: 000055f550168040 RCX:
>>>>> 00007fcfc959f59a
>>>>> [    1.315827] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
>>>>> 0000000000000000
>>>>> [    1.316315] RBP: 000000000000000a R08: 000000000000037f R09:
>>>>> 0000000000000064
>>>>> [    1.316805] R10: 000000001f89cbf5 R11: ffff88003e2c0000 R12:
>>>>> 00007fcfc958ad60
>>>>> [    1.317300] R13: 0000000000000000 R14: 000055f550185954 R15:
>>>>> 0000000000001000
>>>>> [    1.317801] FS:  0000000000000000(0000) GS:ffff88003f800000(0000)
>>>>> knlGS:0000000000000000
>>>>> [    1.318267] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [    1.318750] CR2: 00007fcfc97ab218 CR3: 000000003c88e000 CR4:
>>>>> 0000000000042660
>>>>> [    1.319235] Call Trace:
>>>>> [    1.319700] Code: 51 50 57 56 52 51 6a da 41 50 41 51 41 52 41 53 48
>>>>> 83 ec 30 65 4c 8b 1c 25 c0 d2 00 00 41 f7 03 df 39 08 90 0f 85 a5 00 00
>>>>> 00 50 <ff> 15 9c 95 d0 ff 58 48 3d 4c 01 00 00 77 0f 4c 89 d1 ff 14 c5
>>>>> [    1.321161] RIP: entry_SYSCALL_64_fastpath+0x1/0xa5 RSP: ffffc9000038ff50
>>>>> [    1.344255] ---[ end trace d7cb8cd6cd7c294c ]---
>>>>> [    1.345009] Kernel panic - not syncing: Attempted to kill init!
>>>>> exitcode=0x0000000b
>>>>>
>>>>>
>>>>> All code
>>>>> ========
>>>>>    0:    51                       push   %rcx
>>>>>    1:    50                       push   %rax
>>>>>    2:    57                       push   %rdi
>>>>>    3:    56                       push   %rsi
>>>>>    4:    52                       push   %rdx
>>>>>    5:    51                       push   %rcx
>>>>>    6:    6a da                    pushq  $0xffffffffffffffda
>>>>>    8:    41 50                    push   %r8
>>>>>    a:    41 51                    push   %r9
>>>>>    c:    41 52                    push   %r10
>>>>>    e:    41 53                    push   %r11
>>>>>   10:    48 83 ec 30              sub    $0x30,%rsp
>>>>>   14:    65 4c 8b 1c 25 c0 d2     mov    %gs:0xd2c0,%r11
>>>>>   1b:    00 00
>>>>>   1d:    41 f7 03 df 39 08 90     testl  $0x900839df,(%r11)
>>>>>   24:    0f 85 a5 00 00 00        jne    0xcf
>>>>>   2a:    50                       push   %rax
>>>>>   2b:*    ff 15 9c 95 d0 ff        callq  *-0x2f6a64(%rip)        #
>>>>> 0xffffffffffd095cd        <-- trapping instruction
>>>>>   31:    58                       pop    %rax
>>>>>   32:    48 3d 4c 01 00 00        cmp    $0x14c,%rax
>>>>>   38:    77 0f                    ja     0x49
>>>>>   3a:    4c 89 d1                 mov    %r10,%rcx
>>>>>   3d:    ff                       .byte 0xff
>>>>>   3e:    14 c5                    adc    $0xc5,%al
>>>>>
>>>>>
>>>>> so the original 'cli' was replaced with the pv call but to me the offset
>>>>> looks a bit off, no? Shouldn't it always be positive?
>>>> callq takes a 32bit signed displacement, so jumping back by up to 2G is
>>>> perfectly legitimate.
>>> Yes, but
>>>
>>> ostr@...kbase> nm vmlinux | grep entry_SYSCALL_64_fastpath
>>> ffffffff817365dd t entry_SYSCALL_64_fastpath
>>> ostr@...kbase> nm vmlinux | grep " pv_irq_ops"
>>> ffffffff81c2dbc0 D pv_irq_ops
>>> ostr@...kbase>
>>>
>>> so pv_irq_ops.irq_disable is about 5MB ahead of where we are now. (I
>>> didn't mean that x86 instruction set doesn't allow negative
>>> displacement, I was trying to say that pv_irq_ops always lives further down)
>> I believe the problem is this:
>>
>> #define PV_INDIRECT(addr)       *addr(%rip)
>>
>> The displacement that the linker computes will be relative to where
>> this instruction is placed at the time of linking, which is in
>> .pv_altinstructions (and not .text). So when we copy it into .text the
>> displacement becomes bogus.
> apply_alternatives() is supposed to adjust that displacement based on
> the new IP, though it could be messing that up somehow.  (See patch
> 10/13.)
>

That patch doesn't take into account the fact that replacement
instructions may have to save/restore registers. So, for example,


-        if (a->replacementlen && is_jmp(replacement[0]))
+        } else if (a->replacementlen == 6 && *insnbuf == 0xff &&
+               *(insnbuf+1) == 0x15) {
+            /* indirect call */
+            *(s32 *)(insnbuf + 2) += replacement - instr;
+            DPRINTK("Fix indirect CALL offset: 0x%x, CALL *0x%lx",
+                *(s32 *)(insnbuf + 2),
+                (unsigned long)instr + *(s32 *)(insnbuf + 2) + 6);
+

doesn't do the adjustment of

  2a:    50                       push   %rax
  2b:*    ff 15 9c 95 d0 ff        callq  *-0x2f6a64(%rip)
  31:    58                       pop    %rax

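because insnbuf points to the 'push' and not to the 'call', so the
0xff 0x15 check at insnbuf[0] never matches (and with the push/pop
wrapper the replacement is 8 bytes long, not 6).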
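
To make the point concrete, here is a minimal userspace sketch (not the
kernel code, and not a proposed patch) of a fix-up that skips leading
one-byte 'push %reg' opcodes before looking for the ff 15 indirect call.
All function names are hypothetical, the source address standing in for
.pv_altinstructions is made up, and the pv_irq_ops address is used as a
stand-in for the real slot. A real fix would probably want a proper
instruction decoder rather than this opcode check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read/write a little-endian 32-bit value byte by byte. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/*
 * Hypothetical helper: skip leading one-byte 'push %reg' opcodes
 * (0x50..0x57) and return the offset of a 'call *disp32(%rip)'
 * (ff 15 + 4 displacement bytes), or -1 if there is none.
 */
static int find_rip_call(const uint8_t *buf, size_t len)
{
	size_t off = 0;

	while (off < len && buf[off] >= 0x50 && buf[off] <= 0x57)
		off++;
	if (off + 6 <= len && buf[off] == 0xff && buf[off + 1] == 0x15)
		return (int)off;
	return -1;
}

/*
 * Adjust the displacement for a copy from address src to address dst.
 * The call sits at the same offset in both places, so the delta is
 * simply src - dst (what the quoted hunk calls replacement - instr).
 */
static int fixup_rip_call(uint8_t *buf, size_t len, uint64_t src, uint64_t dst)
{
	int off = find_rip_call(buf, len);

	if (off < 0)
		return -1;
	put_le32(buf + off + 2,
		 get_le32(buf + off + 2) + (uint32_t)(src - dst));
	return off;
}

/* Address the call would load its target from if buf lived at base. */
static uint64_t call_slot(const uint8_t *buf, int off, uint64_t base)
{
	int32_t disp = (int32_t)get_le32(buf + off + 2);

	return base + (uint64_t)off + 6 + (uint64_t)(int64_t)disp;
}

/* Returns 0 if the fix-up keeps the call pointing at the same slot. */
int altinsn_selfcheck(void)
{
	const uint64_t src  = 0xffffffff82000000ULL; /* made-up link-time address */
	const uint64_t dst  = 0xffffffff817365ddULL; /* entry_SYSCALL_64_fastpath */
	const uint64_t slot = 0xffffffff81c2dbc0ULL; /* pv_irq_ops, as a stand-in */
	uint8_t buf[8] = { 0x50, 0xff, 0x15, 0, 0, 0, 0, 0x58 };
	int off;

	/* Displacement as the linker would compute it at src. */
	put_le32(buf + 3, (uint32_t)(slot - (src + 1 + 6)));
	assert(call_slot(buf, 1, src) == slot);

	/* A check of buf[0] only would see the push and skip the fix-up. */
	assert(buf[0] != 0xff);

	off = fixup_rip_call(buf, sizeof(buf), src, dst);
	if (off != 1 || call_slot(buf, off, dst) != slot)
		return -1;
	return 0;
}
```

Calling altinsn_selfcheck() from a small main() exercises the push/call/pop
case above. With these made-up addresses the adjusted displacement comes
out positive, which is what I'd expect given that pv_irq_ops lives after
entry_SYSCALL_64_fastpath.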

-boris