Message-ID: <b9fdc3ec-87cd-da0e-47b7-67cdae8ffb97@oracle.com>
Date: Mon, 16 Oct 2017 14:18:48 -0400
From: Boris Ostrovsky <boris.ostrovsky@...cle.com>
To: Andrew Cooper <andrew.cooper3@...rix.com>,
Josh Poimboeuf <jpoimboe@...hat.com>
Cc: Juergen Gross <jgross@...e.com>,
Rusty Russell <rusty@...tcorp.com.au>,
Mike Galbraith <efault@....de>, xen-devel@...ts.xenproject.org,
Peter Zijlstra <peterz@...radead.org>,
Jiri Slaby <jslaby@...e.cz>, x86@...nel.org,
linux-kernel@...r.kernel.org,
Sasha Levin <alexander.levin@...izon.com>,
Chris Wright <chrisw@...s-sol.org>,
Thomas Gleixner <tglx@...utronix.de>,
Andy Lutomirski <luto@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>, Borislav Petkov <bp@...en8.de>,
live-patching@...r.kernel.org, Alok Kataria <akataria@...are.com>,
virtualization@...ts.linux-foundation.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Ingo Molnar <mingo@...nel.org>
Subject: Re: [Xen-devel] [PATCH 11/13] x86/paravirt: Add paravirt alternatives infrastructure
On 10/12/2017 03:53 PM, Boris Ostrovsky wrote:
> On 10/12/2017 03:27 PM, Andrew Cooper wrote:
>> On 12/10/17 20:11, Boris Ostrovsky wrote:
>>> There is also another problem:
>>>
>>> [ 1.312425] general protection fault: 0000 [#1] SMP
>>> [ 1.312901] Modules linked in:
>>> [ 1.313389] CPU: 0 PID: 1 Comm: init Not tainted 4.14.0-rc4+ #6
>>> [ 1.313878] task: ffff88003e2c0000 task.stack: ffffc9000038c000
>>> [ 1.314360] RIP: 10000e030:entry_SYSCALL_64_fastpath+0x1/0xa5
>>> [ 1.314854] RSP: e02b:ffffc9000038ff50 EFLAGS: 00010046
>>> [ 1.315336] RAX: 000000000000000c RBX: 000055f550168040 RCX: 00007fcfc959f59a
>>> [ 1.315827] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>>> [ 1.316315] RBP: 000000000000000a R08: 000000000000037f R09: 0000000000000064
>>> [ 1.316805] R10: 000000001f89cbf5 R11: ffff88003e2c0000 R12: 00007fcfc958ad60
>>> [ 1.317300] R13: 0000000000000000 R14: 000055f550185954 R15: 0000000000001000
>>> [ 1.317801] FS: 0000000000000000(0000) GS:ffff88003f800000(0000) knlGS:0000000000000000
>>> [ 1.318267] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 1.318750] CR2: 00007fcfc97ab218 CR3: 000000003c88e000 CR4: 0000000000042660
>>> [ 1.319235] Call Trace:
>>> [ 1.319700] Code: 51 50 57 56 52 51 6a da 41 50 41 51 41 52 41 53 48 83 ec 30 65 4c 8b 1c 25 c0 d2 00 00 41 f7 03 df 39 08 90 0f 85 a5 00 00 00 50 <ff> 15 9c 95 d0 ff 58 48 3d 4c 01 00 00 77 0f 4c 89 d1 ff 14 c5
>>> [ 1.321161] RIP: entry_SYSCALL_64_fastpath+0x1/0xa5 RSP: ffffc9000038ff50
>>> [ 1.344255] ---[ end trace d7cb8cd6cd7c294c ]---
>>> [ 1.345009] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>>>
>>>
>>> All code
>>> ========
>>> 0: 51 push %rcx
>>> 1: 50 push %rax
>>> 2: 57 push %rdi
>>> 3: 56 push %rsi
>>> 4: 52 push %rdx
>>> 5: 51 push %rcx
>>> 6: 6a da pushq $0xffffffffffffffda
>>> 8: 41 50 push %r8
>>> a: 41 51 push %r9
>>> c: 41 52 push %r10
>>> e: 41 53 push %r11
>>> 10: 48 83 ec 30 sub $0x30,%rsp
>>> 14: 65 4c 8b 1c 25 c0 d2 mov %gs:0xd2c0,%r11
>>> 1b: 00 00
>>> 1d: 41 f7 03 df 39 08 90 testl $0x900839df,(%r11)
>>> 24: 0f 85 a5 00 00 00 jne 0xcf
>>> 2a: 50 push %rax
>>> 2b:* ff 15 9c 95 d0 ff callq *-0x2f6a64(%rip) # 0xffffffffffd095cd <-- trapping instruction
>>> 31: 58 pop %rax
>>> 32: 48 3d 4c 01 00 00 cmp $0x14c,%rax
>>> 38: 77 0f ja 0x49
>>> 3a: 4c 89 d1 mov %r10,%rcx
>>> 3d: ff .byte 0xff
>>> 3e: 14 c5 adc $0xc5,%al
>>>
>>>
>>> so the original 'cli' was replaced with the pv call but to me the offset
>>> looks a bit off, no? Shouldn't it always be positive?
>> callq takes a 32bit signed displacement, so jumping back by up to 2G is
>> perfectly legitimate.
> Yes, but
>
> ostr@...kbase> nm vmlinux | grep entry_SYSCALL_64_fastpath
> ffffffff817365dd t entry_SYSCALL_64_fastpath
> ostr@...kbase> nm vmlinux | grep " pv_irq_ops"
> ffffffff81c2dbc0 D pv_irq_ops
> ostr@...kbase>
>
> so pv_irq_ops.irq_disable is about 5MB ahead of where we are now. (I
> didn't mean that the x86 instruction set doesn't allow negative
> displacements; I was trying to say that pv_irq_ops always lives further down.)
I believe the problem is this:

#define PV_INDIRECT(addr) *addr(%rip)

The displacement that the linker computes is relative to where this
instruction sits at link time, which is in .pv_altinstructions (not
.text). So when we copy it into .text, the displacement becomes bogus.
Replacing the macro with

#define PV_INDIRECT(addr) *addr // well, it's not so much indirect anymore

makes things work. Or maybe it can be adjusted to be kept truly indirect.
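Why dropping `(%rip)` helps, as a hedged sketch: the assembler then emits an absolute memory operand, `ff 14 25 disp32` (ModRM 0x14 selects a SIB byte, SIB 0x25 means no base and no index), whose bytes do not depend on where the instruction lives. This only works because the x86-64 kernel image is linked in the top 2 GB of the address space, so a symbol such as pv_irq_ops at ffffffff81c2dbc0 survives sign extension from 32 bits. The helper names fits_simm32 and encode_abs_call are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* "call *disp32": opcode ff, ModRM 14, SIB 25, then a 4-byte displacement. */
#define CALL_ABS_LEN 7

/* Non-zero if 'addr' survives truncation to 32 bits plus sign extension,
 * i.e. it lies in the top or bottom 2 GB of the 64-bit address space. */
static int fits_simm32(uint64_t addr)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)addr == addr;
}

/* Encode "call *addr" (absolute memory-indirect). The encoding is
 * position-independent: copying it elsewhere needs no fix-up. */
static void encode_abs_call(uint8_t out[CALL_ABS_LEN], uint64_t addr)
{
    uint32_t d = (uint32_t)addr;

    assert(fits_simm32(addr));
    out[0] = 0xff; /* group 5 opcode; ModRM.reg = 2 selects "call" */
    out[1] = 0x14; /* mod 00, reg 010, rm 100: a SIB byte follows */
    out[2] = 0x25; /* scale 00, index 100 (none), base 101 (disp32 only) */
    for (int i = 0; i < 4; i++)
        out[3 + i] = (uint8_t)(d >> (8 * i)); /* little-endian disp32 */
}
```

The trade-off is one extra byte per call site, and the scheme only holds for targets inside the sign-extended 32-bit range: kernel image symbols qualify, while a direct-map address such as the task pointer ffff88003e2c0000 in the oops would not.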
-boris