linux-kernel - Re: [Xen-devel] [PATCH v2] xen/pv: Fix a boot up hang revealed by int3 self test

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <18619ecb-108a-0d89-812c-7525a566e805@suse.com>
Date:   Mon, 15 Jul 2019 06:54:27 +0000
From:   Jan Beulich <JBeulich@...e.com>
To:     Zhenzhong Duan <zhenzhong.duan@...cle.com>
CC:     Andrew Cooper <andrew.cooper3@...rix.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Borislav Petkov <bp@...en8.de>,
        "Peter Zijlstra" <peterz@...radead.org>,
        Andy Lutomirski <luto@...nel.org>,
        "Stefano Stabellini" <sstabellini@...nel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        "srinivas.eeda@...cle.com" <srinivas.eeda@...cle.com>,
        Ingo Molnar <mingo@...hat.com>, Juergen Gross <JGross@...e.com>
Subject: Re: [Xen-devel] [PATCH v2] xen/pv: Fix a boot up hang revealed by
 int3 self test

On 15.07.2019 07:05, Zhenzhong Duan wrote:
> 
> On 2019/7/12 22:06, Andrew Cooper wrote:
>> On 11/07/2019 03:15, Zhenzhong Duan wrote:
>>> Commit 7457c0da024b ("x86/alternatives: Add int3_emulate_call()
>>> selftest") is used to ensure there is a gap setup in exception stack
>>> which could be used for inserting call return address.
>>>
>>> This gap is missed in XEN PV int3 exception entry path, then below panic
>>> triggered:
>>>
>>> [    0.772876] general protection fault: 0000 [#1] SMP NOPTI
>>> [    0.772886] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0+ #11
>>> [    0.772893] RIP: e030:int3_magic+0x0/0x7
>>> [    0.772905] RSP: 3507:ffffffff82203e98 EFLAGS: 00000246
>>> [    0.773334] Call Trace:
>>> [    0.773334]  alternative_instructions+0x3d/0x12e
>>> [    0.773334]  check_bugs+0x7c9/0x887
>>> [    0.773334]  ? __get_locked_pte+0x178/0x1f0
>>> [    0.773334]  start_kernel+0x4ff/0x535
>>> [    0.773334]  ? set_init_arg+0x55/0x55
>>> [    0.773334]  xen_start_kernel+0x571/0x57a
>>>
>>> As xenint3 and int3 entry code are same except xenint3 doesn't generate
>>> a gap, we can fix it by using int3 and drop useless xenint3.
>> For 64bit PV guests, Xen's ABI enters the kernel with using SYSRET, with
>> %rcx/%r11 on the stack.
>>
>> To convert back to "normal" looking exceptions, the xen thunks do `pop
>> %rcx; pop %r11; jmp do_*`...
> I will add this to commit message.
>>
>>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>>> index 0ea4831..35a66fc 100644
>>> --- a/arch/x86/entry/entry_64.S
>>> +++ b/arch/x86/entry/entry_64.S
>>> @@ -1176,7 +1176,6 @@ idtentry stack_segment        do_stack_segment    has_error_code=1
>>>   #ifdef CONFIG_XEN_PV
>>>   idtentry xennmi            do_nmi            has_error_code=0
>>>   idtentry xendebug        do_debug        has_error_code=0
>>> -idtentry xenint3        do_int3            has_error_code=0
>>>   #endif
>> What is confusing is why there are 3 extra magic versions here.  At a
>> guess, I'd say something to do with IST settings (given the vectors),
>> but I don't see why #DB/#BP would need to be different in the first
>> place.  (NMI sure, but that is more to do with the crazy hoops needing
>> to be jumped through to keep native functioning safely.)
> 
> xenint3 will be removed in this patch safely as it don't use IST now.
> 
> But debug and nmi need paranoid_entry which will read MSR_GS_BASE to determine
> 
> if swapgs is needed. I guess PV guesting running in ring3 will #GP with swapgs?

Not only that (Xen could trap and emulate swapgs if that was needed) - 64-bit
PV kernel mode always gets entered with kernel GS base already set. Hence
finding out whether to switch the GS base is specifically not something that
any exception entry point would need to do (and it should actively try to
avoid it, for performance reasons).

Jan