linux-kernel - Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrV32APYgOe7a4jOSu8p2siu28XQ+mFqrzxudceFkQmk7w@mail.gmail.com>
Date:	Sun, 6 Mar 2016 09:36:42 -0800
From:	Andy Lutomirski <luto@...capital.net>
To:	Brian Gerst <brgerst@...il.com>
Cc:	Borislav Petkov <bp@...en8.de>, Andy Lutomirski <luto@...nel.org>,
	"the arch/x86 maintainers" <x86@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Andrew Cooper <andrew.cooper3@...rix.com>
Subject: Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

On Sun, Mar 6, 2016 at 9:30 AM, Brian Gerst <brgerst@...il.com> wrote:
> On Sun, Mar 6, 2016 at 11:55 AM, Borislav Petkov <bp@...en8.de> wrote:
>> On Sat, Mar 05, 2016 at 09:52:20PM -0800, Andy Lutomirski wrote:
>>> Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
>>> if a user does SYSENTER with TF set, we will single-step through the
>>> kernel until something clears TF.  There is absolutely nothing we can
>>> do to prevent this short of turning off SYSENTER [1].
>>>
>>> Simplify the handling considerably with two changes:
>>>
>>> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
>>>    add TF to that list of flags to sanitize with no overhead whatsoever.
>>>
>>> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.
>>>
>>> That's all we need to do.
>>>
>>> Don't get too excited -- our handling is still buggy on 32-bit
>>> kernels.  There's nothing wrong with the SYSENTER code itself, but
>>> the #DB prologue has a clever fixup for traps on the very first
>>> instruction of entry_SYSENTER_32, and the fixup doesn't work quite
>>> correctly.  The next two patches will fix that.
>>>
>>> [1] We could probably prevent it by forcing BTF on at all times and
>>>     making sure we clear TF before any branches in the SYSENTER
>>>     code.  Needless to say, this is a bad idea.
>>>
>>> Signed-off-by: Andy Lutomirski <luto@...nel.org>
>>> ---
>>>  arch/x86/entry/entry_32.S        | 42 ++++++++++++++++++++++----------
>>>  arch/x86/entry/entry_64_compat.S |  9 ++++++-
>>>  arch/x86/include/asm/proto.h     | 15 ++++++++++--
>>>  arch/x86/kernel/traps.c          | 52 +++++++++++++++++++++++++++++++++-------
>>>  4 files changed, 94 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
>>> index a8c3424c3392..7700cf695d8c 100644
>>> --- a/arch/x86/entry/entry_32.S
>>> +++ b/arch/x86/entry/entry_32.S
>>> @@ -287,7 +287,26 @@ need_resched:
>>>  END(resume_kernel)
>>>  #endif
>>>
>>> -     # SYSENTER  call handler stub
>>> +GLOBAL(__begin_SYSENTER_singlestep_region)
>>> +/*
>>> + * All code from here through __end_SYSENTER_singlestep_region is subject
>>> + * to being single-stepped if a user program sets TF and executes SYSENTER.
>>> + * There is absolutely nothing that we can do to prevent this from happening
>>> + * (thanks Intel!).  To keep our handling of this situation as simple as
>>> + * possible, we handle TF just like AC and NT, except that our #DB handler
>>> + * will ignore all of the single-step traps generated in this range.
>>> + */
>>> +
>>> +#ifdef CONFIG_XEN
>>> +/*
>>> + * Xen doesn't set %esp to be precisely what the normal SYSENTER
>>> + * entry point expects, so fix it up before using the normal path.
>>> + */
>>> +ENTRY(xen_sysenter_target)
>>> +     addl    $5*4, %esp                      /* remove xen-provided frame */
>>> +     jmp     sysenter_past_esp
>>> +#endif
>>> +
>>>  ENTRY(entry_SYSENTER_32)
>>>       movl    TSS_sysenter_sp0(%esp), %esp
>>>  sysenter_past_esp:
>>
>> Can you do this below (ontop of your diff) and get rid of those
>> __{begin,end}_SYSENTER_singlestep_region and __end_entry_SYSENTER_compat
>> globals and use the entry_SYSENTER_{32|compat} symbols instead?
>>
>> From a quick scan, kallsyms can give you the symbol size too so that you
>> can compute where it ends:
>>
>> readelf -a vmlinux | grep entry_SYSENTER
>>  55454: ffffffff8170aff0    99 FUNC    GLOBAL DEFAULT    1 entry_SYSENTER_compat
>>  62596: ffffffff8170b053     0 NOTYPE  GLOBAL DEFAULT    1 __end_entry_SYSENTER_comp
>>
>> 0xffffffff8170aff0 + 99 = 0xffffffff8170b053
>>
>> ---
>> Index: b/arch/x86/entry/entry_32.S
>> ===================================================================
>> --- a/arch/x86/entry/entry_32.S 2016-03-06 17:47:10.059733163 +0100
>> +++ b/arch/x86/entry/entry_32.S 2016-03-06 17:46:52.235733410 +0100
>> @@ -297,18 +297,13 @@ GLOBAL(__begin_SYSENTER_singlestep_regio
>>   * will ignore all of the single-step traps generated in this range.
>>   */
>>
>> +ENTRY(entry_SYSENTER_32)
>>  #ifdef CONFIG_XEN
>> -/*
>> - * Xen doesn't set %esp to be precisely what the normal SYSENTER
>> - * entry point expects, so fix it up before using the normal path.
>> - */
>> -ENTRY(xen_sysenter_target)
>>         addl    $5*4, %esp                      /* remove xen-provided frame */
>>         jmp     sysenter_past_esp
>> -#endif
>> -
>> -ENTRY(entry_SYSENTER_32)
>> +#else
>>         movl    TSS_sysenter_sp0(%esp), %esp
>> +#endif
>>  sysenter_past_esp:
>>         pushl   $__USER_DS              /* pt_regs->ss */
>>         pushl   %ebp                    /* pt_regs->sp (stashed in bp) */
>
>
> This won't work if you run a Xen enabled kernel on bare metal.  This
> would work though:
>
> ALTERNATIVE "movl TSS_sysenter_sp0(%esp), %esp", "addl $5*4, %esp",
> X86_FEATURE_XENPV

That might work.

>
> I haven't read the Xen hypervisor code, but what are those 5 words
> that were pushed on the stack by the hypervisor?  It suspiciously is
> the size of an IRET frame.

I think it is nominally an IRET frame.  One might wonder what's in it,
given that the hypervisor doesn't know what the old IP or SP was.

>  Considering that we don't use SYSEXIT on
> Xen anymore, can we just redirect SYSENTER to the INT80 handler?
> Perhaps even just disable SYSENTER support in the VDSO on Xen.   I
> can't imagine SYSENTER is any faster than INT80 on Xen, because it has
> to trap to the hypervisor first.
>

I think we should leave it enabled -- having user programs on Xen PV
trap into the hypervisor for a system call using SYSENTER will still
be much faster than using INT $0x80 as long as the hypervisor does
something reasonable.

I think that Andrew Cooper has some patches in the works to clean a
bunch of the Xen PV entry prologue stuff up.

--Andy