Message-ID: <CAMzpN2hGNHzLNT=HoYGmthwT4BRx+BuM-TaNPQdPvMUXHK7LNw@mail.gmail.com>
Date:	Fri, 27 Mar 2015 16:53:18 -0400
From:	Brian Gerst <brgerst@...il.com>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Denys Vlasenko <dvlasenk@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Borislav Petkov <bp@...en8.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	X86 ML <x86@...nel.org>, Ingo Molnar <mingo@...nel.org>
Subject: Re: ia32_sysenter_target does not preserve EFLAGS

On Fri, Mar 27, 2015 at 2:37 PM, Andy Lutomirski <luto@...capital.net> wrote:
> On Mar 27, 2015 7:26 AM, "Denys Vlasenko" <dvlasenk@...hat.com> wrote:
>>
>> Hi,
>>
>> While running some tests I noticed that EFLAGS
>> is not saved across syscalls if I use 32-bit
>> userspace, use SYSENTER, and paravirt is active.
>>
>> Looking at the code, it's actually clear why that happens.
>>
>> /*
>>  * SYSENTER loads ss, rsp, cs, and rip from previously programmed MSRs.
>>  * IF and VM in rflags are cleared (IOW: interrupts are off).
>>  * SYSENTER does not save anything on the stack,
>>  * and does not save old rip (!!!) and rflags.
>>  */
>> ENTRY(ia32_sysenter_target)
>>         SWAPGS_UNSAFE_STACK  <============================
>>         movq    PER_CPU_VAR(cpu_tss + TSS_sp0), %rsp
>>         ENABLE_INTERRUPTS(CLBR_NONE)
>>
>>         movl    %ebp, %ebp
>>         movl    %eax, %eax
>>         movl    ASM_THREAD_INFO(TI_sysenter_return, %rsp, 0), %r10d
>>
>>         /* Construct struct pt_regs on stack */
>>         pushq_cfi       $__USER32_DS            /* pt_regs->ss */
>>         pushq_cfi       %rbp                    /* pt_regs->sp */
>>         CFI_REL_OFFSET  rsp,0
>>         pushfq_cfi                              /* pt_regs->flags */
>>
>> SWAPGS_UNSAFE_STACK, if it involves paravirt callbacks,
>> will change EFLAGS, and it *can't* save/restore them:
>> there is nowhere to save them, since neither the stack nor
>> PER_CPU() is usable at that point.
>>
>> Interestingly, *no one ever complained*!
>>
>> Apparently, users *don't* depend on arithmetic flags
>> surviving across syscalls. They are also okay with the DF flag
>> being cleared.
>>
>> Let's go flag-by-flag.
>>
>> ID - probably no one depends on it
>> VIP,VIF,VM - v86 stuff, not supported in 64bit
>> AC - someone probably does use this
>> RF - should be cleared to 0
>> NT - iret via task gate, not supported in 64bit
>> IOPL - usually 00, sys_iopl() can change it
>> DF - according to C ABI, should be 0
>> IF - should be preserved (but almost always 1)
>> TF - should be preserved
>> arith flags - probably no one cares
>>
>> IOW, the only bits that must be preserved are AC, IOPL, TF, and _maybe_
>> IF.
>>
>> AC and IOPL are preserved even with this paravirt quirk
>> because paravirt hooks do not mangle them.
>>
>> TF preservation and proper restoration is handled by
>>         do_debug + syscall_trace_enter_phase2 + iret
>> combo.
>>
>> We unconditionally set IF. This is only a problem for applications
>> which use sys_iopl(3), disable IRQs in userspace, and perform
>> syscalls. The set of such apps is probably empty.
>> (This "bug" exists even for non-paravirt case).
>>
>> So, formally, we have a bug: we do not preserve IF,
>> DF and arith flags.
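The flag-by-flag list above can be transcribed into a small C table to derive the mask of EFLAGS bits that must survive a syscall. This is a sketch, not kernel code: bit positions follow the EFLAGS layout in the Intel SDM, while the names flag_info, enum keep and preserve_mask() are hypothetical and only restate the verdicts in the list (AC, IOPL and TF required, IF "maybe").

```c
#include <stddef.h>
#include <stdint.h>

/* Verdicts from the flag-by-flag list (hypothetical encoding). */
enum keep { KEEP_NO, KEEP_YES, KEEP_MAYBE };

struct flag_info {
    const char *name;
    uint32_t    mask;   /* EFLAGS bit(s), Intel SDM layout */
    enum keep   keep;
};

static const struct flag_info flags[] = {
    { "ID",   1u << 21, KEEP_NO    },
    { "VIP",  1u << 20, KEEP_NO    },  /* v86 only */
    { "VIF",  1u << 19, KEEP_NO    },  /* v86 only */
    { "AC",   1u << 18, KEEP_YES   },
    { "VM",   1u << 17, KEEP_NO    },  /* v86 only */
    { "RF",   1u << 16, KEEP_NO    },  /* should be cleared */
    { "NT",   1u << 14, KEEP_NO    },  /* task-gate iret, unused in 64-bit */
    { "IOPL", 3u << 12, KEEP_YES   },  /* two-bit field, bits 12-13 */
    { "OF",   1u << 11, KEEP_NO    },  /* arithmetic */
    { "DF",   1u << 10, KEEP_NO    },  /* C ABI expects 0 */
    { "IF",   1u << 9,  KEEP_MAYBE },
    { "TF",   1u << 8,  KEEP_YES   },
    { "SF",   1u << 7,  KEEP_NO    },  /* arithmetic */
    { "ZF",   1u << 6,  KEEP_NO    },  /* arithmetic */
    { "AF",   1u << 4,  KEEP_NO    },  /* arithmetic */
    { "PF",   1u << 2,  KEEP_NO    },  /* arithmetic */
    { "CF",   1u << 0,  KEEP_NO    },  /* arithmetic */
};

/* OR together every flag marked KEEP_YES, plus IF when include_if is set. */
uint32_t preserve_mask(int include_if)
{
    uint32_t m = 0;
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (flags[i].keep == KEEP_YES ||
            (include_if && flags[i].keep == KEEP_MAYBE))
            m |= flags[i].mask;
    }
    return m;
}
```

With those verdicts, preserve_mask(0) yields 0x43100 (AC | IOPL | TF), and preserve_mask(1) yields 0x43300 once IF is added.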
>>
>> I'm proposing to use this opportunity to amend the syscall ABI
>> to say that arith flags are not preserved across syscalls,
>> and DF can be cleared to 0 by syscalls (but can't be set to 1).
>> Evidently, it has been broken for some time on some virtualized
>> setups, and users are okay with it.
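Read literally, the proposed amendment is a predicate on EFLAGS before and after a syscall. The sketch below assumes Intel SDM bit positions; syscall_flags_ok() and the EFL_* mask names are hypothetical, IF is a parameter because the message only says it _maybe_ needs preserving, and bits the list treats as don't-care (ID, RF, NT, VM and so on) are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* EFLAGS masks, Intel SDM bit positions. */
#define EFL_IF     (1u << 9)                              /* interrupt enable */
#define EFL_DF     (1u << 10)                             /* direction */
#define EFL_STRICT ((1u << 18) | (3u << 12) | (1u << 8))  /* AC | IOPL | TF */

/* Hypothetical check of the relaxed syscall ABI proposed above:
 *   - AC, IOPL and TF (and IF if check_if) must be unchanged;
 *   - DF may be cleared to 0 but must never go from 0 to 1;
 *   - arithmetic flags (CF, PF, AF, ZF, SF, OF) may take any value.
 * All other bits are ignored by this sketch. */
bool syscall_flags_ok(uint32_t before, uint32_t after, bool check_if)
{
    uint32_t strict = EFL_STRICT | (check_if ? EFL_IF : 0u);

    if ((before ^ after) & strict)
        return false;                 /* a bit that must survive changed */
    if (!(before & EFL_DF) && (after & EFL_DF))
        return false;                 /* DF was set by the syscall */
    return true;
}
```

For example, going from 0x646 (DF set) to 0x246 is accepted because DF is only cleared, while going from 0x246 to 0x646 is rejected.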
>
> I think I'd rather fix it.  I want to give x86_64 a sysenter stack
> like x86_32's, since AFAICT the only reason that #DF needs to use IST
> is because sysenter with TF set is the only way I can see that #DF
> could happen with an invalid stack.

What if RSP gets corrupted in the kernel?  That would cause a fault
that gets promoted to #DF, since the iret frame can't be pushed.  You
would at least get an oops out instead of a triple fault reset.

> Also, Houston, we have a bug, probably rootable, and probably damn
> near impossible to exploit without crashing your system:
>
> User does sysenter.  We end up in native_irq_enable_sysexit.  We do:
>
> swapgs
> sti
>
> <-- NMI here can happen on some (all?) cpus, returns successfully
> *with interrupts unmasked*
>
> <-- IRQ.  Boom

The sti will delay interrupts for one instruction, and that should include NMIs.

The Intel SDM states for STI:
"The IF flag and the STI and CLI instructions do not prohibit the
generation of exceptions and NMI interrupts. NMI
interrupts (and SMIs) may be blocked for one macroinstruction following an STI."
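Whether the race Andy describes can fire comes down to two questions: does the STI shadow also hold off an NMI, and what happens to the shadow once the NMI handler returns. The toy model below steps through swapgs, sti, sysexit with a maskable IRQ pending throughout. It is a hypothetical illustration, not a CPU specification: it assumes the NMI can only arrive right after STI and that the IRET from the NMI handler does not restore the shadow, as Andy's scenario implies.

```c
#include <stdbool.h>

/* Toy model of the tail of the SYSENTER return path:
 *     swapgs ; sti ; sysexit
 * Hypothetical assumptions, for illustration only:
 *   - a maskable IRQ is pending the whole time;
 *   - STI sets IF and opens a one-instruction interrupt shadow;
 *   - an NMI, if any, arrives at the boundary right after STI;
 *   - the NMI handler's IRET returns with IF set but no shadow. */
enum insn { SWAPGS, STI, SYSEXIT };

/* True if the pending IRQ is taken in kernel mode after SWAPGS,
 * i.e. before SYSEXIT runs (the "Boom" case). */
bool irq_before_sysexit(bool nmi_after_sti, bool shadow_blocks_nmi)
{
    static const enum insn path[] = { SWAPGS, STI, SYSEXIT };
    bool if_flag = false;      /* interrupts are off until STI */
    bool shadow = false;       /* STI interrupt shadow active */
    bool nmi_pending = false;

    for (unsigned i = 0; i < sizeof path / sizeof path[0]; i++) {
        /* Instruction boundary before path[i]: deliver events. */
        if (nmi_pending && !(shadow && shadow_blocks_nmi)) {
            nmi_pending = false;
            shadow = false;    /* NMI ran; its IRET does not restore the shadow */
        }
        if (if_flag && !shadow)
            return true;       /* IRQ taken while still in the kernel */

        /* Execute path[i]; any shadow expires when it completes. */
        shadow = false;
        if (path[i] == STI) {
            if_flag = true;
            shadow = true;
            nmi_pending = nmi_after_sti;
        }
    }
    return false;              /* SYSEXIT ran first; the IRQ arrives in user mode */
}
```

Under these assumptions the IRQ lands before SYSEXIT only when an NMI arrives in the shadow and the shadow does not block it, which is the case the SDM's "may be blocked" wording leaves open.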

> My preferred fix would be to use sysretl instead of sysexit.  As far
> as I know, there are no 64-bit CPUs at all that don't support sysretl.

It would be nice to have one less return path.  I'm curious to know
why Intel doesn't support syscall from compatibility mode but does
support sysret to compatibility mode.

--
Brian Gerst