linux-kernel - Re: [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <53E8B5EF.9080209@redhat.com>
Date:	Mon, 11 Aug 2014 14:24:15 +0200
From:	Denys Vlasenko <dvlasenk@...hat.com>
To:	Andy Lutomirski <luto@...capital.net>
CC:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	X86 ML <x86@...nel.org>, Alexei Starovoitov <ast@...mgrid.com>,
	Will Drewry <wad@...omium.org>,
	Kees Cook <keescook@...omium.org>
Subject: Re: [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64
 fastpath

On 08/11/2014 12:42 AM, Andy Lutomirski wrote:
> On Mon, Aug 11, 2014 at 12:00 AM, Denys Vlasenko <dvlasenk@...hat.com> wrote:
>> On 08/09/2014 12:59 AM, Andy Lutomirski wrote:
>>>> + * When returning through fast path, userspace sees rcx = return address,
>>>> + * r11 = rflags. When returning through iret (e.g. if audit is active),
>>>> + * these registers may contain garbage.
>>>> + * For ptrace we manage to avoid that: when we hit slow path on entry,
>>>> + * we do save rcx and r11 in pt_regs, so ptrace on exit also sees them.
>>>> + * If slow path is entered only on exit, there will be garbage.
>>>
>>> I don't like this.  At least the current code puts something
>>> deterministic in there (-1) for the slow path, even though it's wrong
>>> and makes the slow path behave visibly differently from the fast path.
>>>
>>> Leaking uninitialized data here is extra bad, though.  Keep in mind
>>> that, when a syscall entry is interrupted before fully setting up
>>> pt_regs, the interrupt frame overlaps task_pt_regs, so it's possible,
>>> depending on the stack slot ordering, for a kernel secret
>>> (kernel_stack?) to end up somewhere in task_pt_regs.
>>
>> It's easy to fix. We jump off fast path to slow path here:
>>
>>         movl TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS),%edx
>>         andl %edi,%edx
>>         jnz  sysret_careful
>>
>> This is the only use of "sysret_careful" label.
>> Therefore, there we don't need to think about any other scenarios
>> besides "we are returning from syscall here".
> 
> I may be missing something here (on vacation, not really testing
> things, no big monitor, etc), but how is this compatible with things
> like rt_sigreturn?  rt_sigreturn is call from the fastpath, via a
> stub, and it returns through int_ret_from_syscall.  The C part needs
> to modify all the regs, and those regs need to stick, so fixing up rcx
> and r11 after rt_sigreturn can't work.

Code at "sysret_careful" label is only reachable
on fast path return.
We don't go down this code path after rt_sigreturn.
stub_rt_sigreturn manually steers into iret code path instead:

ENTRY(stub_rt_sigreturn)
        CFI_STARTPROC
        addq $8, %rsp
        DEFAULT_FRAME 0
        SAVE_EXTRA_REGS
        STORE_IRET_FRAME_CS_SS
        call sys_rt_sigreturn
        movq %rax,RAX(%rsp)
        RESTORE_EXTRA_REGS
        jmp int_ret_from_sys_call   <==== NOTE THIS

So, we don't do any rcx/r11 fixups after sys_rt_sigreturn.

BTW, looks like "SAVE_EXTRA_REGS" here is not necessary:
sys_rt_sigreturn overwrites all registers anyway.
I have its removal on my TODO list.
(It was there before my patch set because stack needed
*resizing to full pt_regs size*; now it does not need that).






> 
> On the flip side, fork probably needs to *read* those regs, so the
> fixup needs to happen before fork.
> 
> I'm sure it's possible to get this right, but it's complicated and
> error-prone, and I'm not at all convinced it's worth it to save two
> stores on the fast path.
> 
> (I'm pretty sure that current kernels have all kinds of bugs in this area.)
> 
>>
>> My patch only fixes up segment regs:
>>
>>         /* Handle reschedules */
>>         /* edx: work, edi: workmask */
>>  sysret_careful:
>> +       STORE_IRET_FRAME_CS_SS
>>         bt $TIF_NEED_RESCHED,%edx
>>
>> pt_regs->rcx and ->r11 aren't initialized.
>>
>> We can fix that e.g. by adding here four more insns:
>>
>>         movq    RIP(%rsp),%rcx
>>         movq    %rcx,RCX(%rsp)
>>         movq    EFLAGS(%rsp),%rcx
>>         movq    %rcx,R11(%rsp)
>>
>> similar to what my patch did on slow path on entry (see changes at
>> "tracesys" label).
>>
>> Or we can just bite the bullet and save rcx and r11 on fast path
>> (there it will require just two insns). +2 cycles on most CPUs.
>>
> 
> 
> --Andy
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/