[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrVTw5amX-v_C+Bp5Y0B0KbzNKEEq_ZMke-AC7cZopGd9Q@mail.gmail.com>
Date: Mon, 11 Aug 2014 07:42:07 +0900
From: Andy Lutomirski <luto@...capital.net>
To: Denys Vlasenko <dvlasenk@...hat.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Oleg Nesterov <oleg@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Frederic Weisbecker <fweisbec@...il.com>,
X86 ML <x86@...nel.org>, Alexei Starovoitov <ast@...mgrid.com>,
Will Drewry <wad@...omium.org>,
Kees Cook <keescook@...omium.org>
Subject: Re: [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath
On Mon, Aug 11, 2014 at 12:00 AM, Denys Vlasenko <dvlasenk@...hat.com> wrote:
> On 08/09/2014 12:59 AM, Andy Lutomirski wrote:
>>> + * When returning through fast path, userspace sees rcx = return address,
>>> + * r11 = rflags. When returning through iret (e.g. if audit is active),
>>> + * these registers may contain garbage.
>>> + * For ptrace we manage to avoid that: when we hit slow path on entry,
>>> + * we do save rcx and r11 in pt_regs, so ptrace on exit also sees them.
>>> + * If slow path is entered only on exit, there will be garbage.
>>
>> I don't like this. At least the current code puts something
>> deterministic in there (-1) for the slow path, even though it's wrong
>> and makes the slow path behave visibly differently from the fast path.
>>
>> Leaking uninitialized data here is extra bad, though. Keep in mind
>> that, when a syscall entry is interrupted before fully setting up
>> pt_regs, the interrupt frame overlaps task_pt_regs, so it's possible,
>> depending on the stack slot ordering, for a kernel secret
>> (kernel_stack?) to end up somewhere in task_pt_regs.
>
> It's easy to fix. We jump off fast path to slow path here:
>
> movl TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS),%edx
> andl %edi,%edx
> jnz sysret_careful
>
> This is the only use of "sysret_careful" label.
> Therefore, there we don't need to think about any other scenarios
> besides "we are returning from syscall here".
I may be missing something here (on vacation, not really testing
things, no big monitor, etc), but how is this compatible with things
like rt_sigreturn? rt_sigreturn is call from the fastpath, via a
stub, and it returns through int_ret_from_syscall. The C part needs
to modify all the regs, and those regs need to stick, so fixing up rcx
and r11 after rt_sigreturn can't work.
On the flip side, fork probably needs to *read* those regs, so the
fixup needs to happen before fork.
I'm sure it's possible to get this right, but it's complicated and
error-prone, and I'm not at all convinced it's worth it to save two
stores on the fast path.
(I'm pretty sure that current kernels have all kinds of bugs in this area.)
>
> My patch only fixes up segment regs:
>
> /* Handle reschedules */
> /* edx: work, edi: workmask */
> sysret_careful:
> + STORE_IRET_FRAME_CS_SS
> bt $TIF_NEED_RESCHED,%edx
>
> pt_regs->rcx and ->r11 aren't initialized.
>
> We can fix that e.g. by adding here four more insns:
>
> movq RIP(%rsp),%rcx
> movq %rcx,RCX(%rsp)
> movq EFLAGS(%rsp),%rcx
> movq %rcx,R11(%rsp)
>
> similar to what my patch did on slow path on entry (see changes at
> "tracesys" label).
>
> Or we can just bite the bullet and save rcx and r11 on fast path
> (there it will require just two insns). +2 cycles on most CPUs.
>
--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists