lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 11 Aug 2014 07:42:07 +0900
From:	Andy Lutomirski <>
To:	Denys Vlasenko <>
Cc:	"" <>,
	Linus Torvalds <>,
	Oleg Nesterov <>,
	"H. Peter Anvin" <>,
	Frederic Weisbecker <>,
	X86 ML <>, Alexei Starovoitov <>,
	Will Drewry <>,
	Kees Cook <>
Subject: Re: [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath

On Mon, Aug 11, 2014 at 12:00 AM, Denys Vlasenko <> wrote:
> On 08/09/2014 12:59 AM, Andy Lutomirski wrote:
>>> + * When returning through fast path, userspace sees rcx = return address,
>>> + * r11 = rflags. When returning through iret (e.g. if audit is active),
>>> + * these registers may contain garbage.
>>> + * For ptrace we manage to avoid that: when we hit slow path on entry,
>>> + * we do save rcx and r11 in pt_regs, so ptrace on exit also sees them.
>>> + * If slow path is entered only on exit, there will be garbage.
>> I don't like this.  At least the current code puts something
>> deterministic in there (-1) for the slow path, even though it's wrong
>> and makes the slow path behave visibly differently from the fast path.
>> Leaking uninitialized data here is extra bad, though.  Keep in mind
>> that, when a syscall entry is interrupted before fully setting up
>> pt_regs, the interrupt frame overlaps task_pt_regs, so it's possible,
>> depending on the stack slot ordering, for a kernel secret
>> (kernel_stack?) to end up somewhere in task_pt_regs.
> It's easy to fix. We jump off fast path to slow path here:
>         movl TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS),%edx
>         andl %edi,%edx
>         jnz  sysret_careful
> This is the only use of "sysret_careful" label.
> Therefore, there we don't need to think about any other scenarios
> besides "we are returning from syscall here".

I may be missing something here (on vacation, not really testing
things, no big monitor, etc), but how is this compatible with things
like rt_sigreturn?  rt_sigreturn is call from the fastpath, via a
stub, and it returns through int_ret_from_syscall.  The C part needs
to modify all the regs, and those regs need to stick, so fixing up rcx
and r11 after rt_sigreturn can't work.

On the flip side, fork probably needs to *read* those regs, so the
fixup needs to happen before fork.

I'm sure it's possible to get this right, but it's complicated and
error-prone, and I'm not at all convinced it's worth it to save two
stores on the fast path.

(I'm pretty sure that current kernels have all kinds of bugs in this area.)

> My patch only fixes up segment regs:
>         /* Handle reschedules */
>         /* edx: work, edi: workmask */
>  sysret_careful:
>         bt $TIF_NEED_RESCHED,%edx
> pt_regs->rcx and ->r11 aren't initialized.
> We can fix that e.g. by adding here four more insns:
>         movq    RIP(%rsp),%rcx
>         movq    %rcx,RCX(%rsp)
>         movq    EFLAGS(%rsp),%rcx
>         movq    %rcx,R11(%rsp)
> similar to what my patch did on slow path on entry (see changes at
> "tracesys" label).
> Or we can just bite the bullet and save rcx and r11 on fast path
> (there it will require just two insns). +2 cycles on most CPUs.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

Powered by blists - more mailing lists