Date:	Tue, 10 Mar 2015 06:26:47 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	Denys Vlasenko <dvlasenk@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Borislav Petkov <bp@...en8.de>,
	"H. Peter Anvin" <hpa@...or.com>, Oleg Nesterov <oleg@...hat.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Alexei Starovoitov <ast@...mgrid.com>,
	Will Drewry <wad@...omium.org>,
	Kees Cook <keescook@...omium.org>, X86 ML <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 3/4] x86: save user rsp in pt_regs->sp on SYSCALL64 fastpath

On Tue, Mar 10, 2015 at 6:21 AM, Ingo Molnar <mingo@...nel.org> wrote:
>
> * Denys Vlasenko <dvlasenk@...hat.com> wrote:
>
>> > So there are now +2 instructions (5 instead of 3) in the
>> > system_call path, but there are -2 instructions in the SYSRETQ
>> > path,
>>
>> Unfortunately, no. [...]
>
> So I assumed that it was an equivalent transformation, given that none
> of the changelogs spelled out the increase in overhead ...
>
>> [...] There is only this change in SYSRETQ path, which simply
>> changes where we get RSP from:
>>
>> @@ -293,7 +289,7 @@ ret_from_sys_call:
>>       CFI_REGISTER    rip,rcx
>>       movq    EFLAGS(%rsp),%r11
>>       /*CFI_REGISTER  rflags,r11*/
>> -     movq    PER_CPU_VAR(old_rsp), %rsp
>> +     movq    RSP(%rsp),%rsp
>>       /*
>>        * 64bit SYSRET restores rip from rcx,
>>        * rflags from r11 (but RF and VM bits are forced to 0),
>>
>> Most likely, no change in execution speed here.
>> At best, it is one cycle faster somewhere in the address-generation
>> unit, because PER_CPU_VAR() address evaluation uses a nonzero GS base.
>>
>> Since this patch does add two extra MOVs, I benchmarked these
>> patches: they add exactly one cycle to the system call code path
>> on my Sandy Bridge CPU.
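
(For reference, the two forms only differ in addressing mode; roughly,
assuming PER_CPU_VAR() expands to a %gs-based reference and RSP is the
pt_regs->sp offset macro from entry_64.S, they come out as something like:

	movq	%gs:old_rsp, %rsp	/* PER_CPU_VAR(old_rsp): load via nonzero GS segment base */
	movq	RSP(%rsp), %rsp		/* pt_regs->sp: plain RSP-relative load */

so any speed difference should be confined to address generation.)
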
>
> Hm, but that's the wrong direction: we should try to make it faster
> and to clean it up - making it slower without really good reasons
> isn't good.
>
> Is 'usersp' really that much of a complication?

usersp is IMO tolerable.  The nasty thing is the FIXUP_TOP_OF_STACK /
RESTORE_TOP_OF_STACK garbage, and this patch is the main step toward
killing that off completely.  I've still never convinced myself that
there aren't ptrace-related info leaks in there.

Denys, did you ever benchmark what happens if we use push instead of
mov?  I bet that we get that cycle back and more, not to mention much
less icache usage.

The reason I think so is that this patch changes us from leaving the
RSP slot unwritten to writing something there.  If we push the stack
frame instead of moving it into place, we have to write every slot
anyway, and writing the right value is probably no more expensive than
writing, say, zero (the load / forwarding units are unlikely to be
very busy at that point).  So this change could actually be a step
toward getting faster.
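
To make that concrete, a rough sketch (not an actual patch) of building
the top of the SYSCALL64 frame with pushes, reusing the old_rsp per-cpu
slot from the code above plus a kernel_stack-style per-cpu pointer for
the kernel stack (names illustrative); every pt_regs slot up top gets
written as a side effect, with nothing left for FIXUP_TOP_OF_STACK to
patch later:

	/* Sketch only: push-built top of the pt_regs frame. */
	movq	%rsp, PER_CPU_VAR(old_rsp)	/* stash user RSP */
	movq	PER_CPU_VAR(kernel_stack), %rsp	/* switch to the kernel stack */
	pushq	$__USER_DS			/* pt_regs->ss */
	pushq	PER_CPU_VAR(old_rsp)		/* pt_regs->sp (user RSP) */
	pushq	%r11				/* pt_regs->flags (RFLAGS saved by SYSCALL) */
	pushq	$__USER_CS			/* pt_regs->cs */
	pushq	%rcx				/* pt_regs->ip (user RIP saved by SYSCALL) */
	pushq	%rax				/* pt_regs->orig_ax (syscall number) */
	/* ... the remaining GP registers would be pushed the same way ... */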

--Andy

>
> Thanks,
>
>         Ingo



-- 
Andy Lutomirski
AMA Capital Management, LLC
