linux-kernel - Re: [PATCH] x86/asm/entry/64: better check for canonical address

Message-ID: <551ADA0A.7050701@redhat.com>
Date:	Tue, 31 Mar 2015 19:31:54 +0200
From:	Denys Vlasenko <dvlasenk@...hat.com>
To:	Andy Lutomirski <luto@...capital.net>,
	Ingo Molnar <mingo@...nel.org>
CC:	Denys Vlasenko <vda.linux@...glemail.com>,
	Brian Gerst <brgerst@...il.com>,
	Borislav Petkov <bp@...en8.de>,
	the arch/x86 maintainers <x86@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH] x86/asm/entry/64: better check for canonical address

On 03/31/2015 07:08 PM, Andy Lutomirski wrote:
> On Tue, Mar 31, 2015 at 9:43 AM, Ingo Molnar <mingo@...nel.org> wrote:
>>
>> * Denys Vlasenko <vda.linux@...glemail.com> wrote:
>>
>>>> I guess they could optimize it by adding a single "I am a modern
>>>> OS executing regular userspace" flag to the descriptor [or
>>>> expressing the same as a separate instruction], to avoid all that
>>>> legacy crap that won't trigger on like 99.999999% of systems ...
>>>
>>> Yes, that would be a useful addition. Interrupt servicing on x86
>>> takes a non-negligible hit because of IRET slowness.
>>
>> But ... to react to your other patch: detecting the common easy case
>> and doing a POPF+RET ourselves ought to be pretty good as well?
>>
>> But only if ptregs->rip != the magic RET itself, to avoid recursion.
>>
>> Even with all those extra checks it should still be much faster.
>>
> 
> I have a smallish preference for doing sti;ret instead, because that
> keeps the funny special case entirely localized to the NMI code
> instead of putting it in the IRQ exit path.  I suspect that the
> performance loss is at most a cycle or two (we're adding a branch, but
> sti itself is quite fast).
> 
> That being said, I could easily be convinced otherwise.

Let me try to convince you. sti is 6 cycles.

The patch atop your code would be:

 	movq RIP-ARGOFFSET(%rsp), %rcx
+	cmp $magic_ret, %rcx
+	je  real_iret
-	btr $9, %rdi
 	movq %rdi, (%rsi)
 	movq %rcx, 8(%rsi)
 	movq %rsi, ORIG_RAX-ARGOFFSET(%rsp)
 	popq_cfi %r11
 	popq_cfi %r10
 	popq_cfi %r9
 	popq_cfi %r8
 	popq_cfi %rax
 	popq_cfi %rcx
 	popq_cfi %rdx
 	popq_cfi %rsi
 	popq_cfi %rdi
 	popq %rsp
-	jc 1f
 	popfq_cfi
+magic_ret:
 	retq
-1:
-	popfq_cfi
-	sti
-	retq

It's a clear (albeit small) win: the branch is almost never taken,
and we do not need sti.
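
To make the diff easier to follow, here is a minimal C sketch of the
decision the two added instructions make, next to the test the removed
btr/jc pair performed. It is a model, not kernel code: the function names,
the enum and the integer arguments are invented for illustration. Only
magic_ret, real_iret and bit 9 (IF) of the saved flags come from the patch
above. The assumption behind it is Ingo's point about recursion: an
interrupt that lands after popfq but before retq has a saved RIP equal to
magic_ret, and that case has to go through a real iret.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: the two ways the exit path can finish. */
enum exit_path {
	EXIT_POPF_RET,   /* fast path: popfq, then retq at magic_ret */
	EXIT_REAL_IRET,  /* slow path: fall back to a real iret */
};

/*
 * Models "cmp $magic_ret, %rcx; je real_iret" from the patch.
 * saved_rip stands for the RIP slot read from the stack frame and
 * magic_ret_addr for the address of the magic_ret label. Both are
 * plain integers here.
 */
static enum exit_path choose_exit_path(uint64_t saved_rip,
				       uint64_t magic_ret_addr)
{
	if (saved_rip == magic_ret_addr)
		return EXIT_REAL_IRET;
	return EXIT_POPF_RET;
}

/*
 * Models the removed "btr $9, %rdi; jc 1f" pair: the old variant
 * cleared IF (bit 9) in the flags image to be popped and executed
 * sti before retq whenever that bit had been set.
 */
static bool old_variant_needs_sti(uint64_t saved_rflags)
{
	return (saved_rflags >> 9) & 1;
}
```

With the patch, the flags image is popped unmodified, so the second
function has no counterpart in the new code. The cost moves into the
comparison in choose_exit_path, which is expected to fail on almost
every exit.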