linux-kernel - Re: [x86.git#mm] stack protector fixes, vmsplice exploit


Message-ID: <20080214231640.GA31883@elte.hu>
Date:	Fri, 15 Feb 2008 00:16:41 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Jakub Jelinek <jakub@...hat.com>
Cc:	pageexec@...email.hu, Sam Ravnborg <sam@...nborg.org>,
	Arjan van de Ven <arjan@...radead.org>,
	linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [x86.git#mm] stack protector fixes, vmsplice exploit


* Jakub Jelinek <jakub@...hat.com> wrote:

> On Thu, Feb 14, 2008 at 09:25:35PM +0100, Ingo Molnar wrote:
> > The per function call overhead from stackprotector is already pretty 
> > serious IMO, but at least that's something that GCC _could_ be doing 
> > (much) smarter (why doesn't it jne forward out to __check_stk_failure, 
> > instead of generating 4 instructions, one of them a default-mispredicted 
> > branch instruction??), so that overhead could in theory be something 
> > like 4 fall-through instructions per function, instead of the current 6.
> 
> Where do you see a mispredicted branch?

ah!

> int foo (void)
> {
>   char buf[64];
>   bar (buf);
>   return 6;
> }
> 
> -O2 -fstack-protector -m64:
>         subq    $88, %rsp
>         movq    %fs:40, %rax
>         movq    %rax, 72(%rsp)
>         xorl    %eax, %eax
>         movq    %rsp, %rdi
>         call    bar
>         movq    72(%rsp), %rdx
>         xorq    %fs:40, %rdx
>         movl    $6, %eax
>         jne     .L5
>         addq    $88, %rsp
>         ret
> .L5:
>         .p2align 4,,6
>         .p2align 3
>         call    __stack_chk_fail

i got this:

	.file	""
	.text
.globl foo
	.type	foo, @function
foo:
.LFB2:
	pushq	%rbp
.LCFI0:
	movq	%rsp, %rbp
.LCFI1:
	subq	$208, %rsp
.LCFI2:
	movq	__stack_chk_guard(%rip), %rax
	movq	%rax, -8(%rbp)
	xorl	%eax, %eax
	movl	$3, %eax
	movq	-8(%rbp), %rdx
	xorq	__stack_chk_guard(%rip), %rdx
	je	.L3
	call	__stack_chk_fail
.L3:
	leave
	ret

but that's Fedora 8's gcc 4.1, and not the kernel mode code generator either. 

the code you cited looks far better - that's good news!
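
for reference, the check that both listings implement can be modeled in 
plain C. this is only a sketch of the idea, not what gcc emits: 
model_guard, struct model_frame and model_call() are made-up names, the 
guard value is an arbitrary constant standing in for %fs:40 (or 
__stack_chk_guard), and the "frame" is an ordinary struct so that the 
overwrite stays well-defined C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the real guard word (%fs:40 / __stack_chk_guard). */
static const uintptr_t model_guard = (uintptr_t)0x2f5a17c3u;

/* Models the layout of foo(): a 64-byte buffer with the canary right above it. */
struct model_frame {
    char buf[64];
    uintptr_t canary;
};

/*
 * Models one protected call: write the canary (prologue), let the "callee"
 * write nbytes starting at buf, then compare the canary with the guard
 * (epilogue). Returns 1 where real code would reach __stack_chk_fail,
 * 0 where it would return normally.
 */
int model_call(size_t nbytes)
{
    struct model_frame f;

    f.canary = model_guard;                    /* movq guard, canary slot */
    if (nbytes > sizeof f)
        nbytes = sizeof f;                     /* keep the model in bounds */
    memset((unsigned char *)&f, 'A', nbytes);  /* stands in for bar(buf) */

    return (f.canary ^ model_guard) != 0;      /* xorq guard, %rdx ; jne */
}
```

model_call(64) fills the buffer exactly and leaves the canary alone, 
while model_call(65) touches the first canary byte and is flagged.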

one optimization would be to do a 'jne' straight into __stack_chk_fail() 
- we never want to return to the original function, so for the kernel 
that would be OK. [and it's obvious from the existing stackframe which 
function failed] That way we'd have about 3 bytes less per function?

another potential optimization would be to change this:

>         subq    $88, %rsp
>         movq    %fs:40, %rax
>         movq    %rax, 72(%rsp)

into:

	pushq	%fs:40
	subq	$80, %rsp

or am i missing something? (is there perhaps an address generation 
dependency between the pushq and the subq? Or would the canary end up at 
the wrong position?)

> both with gcc 4.1.x and 4.3.0. BTW, you can use -fstack-protector 
> --param=ssp-buffer-size=4 etc. to tweak the size of buffers to trigger 
> stack protection, the default is 8, but e.g. whole Fedora is compiled 
> with 4.

ok. is -fstack-protector-all basically equivalent to 
--param=ssp-buffer-size=0 ? I'm wondering whether it would be easy for 
gcc to completely skip stackprotector code on functions that have no 
buffers, even under -fstack-protector-all. (perhaps it already does?)
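
a small test file makes it easy to check this by looking at the 
generated assembly. the sketch below is an assumption-laden probe, not a 
statement about what any particular gcc does: the function names are 
made up, and an optimizing compiler may keep a small buffer out of 
memory entirely, so the -S output is the only real answer.

```c
#include <string.h>

/* 4-byte buffer: below the default ssp-buffer-size of 8, at the limit for 4. */
int small_buf(const char *s)
{
    char buf[4];

    strncpy(buf, s, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    return (int)strlen(buf);
}

/* 64-byte buffer: above the threshold for both ssp-buffer-size=8 and =4. */
int big_buf(const char *s)
{
    char buf[64];

    strncpy(buf, s, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    return (int)strlen(buf);
}

/* No buffer at all: the interesting case for -fstack-protector-all. */
int no_buf(int x)
{
    return x + 1;
}
```

compiling it three ways with -O2 -S (once with -fstack-protector, once 
with -fstack-protector --param=ssp-buffer-size=4, once with 
-fstack-protector-all) and grepping each output for __stack_chk_fail 
shows which of the three functions got a canary in each configuration.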

	Ingo