Message-ID: <20080214231640.GA31883@elte.hu>
Date: Fri, 15 Feb 2008 00:16:41 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Jakub Jelinek <jakub@...hat.com>
Cc: pageexec@...email.hu, Sam Ravnborg <sam@...nborg.org>,
Arjan van de Ven <arjan@...radead.org>,
linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [x86.git#mm] stack protector fixes, vmsplice exploit
* Jakub Jelinek <jakub@...hat.com> wrote:
> On Thu, Feb 14, 2008 at 09:25:35PM +0100, Ingo Molnar wrote:
> > The per function call overhead from stackprotector is already pretty
> > serious IMO, but at least that's something that GCC _could_ be doing
> > (much) smarter (why doesn't it jne forward out to __stack_chk_fail,
> > instead of generating 4 instructions, one of them a default-mispredicted
> > branch instruction??), so that overhead could in theory be something
> > like 4 fall-through instructions per function, instead of the current 6.
>
> Where do you see a mispredicted branch?
ah!
> int foo (void)
> {
> char buf[64];
> bar (buf);
> return 6;
> }
>
> -O2 -fstack-protector -m64:
> subq $88, %rsp
> movq %fs:40, %rax
> movq %rax, 72(%rsp)
> xorl %eax, %eax
> movq %rsp, %rdi
> call bar
> movq 72(%rsp), %rdx
> xorq %fs:40, %rdx
> movl $6, %eax
> jne .L5
> addq $88, %rsp
> ret
> .L5:
> .p2align 4,,6
> .p2align 3
> call __stack_chk_fail
i got this:
.file ""
.text
.globl foo
.type foo, @function
foo:
.LFB2:
pushq %rbp
.LCFI0:
movq %rsp, %rbp
.LCFI1:
subq $208, %rsp
.LCFI2:
movq __stack_chk_guard(%rip), %rax
movq %rax, -8(%rbp)
xorl %eax, %eax
movl $3, %eax
movq -8(%rbp), %rdx
xorq __stack_chk_guard(%rip), %rdx
je .L3
call __stack_chk_fail
.L3:
leave
ret
but that's F8's gcc 4.1, and not the kernel mode code generator either -
there the 'je .L3' jumps over the __stack_chk_fail call, i.e. in the
common (canary-OK) case the forward branch is taken, which static
prediction gets wrong. the code you cited looks far better - that's good
news!
one optimization would be to do a 'jne' straight into __stack_chk_fail()
- it's not like we ever want to return [and it's obvious from the
existing stackframe which function failed]. That way we'd have about 3
bytes less per function, no? We don't want to return to the original
function anyway, so for the kernel it would be OK.
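i.e. the tail of the function you cited could become something like this
(just a quick sketch of what i mean, not actual gcc output):

movq 72(%rsp), %rdx
xorq %fs:40, %rdx
movl $6, %eax
jne __stack_chk_fail   # never returns, so no local .L5 stub is needed
addq $88, %rsp
ret

a conditional jump to an external symbol should be fine for the kernel:
we are non-PIC and the whole image is within jcc rel32 range.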
another potential optimization would be to turn this:
> subq $88, %rsp
> movq %fs:40, %rax
> movq %rax, 72(%rsp)
into:
pushq %fs:40
subq $80, %rsp
or am i missing something? (is there perhaps an address generation
dependency between the pushq and the subq? or would the canary end up at
the wrong position?)
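(if the offsets work out, foo would then look something like this -
again just a rough sketch of my assumption, combined with the direct
jne idea above:)

pushq %fs:40           # canary lands at 80(%rsp), right below the retaddr
subq $80, %rsp         # 88 bytes of frame in total, same as before
movq %rsp, %rdi
call bar
movq 80(%rsp), %rdx    # re-load the canary from its new offset
xorq %fs:40, %rdx
movl $6, %eax
jne __stack_chk_fail
addq $88, %rsp
ret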
> both with gcc 4.1.x and 4.3.0. BTW, you can use -fstack-protector
> --param=ssp-buffer-size=4 etc. to tweak the size of buffers to trigger
> stack protection, the default is 8, but e.g. whole Fedora is compiled
> with 4.
ok. is -fstack-protector-all basically equivalent to
--param=ssp-buffer-size=0? I'm wondering whether it would be easy for
gcc to completely skip stackprotector code on functions that have no
buffers, even under -fstack-protector-all. (perhaps it already does?)
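(to illustrate the 'no buffers' case: under -fstack-protector-all even a
trivial function like 'int baz (int x) { return x * 2 + 1; }' would
presumably still get the full canary sequence - rough sketch, baz is
just a made-up example, not real gcc output:)

baz:
subq $8, %rsp            # scratch slot for the canary
movq %fs:40, %rax
movq %rax, (%rsp)        # canary setup despite there being no buffer
leal 1(%rdi,%rdi), %eax  # x*2 + 1
movq (%rsp), %rdx
xorq %fs:40, %rdx
jne __stack_chk_fail
addq $8, %rsp
ret

which is pure overhead - there's no buffer to overflow in the first
place.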
Ingo