Message-ID: <CALCETrXhUdMNfC0yBah39Z=xWVe3Fix-qF922gJjqzTO1B7TSA@mail.gmail.com>
Date: Fri, 10 Nov 2017 06:57:55 -0800
From: Andy Lutomirski <luto@...capital.net>
To: "Hector Martin 'marcan'" <marcan@...can.st>
Cc: LKML <linux-kernel@...r.kernel.org>,
"kernel-hardening@...ts.openwall.com"
<kernel-hardening@...ts.openwall.com>, X86 ML <x86@...nel.org>
Subject: Re: vDSO maximum stack usage, stack probes, and -fstack-check

On Fri, Nov 10, 2017 at 2:40 AM, Hector Martin 'marcan'
<marcan@...can.st> wrote:
> As far as I know, the vDSO specs (both Documentation/ABI/stable/vdso and
> `man 7 vdso`) make no mention of how much stack the vDSO functions are
> allowed to use. They just say "the usual C ABI", which makes no guarantees.
>
> It turns out that Go has been assuming that those functions use less
> than 104 bytes of stack space, because it calls them directly on its
> tiny stack allocations with no guard pages or other hardware overflow
> protection [1]. On most systems, this is fine.
>
> However, on my system the stars aligned and turned it into a
> nondeterministic crash. I use Gentoo Hardened, which builds its
> toolchain with -fstack-check on by default. It turns out that with the
> combination of GCC 6.4.0, -fstack-check, linux-4.13.9-gentoo, and
> CONFIG_OPTIMIZE_INLINING=n, gcc decides to *not* inline vread_tsc (it's
> not marked inline, so it's perfectly within its right not to do that,
> though for some reason it does inline when CONFIG_OPTIMIZE_INLINING=y
> even though that nominally gives it greater freedom *not* to inline
> things marked inline). That turns __vdso_clock_gettime and
> __vdso_gettimeofday into non-leaf functions, and GCC then inserts a
> stack probe (full objdump at [2]):
>
> 0000000000000030 <__vdso_clock_gettime>:
>   30:   55                      push   %rbp
>   31:   48 89 e5                mov    %rsp,%rbp
>   34:   48 81 ec 20 10 00 00    sub    $0x1020,%rsp
>   3b:   48 83 0c 24 00          orq    $0x0,(%rsp)
>   40:   48 81 c4 20 10 00 00    add    $0x1020,%rsp

This code is so wrong I don't even know where to start. Seriously, sub,
orq, add? How about just orq with an offset? How about a *load*
instead of a store?

But stepping back even further, an offset > 4096 is just bogus.
That's big enough to skip right over the guard page.

Anyway, my recollection is that GCC's stack check code is busted until
much newer gcc versions. I suppose we could try to make the kernel
fail to build at all on a broken configuration like this.

--Andy