Message-ID: <202008201435.97CF8296@keescook>
Date: Thu, 20 Aug 2020 14:45:57 -0700
From: Kees Cook <keescook@...omium.org>
To: Brendan Jackman <jackmanb@...omium.org>
Cc: linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
linux-security-module@...r.kernel.org,
Paul Renauld <renauld@...gle.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
James Morris <jmorris@...ei.org>, pjt@...gle.com,
jannh@...gle.com, peterz@...radead.org, rafael.j.wysocki@...el.com,
thgarnie@...omium.org, kpsingh@...gle.com,
paul.renauld.epfl@...il.com, Brendan Jackman <jackmanb@...gle.com>
Subject: Re: [RFC] security: replace indirect calls with static calls

On Thu, Aug 20, 2020 at 06:47:53PM +0200, Brendan Jackman wrote:
> From: Paul Renauld <renauld@...gle.com>
>
> LSMs have high overhead due to indirect function calls through
> retpolines. This RPC proposes to replace these with static calls [1]

typo: RFC

> instead.

Yay! :)

> [...]
> This overhead prevents the adoption of bpf LSM on performance critical
> systems, and also, in general, slows down all LSMs.

I'd be curious to see other workloads too. (Your measurements are a bit
synthetic, mostly showing "worst case": one short syscall in a tight
loop. I'm curious how much performance gain can be had -- we should
still do it, it'll be a direct performance improvement, but I'm curious
about "real world" impact too.)

> [...]
> Previously, the code for this hook would have looked like this:
>
>     ret = DEFAULT_RET;
>
>     for each cb in [A, B, C]:
>         ret = cb(args); <--- costly indirect call here
>         if ret != 0:
>             break;
>
>     return ret;
>
> Static calls are defined at build time and are initially empty (NOP
> instructions). When the LSMs are initialized, the slots are filled as
> follows:
>
>  slot idx      content
>            |-----------|
>     0      |           |
>            |-----------|
>     1      |           |
>            |-----------|
>     2      |  call A   | <-- base_slot_idx = 2
>            |-----------|
>     3      |  call B   |
>            |-----------|
>     4      |  call C   |
>            |-----------|
>
> The generated code will unroll the foreach loop to have a static call for
> each possible LSM:
>
>     ret = DEFAULT_RET;
>     switch(base_slot_idx):
>
>         case 0:
>             NOP
>             if ret != 0:
>                 break;
>             // fallthrough
>         case 1:
>             NOP
>             if ret != 0:
>                 break;
>             // fallthrough
>         case 2:
>             ret = A(args); <--- direct call, no retpoline
>             if ret != 0:
>                 break;
>             // fallthrough
>         case 3:
>             ret = B(args); <--- direct call, no retpoline
>             if ret != 0:
>                 break;
>             // fallthrough
>
>         [...]
>
>         default:
>             break;
>
>     return ret;
>
> A similar logic is applied for void hooks.
>
> Why this trick with a switch statement? The table of static call is defined
> at compile time. The number of hook callbacks that will be defined is
> unknown at that time, and the table cannot be resized at runtime. Static
> calls do not define a conditional execution for a non-void function, so the
> executed slots must be non-empty. With this use of the table and the
> switch, it is possible to jump directly to the first used slot and execute
> all of the slots after. This essentially makes the entry point of the table
> dynamic. Instead, it would also be possible to start from 0 and break after
> the final populated slot, but that would require an additional conditional
> after each slot.

Instead of just "NOP", having the static branches perform a jump would
solve this pretty cleanly, yes? Something like:

    ret = DEFAULT_RET;

    ret = A(args); <--- direct call, no retpoline
    if ret != 0:
        goto out;

    ret = B(args); <--- direct call, no retpoline
    if ret != 0:
        goto out;

    goto out;
    if ret != 0:
        goto out;

out:
    return ret;
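
A rough userspace sketch of the two layouts discussed above may make the
control flow concrete. It is not kernel code and is not taken from the
patch: plain function pointers stand in for the patched static-call
sites, and the `nr_hooks` comparisons stand in for the patched
"goto out" jumps, so it shows only the ordering and early-exit
behaviour, not the retpoline savings. The names `set_hooks`,
`call_hooks_switch`, `call_hooks_jump`, `hook_allow` and `hook_deny`,
and the slot count of 3, are all hypothetical.

```c
#define MAX_SLOTS   3   /* hypothetical; the RFC fixes this at 11 */
#define DEFAULT_RET 0

typedef int (*hook_fn)(int arg);

/* Plain function pointers stand in for patched static-call sites. */
static hook_fn tail_slots[MAX_SLOTS];  /* RFC layout: hooks at the end     */
static hook_fn head_slots[MAX_SLOTS];  /* reply layout: hooks at the start */
static int base_slot_idx = MAX_SLOTS;  /* first populated tail slot        */
static int nr_hooks;                   /* number of populated head slots   */

/* Populate both layouts with the same hooks, preserving call order. */
static int set_hooks(const hook_fn *fns, int n)
{
	if (n < 0 || n > MAX_SLOTS)
		return -1;
	nr_hooks = n;
	base_slot_idx = MAX_SLOTS - n;
	for (int i = 0; i < n; i++) {
		head_slots[i] = fns[i];
		tail_slots[base_slot_idx + i] = fns[i];
	}
	return 0;
}

/* RFC variant: jump to the first populated slot, then fall through. */
static int call_hooks_switch(int arg)
{
	int ret = DEFAULT_RET;

	switch (base_slot_idx) {
	case 0:
		ret = tail_slots[0](arg);
		if (ret != 0)
			break;
		/* fallthrough */
	case 1:
		ret = tail_slots[1](arg);
		if (ret != 0)
			break;
		/* fallthrough */
	case 2:
		ret = tail_slots[2](arg);
		break;
	default:
		break;
	}
	return ret;
}

/* Reply variant: start at slot 0, leave at the first unused slot. */
static int call_hooks_jump(int arg)
{
	int ret = DEFAULT_RET;

	if (nr_hooks < 1)	/* stands in for a patched "goto out" */
		goto out;
	ret = head_slots[0](arg);
	if (ret != 0)
		goto out;

	if (nr_hooks < 2)
		goto out;
	ret = head_slots[1](arg);
	if (ret != 0)
		goto out;

	if (nr_hooks < 3)
		goto out;
	ret = head_slots[2](arg);
out:
	return ret;
}

/* Example hooks (hypothetical): each counts its call; one denies with -1. */
static int calls;
static int hook_allow(int arg) { (void)arg; calls++; return 0; }
static int hook_deny(int arg)  { (void)arg; calls++; return -1; }
```

With hooks registered in the order `hook_allow`, `hook_deny`,
`hook_allow`, both variants stop at the second hook and return -1. The
difference is only where the unused slots sit: before the populated
ones (entered via the switch) or after them (skipped via the jump).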
> [...]
> The number of available slots for each LSM hook is currently fixed at
> 11 (the number of LSMs in the kernel). Ideally, it should automatically
> adapt to the number of LSMs compiled into the kernel.

Seems like a reasonable thing to do and could be a separate patch.

> If there’s no practical way to implement such automatic adaptation, an
> option instead would be to remove the panic call by falling-back to the old
> linked-list mechanism, which is still present anyway (see below).
>
> A few special cases of LSM don't use the macro call_[int/void]_hook but
> have their own calling logic. The linked-lists are kept as a possible slow
> path fallback for them.

I assume you mean the integrity subsystem? That just needs to be fixed
correctly. If we switch to this, let's ditch the linked list entirely.
Fixing integrity's stacking can be a separate patch too.

> [...]
> Signed-off-by: Paul Renauld <renauld@...gle.com>
> Signed-off-by: KP Singh <kpsingh@...gle.com>
> Signed-off-by: Brendan Jackman <jackmanb@...gle.com>

This implies a maintainership chain, with Paul as the sole author. If
you mean all of you worked on the patch, include Co-developed-by: as
needed[1].

-Kees

[1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by

--
Kees Cook