linux-kernel - Re: [RFC] security: replace indirect calls with static calls

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <20210205150926.GA12608@localhost>
Date:   Fri, 5 Feb 2021 10:09:26 -0500
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Brendan Jackman <jackmanb@...omium.org>
Cc:     linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
        linux-security-module@...r.kernel.org,
        Paul Renauld <renauld@...gle.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        James Morris <jmorris@...ei.org>, pjt@...gle.com,
        jannh@...gle.com, peterz@...radead.org, rafael.j.wysocki@...el.com,
        keescook@...omium.org, thgarnie@...omium.org, kpsingh@...gle.com,
        paul.renauld.epfl@...il.com, Brendan Jackman <jackmanb@...gle.com>,
        mathieu.desnoyers@...icios.com, rostedt@...dmis.org
Subject: Re: [RFC] security: replace indirect calls with static calls

On 20-Aug-2020 06:47:53 PM, Brendan Jackman wrote:
> From: Paul Renauld <renauld@...gle.com>
> 
> LSMs have high overhead due to indirect function calls through
> retpolines. This RFC proposes to replace these with static calls [1]
> instead.
> 
> This overhead is especially significant for the "bpf" LSM which supports
> the implementation of LSM hooks with eBPF programs (security/bpf)[2]. In
> order to facilitate this, the "bpf" LSM provides a default nop callback for
> all LSM hooks. When enabled, the "bpf" LSM incurs an unnecessary,
> avoidable indirect call to this nop callback.
> 
> The performance impact on a simple syscall, eventfd_write (which triggers
> the file_permission hook), was measured with and without the "bpf" LSM
> enabled. Activating the LSM resulted in an overhead of 4% [3].
> 
> This overhead prevents the adoption of bpf LSM on performance critical
> systems, and also, in general, slows down all LSMs.
> 
> Currently, the LSM hook callbacks are stored in a linked list and
> dispatched as indirect calls. Using static calls can remove this overhead
> by replacing all indirect calls with direct calls.
> 
> During the discussion of the "bpf" LSM patch set, it was proposed to
> special-case the BPF LSM and avoid the overhead by using static keys. This was however
> not accepted and it was decided to [4]:
> 
> - Not special-case the "bpf" LSM.
> - Implement a general solution benefiting the whole LSM framework.
> 
> This is based on the static call branch [5].
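
For reference, the dispatch scheme the quoted patch describes (hook callbacks kept in a linked list and invoked through function pointers, with int-returning hooks stopping at the first non-zero result) can be modeled in user space roughly as below. This is a simplified sketch, not kernel code: struct hook_entry, hook_add_tail(), call_hooks_int() and the example callbacks are hypothetical stand-ins, and in the kernel each p->fn() call is the indirect call that goes through a retpoline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of one LSM hook's callback list
 * (not the kernel's struct security_hook_list). */
struct hook_entry {
	struct hook_entry *next;
	int (*fn)(int arg);	/* invoked indirectly; a retpoline in the kernel */
};

/* Append a callback so that callbacks run in registration order. */
static void hook_add_tail(struct hook_entry **head, struct hook_entry *e)
{
	assert(e != NULL && e->fn != NULL);
	e->next = NULL;
	while (*head)
		head = &(*head)->next;
	*head = e;
}

/* Walk the list, calling each callback through its function pointer,
 * and stop at the first non-zero return (e.g. an access denial). */
static int call_hooks_int(struct hook_entry *head, int arg)
{
	int rc = 0;

	for (struct hook_entry *p = head; p != NULL; p = p->next) {
		rc = p->fn(arg);
		if (rc != 0)
			break;
	}
	return rc;
}

/* Example callbacks for a hypothetical "file_permission"-style hook. */
static int calls;

static int allow_all(int arg) { (void)arg; return 0; }
static int deny_negative(int arg) { return arg < 0 ? -1 : 0; }
static int count_calls(int arg) { (void)arg; calls++; return 0; }
```

In this model every registered callback, including a nop default like the ones the "bpf" LSM installs, costs one indirect call each time the hook fires.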

Hi!

So I reviewed this quickly, and hopefully my understanding is correct.
AFAIU, your approach is limited to scenarios where the callbacks are
known at compile-time. It also appears to add the overhead of a
switch/case for every function call on the fast-path.

I am the original author of the tracepoint infrastructure in the Linux
kernel, which also needs to iterate over an array of callbacks. Recently,
Steven Rostedt pushed a change which accelerates the single-callback
case using static calls to reduce retpoline mitigation overhead, but I
would prefer if we could accelerate the multiple-callback case as well.
Note that for tracepoints, the callbacks are not known at compile-time.
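
The single-callback acceleration mentioned above can be pictured with a user-space model: a NULL-terminated probe array that a generic iterator walks with indirect calls, plus one entry pointer standing in for a static call site, aimed directly at the probe when exactly one is registered and at the iterator otherwise. This is a sketch under those assumptions, not the tracepoint code: all names (struct probe, register_probe(), trace_event(), ...) are hypothetical, and in the kernel the entry pointer would be a static call that static_call_update() patches into a direct call, which plain C cannot express.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*probe_fn)(void *data, int arg);

/* Hypothetical probe record; the array is terminated by func == NULL. */
struct probe {
	probe_fn func;
	void *data;
};

#define MAX_PROBES 8

static struct probe probes[MAX_PROBES + 1];
static int nr_probes;

/* Generic path: iterate over the array and call every probe indirectly. */
static void iterate_probes(void *unused, int arg)
{
	(void)unused;
	for (struct probe *p = probes; p->func != NULL; p++)
		p->func(p->data, arg);
}

static void nop_probe(void *data, int arg) { (void)data; (void)arg; }

/* Stand-in for the static call site (an ordinary pointer here). */
static probe_fn entry = nop_probe;
static void *entry_data;

/* Re-target the entry whenever the set of probes changes. */
static void update_entry(void)
{
	if (nr_probes == 0) {
		entry = nop_probe;
		entry_data = NULL;
	} else if (nr_probes == 1) {
		entry = probes[0].func;		/* fast path: only one probe */
		entry_data = probes[0].data;
	} else {
		entry = iterate_probes;		/* several probes: use the loop */
		entry_data = NULL;
	}
}

static void register_probe(probe_fn func, void *data)
{
	assert(nr_probes < MAX_PROBES);
	probes[nr_probes].func = func;
	probes[nr_probes].data = data;
	nr_probes++;
	probes[nr_probes].func = NULL;		/* keep the terminator */
	update_entry();
}

/* What an instrumented call site would execute. */
static void trace_event(int arg)
{
	entry(entry_data, arg);
}

/* Example probe: adds its argument to the int passed as data. */
static void add_probe(void *data, int arg)
{
	*(int *)data += arg;
}
```

With two or more probes, every probe is still reached through an indirect call inside the loop, which is the multiple-callback case the reply would like to accelerate as well.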

This is where I think we could come up with a generic solution that
would fit both LSM and tracepoint use-cases.

Here is what I have in mind. Let's say we generate code to accelerate up
to N calls, and after that we have a fallback using indirect calls.

Then we should be able to generate the following using static keys as a
jump table and N static calls:

  jump <static key label target>
label_N:
  stack setup
  call
label_N-1:
  stack setup
  call
label_N-2:
  stack setup
  call
  ...
label_0:
  jump end
label_fallback:
  <iteration and indirect calls>
end:

So the static keys would be used to jump to the appropriate label (using
a static branch, which has near-zero overhead). Static calls would
be used to implement each of the calls.
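
The jump-table idea can be approximated in portable C, assuming N = 3 (a hypothetical value). User-space C has no static keys or static calls, so this sketch swaps in a switch with case fallthrough for the static-key jump to label_k, and an array of function pointers for the N static-call sites; the names (slot[], selected, update_dispatch(), call_all()) are illustrative only. It does reproduce the control flow: with k <= N callbacks the fast path enters at label_k and falls through k calls, and beyond N it takes the iterating fallback.

```c
#include <assert.h>

#define NR_SLOTS 3		/* N: hypothetical number of accelerated calls */
#define MAX_CALLBACKS 16

_Static_assert(NR_SLOTS == 3, "call_all() below is unrolled for 3 slots");

typedef void (*cb_fn)(int arg);

static cb_fn callbacks[MAX_CALLBACKS];	/* registration order, used by the fallback */
static int nr_callbacks;

/* Stand-ins: slot[1..N] for the N static-call sites, `selected` for the
 * static keys choosing label_k (-1 selects label_fallback). */
static cb_fn slot[NR_SLOTS + 1];
static int selected;

/* Slow path: re-target the slots and the entry label after a change.
 * Entry at label_k runs slot[k], slot[k-1], ..., slot[1], so callback j
 * (in registration order) goes into slot[k - j]. */
static void update_dispatch(void)
{
	if (nr_callbacks > NR_SLOTS) {
		selected = -1;
		return;
	}
	for (int j = 0; j < nr_callbacks; j++)
		slot[nr_callbacks - j] = callbacks[j];
	selected = nr_callbacks;
}

static void register_callback(cb_fn fn)
{
	assert(nr_callbacks < MAX_CALLBACKS);
	callbacks[nr_callbacks++] = fn;
	update_dispatch();
}

/* Fast path: enter the unrolled call sequence at label_k and fall through. */
static void call_all(int arg)
{
	switch (selected) {
	case 3:	slot[3](arg);	/* label_3 (label_N) */
		/* fallthrough */
	case 2:	slot[2](arg);	/* label_2 */
		/* fallthrough */
	case 1:	slot[1](arg);	/* label_1 */
		/* fallthrough */
	case 0:	return;		/* label_0: jump end */
	default: break;		/* label_fallback */
	}
	for (int i = 0; i < nr_callbacks; i++)	/* iteration and indirect calls */
		callbacks[i](arg);
}

/* Example callbacks that record their call order in trace_log. */
static char trace_log[64];
static int trace_len;

static void log_char(char c)
{
	if (trace_len < (int)sizeof(trace_log) - 1)
		trace_log[trace_len++] = c;
	trace_log[trace_len] = '\0';
}

static void cb_a(int arg) { (void)arg; log_char('a'); }
static void cb_b(int arg) { (void)arg; log_char('b'); }
static void cb_c(int arg) { (void)arg; log_char('c'); }
static void cb_d(int arg) { (void)arg; log_char('d'); }
```

Because execution enters at label_k and falls downward, the slots are filled in reverse registration order; update_dispatch() plays the role of the slow path where the kernel would update the static calls and keys. Making those updates safe against concurrent callers is not modeled here.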

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com