linux-kernel - Re: [RFC PATCH v2 0/4] dynamic indirect call promotion

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <77B1C9B7-2A61-4A55-96D0-6E21629956DB@vmware.com>
Date:   Mon, 18 Feb 2019 16:22:11 +0000
From:   Nadav Amit <namit@...are.com>
To:     Edward Cree <ecree@...arflare.com>
CC:     Josh Poimboeuf <jpoimboe@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH v2 0/4] dynamic indirect call promotion

> On Feb 15, 2019, at 9:21 AM, Edward Cree <ecree@...arflare.com> wrote:
> 
> On 05/02/19 08:50, Nadav Amit wrote:
>>> In cases where RCU cannot be used (e.g. because some callees need to RCU
>>> synchronise), it might be possible to add a variant that uses
>>> synchronize_rcu_tasks() when updating, but this series does not attempt this.
>> I wonder why.
> Mainly because I have yet to convince myself that it's the Right Thing.
> Note also the following (from kernel/rcu/update.c):
> 
> /* * This is a very specialized primitive, intended only for a few uses in
> * tracing and other situations requiring manipulation of function
> * preambles and profiling hooks. The synchronize_rcu_tasks() function
> * is not (yet) intended for heavy use from multiple CPUs.  */
> 
>> This seems like an easy solution, and according to Josh, Steven
>> Rostedt and the documentation appears to be valid.
> Will it hurt performance, though, if we end up (say) having rcu-tasks-
>  based synchronisation for updates on every indirect call in the kernel?
> (As would result from a plugin-based opt-out approach.)

That’s what batching is for..

> 
>> As I stated before, I think that the best solution is to use a GCC plugin,
>> [...] Such a solution will not enable the calling code to be
>> written in C and would require a plugin for each architecture.
> I'm afraid I don't see why.  If we use the static_calls infrastructure,
>  but then do a source-level transformation in the compiler plugin to turn
>  indirect calls into dynamic_calls, it should be possible to create an
>  opt-out system without any arch-specific code in the plugin (the arch-
>  specific stuff being all in the static_calls code).
> Any reason that can't be done?  (Note: I don't know much about GCC
>  internals, maybe there's something obvious that stops a plugin doing
>  things like that.)

Hmm… I think you are right. It may be possible by hooking into
PLUGIN_START_PARSE_FUNCTION or PLUGIN_FINISH_PARSE_FUNCTION event. But, I
think source code manipulation is likely to be more error prone and “dirty”.
I think that assembly is the right level to deal with indirect calls anyhow,
specifically if the same mechanism is used for callee-saved functions.

>> Feel free to try my code and give me feedback. I did not get a feedback on my
>> last version. Is there a fundamental problem with my plugin? Did you try it 
>> and got bad results, or perhaps it did not build?
> I didn't test your patches yet, because I was busy trying to get mine
>  working and ready to post (and also with unrelated work).  But now that
>  that's done, next time I have cycles spare for indirect call stuff I
>  guess testing (and reviewing) your approach will be next on my list.
> 
>> Why do you prefer an approach
>> which requires annotation of the callers, instead of something that is much
>> more transparent?
> I'm concerned about the overhead (in both time and memory) of running
>  learning on every indirect call site (including ones that aren't in a
>  hot-path, and ones which have such a wide variety of callees that
>  promotion really doesn't help) throughout the whole kernel.  Also, an
>  annotating programmer knows the locking/rcu context and can thus tell
>  whether a given dynamic_call should use synchronise_rcu_tasks(),
>  synchronise_rcu(), or perhaps something else (if e.g. the call always
>  happens under a mutex, then the updater work could take that mutex).
> 
> The real answer, though, is that I don't so much prefer this approach,
>  as think that both should be tried "publicly" and evaluated by more
>  developers than just us three.  There's a reason this series is
>  marked RFC ;-)

Reading my email from ~2 weeks ago - I realize I don’t really understand my
own question. Clearly, annotation is better (if possible). My point was
mainly that it is a tedious job to annotate all the locations, and there are
quite a few.