Message-ID: <20230117155203.3a66744e@gandalf.local.home>
Date: Tue, 17 Jan 2023 15:52:03 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Mark Rutland <mark.rutland@....com>
Cc: linux-arm-kernel@...ts.infradead.org, catalin.marinas@....com,
lenb@...nel.org, linux-acpi@...r.kernel.org,
linux-kernel@...r.kernel.org, mhiramat@...nel.org,
ndesaulniers@...gle.com, ojeda@...nel.org, peterz@...radead.org,
rafael.j.wysocki@...el.com, revest@...omium.org,
robert.moore@...el.com, will@...nel.org
Subject: Re: [PATCH v2 4/8] ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
On Fri, 13 Jan 2023 18:03:51 +0000
Mark Rutland <mark.rutland@....com> wrote:
> Architectures without dynamic ftrace trampolines incur an overhead when
> multiple ftrace_ops are enabled with distinct filters. In these cases,
> each call site calls a common trampoline which uses
> ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
> so incurs an overhead relative to the size of this list (including RCU
> protection overhead).
>
> Architectures with dynamic ftrace trampolines avoid this overhead for
> call sites which have a single associated ftrace_ops. In these cases,
> the dynamic trampoline is customized to branch directly to the relevant
> ftrace function, avoiding the list overhead.
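
Just to restate the status quo for anyone following along -- without a
per-site trampoline, every patched site funnels into the common
trampoline and pays for the full list walk. Rough sketch only; the
list head and filter helper below are made up for illustration, not
the actual ftrace code:

	static void list_walk_sketch(unsigned long ip, unsigned long parent_ip,
				     struct ftrace_regs *fregs)
	{
		struct ftrace_ops *op;

		/*
		 * Conceptually what ftrace_ops_list_func() ends up doing:
		 * an RCU-protected walk over every registered ftrace_ops,
		 * testing each one's filter against this ip.
		 */
		for (op = registered_ops_list; op; op = op->next) {	/* illustrative */
			if (ops_matches_ip(op, ip))			/* illustrative filter check */
				op->func(ip, parent_ip, op, fregs);
		}
	}

Either way -- a per-callsite dynamic trampoline or the CALL_OPS scheme
below -- the point is that the single relevant op->func() gets called
directly instead of paying for that walk.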
>
> On some architectures it's impractical and/or undesirable to implement
> dynamic ftrace trampolines. For example, arm64 has limited branch ranges
> and cannot always directly branch from a call site to an arbitrary
> address (e.g. from a kernel text address to an arbitrary module
> address). Calls from modules to core kernel text can be indirected via
> PLTs (allocated at module load time) to address this, but the same is
> not possible for calls from core kernel text.
>
> Using an indirect branch from a call site to an arbitrary trampoline is
> possible, but requires several more instructions in the function
> prologue (or immediately before it), and/or comes with far more complex
> requirements for patching.
>
> Instead, this patch adds a new option, where an architecture can
> associate each call site with a pointer to an ftrace_ops, placed at a
> fixed offset from the call site. A shared trampoline can recover this
> pointer and call ftrace_ops::func() without needing to go via
> ftrace_ops_list_func(), avoiding the associated overhead.
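
Nice. So if I'm reading this right, the per-call-site idea is roughly
the sketch below -- the layout, the offset name, and the function are
all just my illustration of the concept, not the arm64 patch itself:

	/*
	 *   <site - FTRACE_OPS_OFFSET>:  .quad  <this site's ftrace_ops *>
	 *   <site>:                      nop / bl ftrace_caller
	 */
	static void call_ops_sketch(unsigned long ip, unsigned long parent_ip,
				    struct ftrace_regs *fregs)
	{
		/* Recover the ops pointer from the fixed offset before the site */
		struct ftrace_ops *ops =
			*(struct ftrace_ops **)(ip - FTRACE_OPS_OFFSET);

		/* Call the associated ops directly -- no ftrace_ops_list_func() */
		ops->func(ip, parent_ip, ops, fregs);
	}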
>
> This avoids issues with branch range limitations, and avoids the need to
> allocate and manipulate dynamic trampolines, making it far simpler to
> implement and maintain, while having similar performance
> characteristics.
>
> Note that this allows for dynamic ftrace_ops to be invoked directly from
> an architecture's ftrace_caller trampoline, whereas existing code forces
> the use of ftrace_ops_list_func(), which is in part necessary to
> permit the ftrace_ops to be freed once unregistered *and* to avoid
> branch/address-generation range limitations on some architectures (e.g.
> where ops->func is a module address, and may be outside of the direct
> branch range for callsites within the main kernel image).
>
> The CALL_OPS approach avoids these problems and is safe as:
>
> * The existing synchronization in ftrace_shutdown() using
> synchronize_rcu_tasks_rude() (and
> synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
> to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
> or while invoking ftrace_ops::func), when that ftrace_ops is
> unregistered.
>
> Arguably this could also be relied upon for the existing scheme,
> permitting dynamic ftrace_ops to be invoked directly when ops->func is
> in range, but this will require additional logic to handle branch
> range limitations, and is not handled by this patch.
>
> * Each callsite's ftrace_ops pointer literal can hold any valid kernel
> address, and is updated atomically. As an architecture's ftrace_caller
> trampoline will atomically load the ops pointer then dereference
> ops->func, there is no risk of invoking ops->func with a mismatched
> ops pointer, and updates to the ops pointer do not require special
> care.
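
Right, and that load-then-use ordering is the key property. Something
like the sketch below -- the names here are mine, and READ_ONCE() /
WRITE_ONCE() just stand in for whatever the trampoline and the patching
code actually do:

	/* The per-callsite ops pointer literal (illustrative). */
	static struct ftrace_ops *callsite_ops;

	/* Updater: a single aligned pointer-sized store, so readers can
	 * never observe a torn value. */
	static void set_callsite_ops(struct ftrace_ops *new_ops)
	{
		WRITE_ONCE(callsite_ops, new_ops);
	}

	/* Trampoline: load the pointer once, then only use that snapshot,
	 * so the ops passed to ops->func() always matches the func being
	 * called, even if the literal is rewritten concurrently. */
	static void trampoline_sketch(unsigned long ip, unsigned long parent_ip,
				      struct ftrace_regs *fregs)
	{
		struct ftrace_ops *ops = READ_ONCE(callsite_ops);

		ops->func(ip, parent_ip, ops, fregs);
	}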
>
> A subsequent patch will implement architecture support for arm64. There
> should be no functional change as a result of this patch alone.
>
> Signed-off-by: Mark Rutland <mark.rutland@....com>
> Cc: Catalin Marinas <catalin.marinas@....com>
> Cc: Florent Revest <revest@...omium.org>
> Cc: Masami Hiramatsu <mhiramat@...nel.org>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Will Deacon <will@...nel.org>
> ---
>
Looks good. Looking through it, I don't see any issues. Although I didn't
test it ;-)
I probably should, but in the meantime (as my tests are currently
broken)...
Reviewed-by: Steven Rostedt (Google) <rostedt@...dmis.org>
-- Steve