Message-ID: <20230117155203.3a66744e@gandalf.local.home>
Date: Tue, 17 Jan 2023 15:52:03 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Mark Rutland <mark.rutland@....com>
Cc: linux-arm-kernel@...ts.infradead.org, catalin.marinas@....com,
lenb@...nel.org, linux-acpi@...r.kernel.org,
linux-kernel@...r.kernel.org, mhiramat@...nel.org,
ndesaulniers@...gle.com, ojeda@...nel.org, peterz@...radead.org,
rafael.j.wysocki@...el.com, revest@...omium.org,
robert.moore@...el.com, will@...nel.org
Subject: Re: [PATCH v2 4/8] ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
On Fri, 13 Jan 2023 18:03:51 +0000
Mark Rutland <mark.rutland@....com> wrote:
> Architectures without dynamic ftrace trampolines incur an overhead when
> multiple ftrace_ops are enabled with distinct filters. In these cases,
> each call site calls a common trampoline which uses
> ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
> so incurs an overhead relative to the size of this list (including RCU
> protection overhead).
>
> Architectures with dynamic ftrace trampolines avoid this overhead for
> call sites which have a single associated ftrace_ops. In these cases,
> the dynamic trampoline is customized to branch directly to the relevant
> ftrace function, avoiding the list overhead.
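
Just to restate the status quo for anyone following along -- without a
per-site trampoline, every patched site funnels into the common
trampoline and pays for the full list walk. Rough sketch only; the
list head and filter helper below are made up for illustration, not
the actual ftrace code:

	static void list_walk_sketch(unsigned long ip, unsigned long parent_ip,
				     struct ftrace_regs *fregs)
	{
		struct ftrace_ops *op;

		/*
		 * Conceptually what ftrace_ops_list_func() ends up doing:
		 * an RCU-protected walk over every registered ftrace_ops,
		 * testing each one's filter against this ip.
		 */
		for (op = registered_ops_list; op; op = op->next) {	/* illustrative */
			if (ops_matches_ip(op, ip))			/* illustrative filter check */
				op->func(ip, parent_ip, op, fregs);
		}
	}

Either way -- a per-callsite dynamic trampoline or the CALL_OPS scheme
below -- the point is that the single relevant op->func() gets called
directly instead of paying for that walk.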
>
> On some architectures it's impractical and/or undesirable to implement
> dynamic ftrace trampolines. For example, arm64 has limited branch ranges
> and cannot always directly branch from a call site to an arbitrary
> address (e.g. from a kernel text address to an arbitrary module
> address). Calls from modules to core kernel text can be indirected via
> PLTs (allocated at module load time) to address this, but the same is
> not possible for calls from core kernel text.
>
> Using an indirect branch from a call site to an arbitrary trampoline is
> possible, but requires several more instructions in the function
> prologue (or immediately before it), and/or comes with far more complex
> requirements for patching.
>
> Instead, this patch adds a new option, where an architecture can
> associate each call site with a pointer to an ftrace_ops, placed at a
> fixed offset from the call site. A shared trampoline can recover this
> pointer and call ftrace_ops::func() without needing to go via
> ftrace_ops_list_func(), avoiding the associated overhead.
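
Nice. So if I'm reading this right, the per-call-site idea is roughly
the sketch below -- the layout, the offset name, and the function are
all just my illustration of the concept, not the arm64 patch itself:

	/*
	 *   <site - FTRACE_OPS_OFFSET>:  .quad  <this site's ftrace_ops *>
	 *   <site>:                      nop / bl ftrace_caller
	 */
	static void call_ops_sketch(unsigned long ip, unsigned long parent_ip,
				    struct ftrace_regs *fregs)
	{
		/* Recover the ops pointer from the fixed offset before the site */
		struct ftrace_ops *ops =
			*(struct ftrace_ops **)(ip - FTRACE_OPS_OFFSET);

		/* Call the associated ops directly -- no ftrace_ops_list_func() */
		ops->func(ip, parent_ip, ops, fregs);
	}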
>
> This avoids issues with branch range limitations, and avoids the need to
> allocate and manipulate dynamic trampolines, making it far simpler to
> implement and maintain, while having similar performance
> characteristics.
>
> Note that this allows for dynamic ftrace_ops to be invoked directly from
> an architecture's ftrace_caller trampoline, whereas existing code forces
> the use of ftrace_ops_list_func(), which is in part necessary to
> permit the ftrace_ops to be freed once unregistered *and* to avoid
> branch/address-generation range limitations on some architectures (e.g.
> where ops->func is a module address, and may be outside of the direct
> branch range for callsites within the main kernel image).
>
> The CALL_OPS approach avoids these problems and is safe as:
>
> * The existing synchronization in ftrace_shutdown() using
> synchronize_rcu_tasks_rude() (and
> synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
> to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
> or while invoking ftrace_ops::func), when that ftrace_ops is
> unregistered.
>
> Arguably this could also be relied upon for the existing scheme,
> permitting dynamic ftrace_ops to be invoked directly when ops->func is
> in range, but this will require additional logic to handle branch
> range limitations, and is not handled by this patch.
>
> * Each callsite's ftrace_ops pointer literal can hold any valid kernel
> address, and is updated atomically. As an architecture's ftrace_caller
> trampoline will atomically load the ops pointer then dereference
> ops->func, there is no risk of invoking ops->func with a mismatched
> ops pointer, and updates to the ops pointer do not require special
> care.
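
Right, and that load-then-use ordering is the key property. Something
like the sketch below -- the names here are mine, and READ_ONCE() /
WRITE_ONCE() just stand in for whatever the trampoline and the patching
code actually do:

	/* The per-callsite ops pointer literal (illustrative). */
	static struct ftrace_ops *callsite_ops;

	/* Updater: a single aligned pointer-sized store, so readers can
	 * never observe a torn value. */
	static void set_callsite_ops(struct ftrace_ops *new_ops)
	{
		WRITE_ONCE(callsite_ops, new_ops);
	}

	/* Trampoline: load the pointer once, then only use that snapshot,
	 * so the ops passed to ops->func() always matches the func being
	 * called, even if the literal is rewritten concurrently. */
	static void trampoline_sketch(unsigned long ip, unsigned long parent_ip,
				      struct ftrace_regs *fregs)
	{
		struct ftrace_ops *ops = READ_ONCE(callsite_ops);

		ops->func(ip, parent_ip, ops, fregs);
	}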
>
> A subsequent patch will implement architecture support for arm64. There
> should be no functional change as a result of this patch alone.
>
> Signed-off-by: Mark Rutland <mark.rutland@....com>
> Cc: Catalin Marinas <catalin.marinas@....com>
> Cc: Florent Revest <revest@...omium.org>
> Cc: Masami Hiramatsu <mhiramat@...nel.org>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Will Deacon <will@...nel.org>
> ---
>
Looks good. Looking through it, I don't see any issues. Although I didn't
test it ;-)
I probably should, but in the meantime (as my tests are currently
broken)...
Reviewed-by: Steven Rostedt (Google) <rostedt@...dmis.org>
-- Steve