[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADHUQJ5L54YihGsB-WgOggN257FRetoFy4zsqK2GGtaXzMrxKQ@mail.gmail.com>
Date: Thu, 1 Aug 2013 21:51:39 -0700
From: Steve Hodgson <steve@...estorage.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Jörn Engel <joern@...fs.org>,
Dave Jones <davej@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: __ftrace_hash_rec_update FTRACE_WARN_ON.
On Tue, Jul 30, 2013 at 5:51 PM, Steven Rostedt <rostedt@...dmis.org> wrote:
> On Sat, 2013-07-27 at 12:16 -0700, Steve Hodgson wrote:
>
>> This patch fixes ftrace across module removal/reinsertion on our 3.6.11 kernel.
>
> I found a few corner cases that this patch doesn't work with, and the
> solution is just getting uglier and uglier.
>
> I found a new solution that seems to handle all the corner cases
> (including this one). Can you test this as well. I want to add your
> tested-by tag for this too.
>
> Thanks!
>
> -- Steve
This patch also fixes the issue for me: ftrace works fine across module removal.
Sorry for being a little slow in testing this.
- Steve
>
> From 4c647dde26cf8c47ccaaed6f2e2fffee5dab5871 Mon Sep 17 00:00:00 2001
> From: "Steven Rostedt (Red Hat)" <rostedt@...dmis.org>
> Date: Tue, 30 Jul 2013 00:04:32 -0400
> Subject: [PATCH] ftrace: Check module functions being traced on reload
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> There's been a nasty bug that would show up and not give much info.
> The bug displayed the following warning:
>
> WARNING: at kernel/trace/ftrace.c:1529 __ftrace_hash_rec_update+0x1e3/0x230()
> Pid: 20903, comm: bash Tainted: G O 3.6.11+ #38405.trunk
> Call Trace:
> [<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0
> [<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20
> [<ffffffff810c2ee3>] __ftrace_hash_rec_update+0x1e3/0x230
> [<ffffffff810c4f28>] ftrace_hash_move+0x28/0x1d0
> [<ffffffff811401cc>] ? kfree+0x2c/0x110
> [<ffffffff810c68ee>] ftrace_regex_release+0x8e/0x150
> [<ffffffff81149f1e>] __fput+0xae/0x220
> [<ffffffff8114a09e>] ____fput+0xe/0x10
> [<ffffffff8105fa22>] task_work_run+0x72/0x90
> [<ffffffff810028ec>] do_notify_resume+0x6c/0xc0
> [<ffffffff8126596e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
> [<ffffffff815c0f88>] int_signal+0x12/0x17
> ---[ end trace 793179526ee09b2c ]---
>
> It was finally narrowed down to unloading a module that was being traced.
>
> It was actually more than that. When functions are being traced, there's
> a table of all functions that have a ref count of the number of active
> tracers attached to that function. When a function trace callback is
> registered to a function, the function's record ref count is incremented.
> When it is unregistered, the function's record ref count is decremented.
> If an inconsistency is detected (ref count goes below zero) the above
> warning is shown and the function tracing is permanently disabled until
> reboot.
>
> The ftrace callback ops holds a hash of functions that it filters on
> (and/or filters off). If the hash is empty, the default means to filter
> all functions (for the filter_hash) or to disable no functions (for the
> notrace_hash).
>
> When a module is unloaded, it frees the function records that represent
> the module functions. These records exist on their own pages, that is
> function records for one module will not exist on the same page as
> function records for other modules or even the core kernel.
>
> Now when a module unloads, the records that represents its functions are
> freed. When the module is loaded again, the records are recreated with
> a default ref count of zero (unless there's a callback that traces all
> functions, then they will also be traced, and the ref count will be
> incremented).
>
> The problem is that if an ftrace callback hash includes functions of the
> module being unloaded, those hash entries will not be removed. If the
> module is reloaded in the same location, the hash entries still point
> to the functions of the module but the module's ref counts do not reflect
> that.
>
> With the help of Steve and Joern, we found a reproducer:
>
> Using uinput module and uinput_release function.
>
> cd /sys/kernel/debug/tracing
> modprobe uinput
> echo uinput_release > set_ftrace_filter
> echo function > current_tracer
> rmmod uinput
> modprobe uinput
> # check /proc/modules to see if loaded in same addr, otherwise try again
> echo nop > current_tracer
>
> [BOOM]
>
> The above loads the uinput module, which creates a table of functions that
> can be traced within the module.
>
> We add uinput_release to the filter_hash to trace just that function.
>
> Enable function tracincg, which increments the ref count of the record
> associated to uinput_release.
>
> Remove uinput, which frees the records including the one that represents
> uinput_release.
>
> Load the uinput module again (and make sure it's at the same address).
> This recreates the function records all with a ref count of zero,
> including uinput_release.
>
> Disable function tracing, which will decrement the ref count for uinput_release
> which is now zero because of the module removal and reload, and we have
> a mismatch (below zero ref count).
>
> The solution is to check all currently tracing ftrace callbacks to see if any
> are tracing any of the module's functions when a module is loaded (it already does
> that with callbacks that trace all functions). If a callback happens to have
> a module function being traced, it increments that records ref count and starts
> tracing that function.
>
> There may be a strange side effect with this, where tracing module functions
> on unload and then reloading a new module may have that new module's functions
> being traced. This may be something that confuses the user, but it's not
> a big deal. Another approach is to disable all callback hashes on module unload,
> but this leaves some ftrace callbacks that may not be registered, but can
> still have hashes tracing the module's function where ftrace doesn't know about
> it. That situation can cause the same bug. This solution solves that case too.
> Another benefit of this solution, is it is possible to trace a module's
> function on unload and load.
>
> Link: http://lkml.kernel.org/r/CADHUQJ58icP+d3QTXheiCnZwu6hrAqG2heMhhcuQvghXavHRFw@mail.gmail.com
>
> Reported-by: Jörn Engel <joern@...fs.org>
> Reported-by: Dave Jones <davej@...hat.com>
> Reported-by: Steve Hodgson <steve@...estorage.com>
> Soon-to-be-tested-by: Steve Hodgson <steve@...estorage.com>
> Signed-off-by: Steven Rostedt <rostedt@...dmis.org>
> ---
> kernel/trace/ftrace.c | 71 ++++++++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 62 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 92d3334..a6d098c 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -2169,12 +2169,57 @@ static cycle_t ftrace_update_time;
> static unsigned long ftrace_update_cnt;
> unsigned long ftrace_update_tot_cnt;
>
> -static int ops_traces_mod(struct ftrace_ops *ops)
> +static inline int ops_traces_mod(struct ftrace_ops *ops)
> {
> - struct ftrace_hash *hash;
> + /*
> + * Filter_hash being empty will default to trace module.
> + * But notrace hash requires a test of individual module functions.
> + */
> + return ftrace_hash_empty(ops->filter_hash) &&
> + ftrace_hash_empty(ops->notrace_hash);
> +}
> +
> +/*
> + * Check if the current ops references the record.
> + *
> + * If the ops traces all functions, then it was already accounted for.
> + * If the ops does not trace the current record function, skip it.
> + * If the ops ignores the function via notrace filter, skip it.
> + */
> +static inline bool
> +ops_references_rec(struct ftrace_ops *ops, struct dyn_ftrace *rec)
> +{
> + /* If ops isn't enabled, ignore it */
> + if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
> + return 0;
> +
> + /* If ops traces all mods, we already accounted for it */
> + if (ops_traces_mod(ops))
> + return 0;
> +
> + /* The function must be in the filter */
> + if (!ftrace_hash_empty(ops->filter_hash) &&
> + !ftrace_lookup_ip(ops->filter_hash, rec->ip))
> + return 0;
>
> - hash = ops->filter_hash;
> - return ftrace_hash_empty(hash);
> + /* If in notrace hash, we ignore it too */
> + if (ftrace_lookup_ip(ops->notrace_hash, rec->ip))
> + return 0;
> +
> + return 1;
> +}
> +
> +static int referenced_filters(struct dyn_ftrace *rec)
> +{
> + struct ftrace_ops *ops;
> + int cnt = 0;
> +
> + for (ops = ftrace_ops_list; ops != &ftrace_list_end; ops = ops->next) {
> + if (ops_references_rec(ops, rec))
> + cnt++;
> + }
> +
> + return cnt;
> }
>
> static int ftrace_update_code(struct module *mod)
> @@ -2183,6 +2228,7 @@ static int ftrace_update_code(struct module *mod)
> struct dyn_ftrace *p;
> cycle_t start, stop;
> unsigned long ref = 0;
> + bool test = false;
> int i;
>
> /*
> @@ -2196,9 +2242,12 @@ static int ftrace_update_code(struct module *mod)
>
> for (ops = ftrace_ops_list;
> ops != &ftrace_list_end; ops = ops->next) {
> - if (ops->flags & FTRACE_OPS_FL_ENABLED &&
> - ops_traces_mod(ops))
> - ref++;
> + if (ops->flags & FTRACE_OPS_FL_ENABLED) {
> + if (ops_traces_mod(ops))
> + ref++;
> + else
> + test = true;
> + }
> }
> }
>
> @@ -2208,12 +2257,16 @@ static int ftrace_update_code(struct module *mod)
> for (pg = ftrace_new_pgs; pg; pg = pg->next) {
>
> for (i = 0; i < pg->index; i++) {
> + int cnt = ref;
> +
> /* If something went wrong, bail without enabling anything */
> if (unlikely(ftrace_disabled))
> return -1;
>
> p = &pg->records[i];
> - p->flags = ref;
> + if (test)
> + cnt += referenced_filters(p);
> + p->flags = cnt;
>
> /*
> * Do the initial record conversion from mcount jump
> @@ -2233,7 +2286,7 @@ static int ftrace_update_code(struct module *mod)
> * conversion puts the module to the correct state, thus
> * passing the ftrace_make_call check.
> */
> - if (ftrace_start_up && ref) {
> + if (ftrace_start_up && cnt) {
> int failed = __ftrace_replace_code(p, 1);
> if (failed)
> ftrace_bug(failed, p->ip);
> --
> 1.8.1.4
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists