From: "Steven Rostedt (Red Hat)" There's been a nasty bug that would show up and not give much info. The bug displayed the following warning: WARNING: at kernel/trace/ftrace.c:1529 __ftrace_hash_rec_update+0x1e3/0x230() Pid: 20903, comm: bash Tainted: G O 3.6.11+ #38405.trunk Call Trace: [] warn_slowpath_common+0x7f/0xc0 [] warn_slowpath_null+0x1a/0x20 [] __ftrace_hash_rec_update+0x1e3/0x230 [] ftrace_hash_move+0x28/0x1d0 [] ? kfree+0x2c/0x110 [] ftrace_regex_release+0x8e/0x150 [] __fput+0xae/0x220 [] ____fput+0xe/0x10 [] task_work_run+0x72/0x90 [] do_notify_resume+0x6c/0xc0 [] ? trace_hardirqs_on_thunk+0x3a/0x3c [] int_signal+0x12/0x17 ---[ end trace 793179526ee09b2c ]--- It was finally narrowed down to unloading a module that was being traced. It was actually more than that. When functions are being traced, there's a table of all functions that have a ref count of the number of active tracers attached to that function. When a function trace callback is registered to a function, the function's record ref count is incremented. When it is unregistered, the function's record ref count is decremented. If an inconsistency is detected (ref count goes below zero) the above warning is shown and the function tracing is permanently disabled until reboot. When a module is unloaded, it frees the function records that represent the module functions. These records exist on their own pages, that is function records for one module will not exist on the same page as function records for other modules or even the core kernel. Now, when a callback is registered to a function, it holds a "hash" of function addresses that it traces. When it unregisters, the hash is examined and any record that exists in the hash will decrement the functions record's ref count. Hashes that point to a module function that was freed are simply ignored. But what happens if the trace is still going on and you reload the same module. If the module is allocated in the same location, the hashes of the registered functions will still be mapped to the module functions (if it was tracing the module function). The problem is that the module would allocate new function record tables with their ref counts as zero. When a callback is unregistered, if the hash matches the function record of a reloaded module, it will dec the ref count, but as it was zero it would trigger the above warning and disable function tracing. With the help of Steve and Joern, we found a reproducer: Using uinput module and uinput_release function. modprobe uinput echo uinput_release > /sys/kernel/debug/tracing/set_ftrace_filter rmmod uinput modprobe uinput // check /proc/modules to see if loaded in same addr, otherwise try again echo > /sys/kernel/debug/tracing/set_ftrace_filter [BOOM] The solution here is on module unload, look at all the registered funtion callbacks and remove any hash entry that points to any function in the module that is about to be removed. Link: http://lkml.kernel.org/r/CADHUQJ58icP+d3QTXheiCnZwu6hrAqG2heMhhcuQvghXavHRFw@mail.gmail.com Reported-by: Jörn Engel Reported-by: Dave Jones Reported-by: Steve Hodgson Tested-by: Steve Hodgson Signed-off-by: Steven Rostedt --- kernel/trace/ftrace.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 92d3334..6de8cbd 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -4061,6 +4061,63 @@ static int ftrace_process_locs(struct module *mod, #ifdef CONFIG_MODULES +/* + * If the filter is cleared, then all functions may end up being + * traced. We need to pass that information down to update the + * records. + */ +static bool remove_rec_entry(struct ftrace_hash *hash, struct dyn_ftrace *rec) +{ + struct ftrace_func_entry *entry; + + entry = ftrace_lookup_ip(hash, rec->ip); + if (!entry) + return false; + + free_hash_entry(hash, entry); + + /* If we cleared the hash, then we now trace all functions */ + return ftrace_hash_empty(hash); +} + +static void clear_recs(struct ftrace_ops *ops, struct ftrace_page *pg) +{ + struct dyn_ftrace *rec; + bool update = false; + int i; + + for (i = 0; i < pg->index; i++) { + rec = &pg->records[i]; + + /* If the filter hash gets cleared, we trace all functions */ + if (remove_rec_entry(ops->filter_hash, rec)) + update = true; + remove_rec_entry(ops->notrace_hash, rec); + } + + if (update) { + ftrace_hash_rec_enable(ops, 1); + if (ftrace_enabled) + ftrace_run_update_code(FTRACE_UPDATE_CALLS); + } +} + +static bool ops_has_filter(struct ftrace_ops *ops) +{ + return !ftrace_hash_empty(ops->filter_hash) || + !ftrace_hash_empty(ops->notrace_hash); +} + +static void clear_hashes(struct ftrace_page *pg) +{ + struct ftrace_ops *ops; + + for (ops = ftrace_ops_list; ops != &ftrace_list_end; ops = ops->next) { + if ((ops->flags & FTRACE_OPS_FL_ENABLED) && ops_has_filter(ops)) + clear_recs(ops, pg); + } +} + #define next_to_ftrace_page(p) container_of(p, struct ftrace_page, next) void ftrace_release_mod(struct module *mod) @@ -4094,6 +4151,9 @@ void ftrace_release_mod(struct module *mod) if (pg == ftrace_pages) ftrace_pages = next_to_ftrace_page(last_pg); + /* Make sure no hashes reference this module record */ + clear_hashes(pg); + *last_pg = pg->next; order = get_count_order(pg->size / ENTRIES_PER_PAGE); free_pages((unsigned long)pg->records, order); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/