[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250911104138.2830f60b@gandalf.local.home>
Date: Thu, 11 Sep 2025 10:41:38 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Vladimir Riabchun <ferr.lambarginio@...il.com>
Cc: mhiramat@...nel.org, mark.rutland@....com, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH] ftrace: Fix softlockup in ftrace_module_enable
On Thu, 11 Sep 2025 15:33:15 +0200
Vladimir Riabchun <ferr.lambarginio@...il.com> wrote:
> A soft lockup was observed when loading amdgpu module,
I'd like to see more about that soft lockup.
> this is the same issue that was fixed in
> commit d0b24b4e91fc ("ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY
> kernels") and commit 42ea22e754ba ("ftrace: Add cond_resched() to
> ftrace_graph_set_hash()").
The above cond_resched() is in the loop of all records that actually look
at all records! And that can be pretty big. On my server, it shows on boot:
[ 1.934175] ftrace: allocating 45706 entries in 180 pages
[ 1.934177] ftrace: allocated 180 pages with 4 groups
That means the loop will go through 45,706 entries. That's quite a lot and
a cond_resched() makes perfect sense.
>
> Fix it the same way by adding cond_resched() in ftrace_module_enable.
>
> Signed-off-by: Vladimir Riabchun <ferr.lambarginio@...il.com>
> ---
> kernel/trace/ftrace.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index a69067367c29..23c4d37c7bcd 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -7526,6 +7526,9 @@ void ftrace_module_enable(struct module *mod)
>
> do_for_each_ftrace_rec(pg, rec) {
> int cnt;
> +
> + cond_resched();
> +
> /*
> * do_for_each_ftrace_rec() is a double loop.
> * module text shares the pg. If a record is
This loop is different. Let me show a bit more context:
do_for_each_ftrace_rec(pg, rec) {
int cnt;
/*
* do_for_each_ftrace_rec() is a double loop.
* module text shares the pg. If a record is
* not part of this module, then skip this pg,
* which the "break" will do.
*/
if (!within_module(rec->ip, mod))
break;
See this "if (!within_module(rec->ip, mod))" break?
Look at the dmesg output again, and you'll see "groups" mentioned.
[ 1.934177] ftrace: allocated 180 pages with 4 groups
That "4 groups" means there are 4 "page groups". That's the "pg" in the
do_for_each_ftrace_recr() function.
This means in my scenario, it loops 4 times. And then it will loop through
each module.
How big is the amdgpu driver? How many functions does it have?
# grep amdgpu /sys/kernel/tracing/available_filter_functions | wc -l
And I'm guessing that this is only an issue when ftrace is enabled:
if (ftrace_start_up && cnt) {
int failed = __ftrace_replace_code(rec, 1);
if (failed) {
ftrace_bug(failed, rec);
goto out_loop;
}
}
As that could slow things down.
If this is all the case, then the cond_resched() should be with the
ftrace_start_up code and not in the open like you have it.
if (ftrace_start_up && cnt) {
int failed = __ftrace_replace_code(rec, 1);
if (failed) {
ftrace_bug(failed, rec);
goto out_loop;
}
+ cond_resched();
}
-- Steve
Powered by blists - more mailing lists