Message-Id: <20250317125226.2720728-1-ranxiaokai627@163.com>
Date: Mon, 17 Mar 2025 12:52:26 +0000
From: Ran Xiaokai <ranxiaokai627@....com>
To: vishalc@...ux.ibm.com
Cc: linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org,
mathieu.desnoyers@...icios.com,
mhiramat@...nel.org,
ran.xiaokai@....com.cn,
ranxiaokai627@....com,
rostedt@...dmis.org
Subject: Re: [PATCH] tracing/osnoise: Fix possible recursive locking for cpus_read_lock()
>On Tue, Feb 25, 2025 at 12:31:32PM +0000, Ran Xiaokai wrote:
>> From: Ran Xiaokai <ran.xiaokai@....com.cn>
>>
>> Lockdep reports this deadlock log:
>> ============================================
>> WARNING: possible recursive locking detected
>> --------------------------------------------
>> sh/31444 is trying to acquire lock:
>> ffffffff82c51af0 (cpu_hotplug_lock){++++}-{0:0}, at:
>> stop_per_cpu_kthreads+0x7/0x60
>>
>> but task is already holding lock:
>> ffffffff82c51af0 (cpu_hotplug_lock){++++}-{0:0}, at:
>> start_per_cpu_kthreads+0x28/0x140
>>
>> other info that might help us debug this:
>> Possible unsafe locking scenario:
>>
>> CPU0
>> ----
>> lock(cpu_hotplug_lock);
>> lock(cpu_hotplug_lock);
>>
>> Call Trace:
>> <TASK>
>> __lock_acquire+0x1612/0x29b0
>> lock_acquire+0xd0/0x2e0
>> cpus_read_lock+0x49/0x120
>> stop_per_cpu_kthreads+0x7/0x60
>> start_kthread+0x105/0x120
>> start_per_cpu_kthreads+0xdd/0x140
>> osnoise_workload_start+0x261/0x2f0
>> osnoise_tracer_start+0x18/0x4
>>
>> In start_kthread(), when kthread_run_on_cpu() fails,
>> cpus_read_unlock() should be called before stop_per_cpu_kthreads(),
>> but both start_per_cpu_kthreads() and start_kthread() call the error
>> handling routine stop_per_cpu_kthreads(),
>> which is redundant. Only one call is necessary.
>> To fix this, move stop_per_cpu_kthreads() outside of start_kthread(),
>> use the return value of start_kthread() to determine kthread creation
>> error.
>> The same issue exists in osnoise_hotplug_workfn() too.
>>
>> Reviewed-by: Yang Guang <yang.guang5@....com.cn>
>> Reviewed-by: Wang Yong <wang.yong12@....com.cn>
>> Signed-off-by: Ran Xiaokai <ran.xiaokai@....com.cn>
>> ---
>> kernel/trace/trace_osnoise.c | 10 +++++++---
>> 1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
>> index 92e16f03fa4e..38fb0c655f5b 100644
>> --- a/kernel/trace/trace_osnoise.c
>> +++ b/kernel/trace/trace_osnoise.c
>> @@ -2029,7 +2029,6 @@ static int start_kthread(unsigned int cpu)
>>
>> if (IS_ERR(kthread)) {
>> pr_err(BANNER "could not start sampling thread\n");
>> - stop_per_cpu_kthreads();
>> return -ENOMEM;
>> }
>>
>> @@ -2097,7 +2096,7 @@ static void osnoise_hotplug_workfn(struct
>> work_struct *dummy)
>> return;
>>
>> guard(mutex)(&interface_lock);
>> - guard(cpus_read_lock)();
>> + cpus_read_lock();
>>
>> if (!cpu_online(cpu))
>> return;
>> @@ -2105,7 +2104,12 @@ static void osnoise_hotplug_workfn(struct
>> work_struct *dummy)
>> if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
>> return;
>>
>> - start_kthread(cpu);
>> + if (start_kthread(cpu)) {
>> + cpus_read_unlock();
>> + stop_per_cpu_kthreads();
>
>Is it right to call stop_per_cpu_kthreads() which stops osnoise kthread
>for every other CPUs in the system if a failure occurs during hotplug of a
>CPU?
I also suspect that this is not reasonable behavior.
>On another note, since stop_per_cpu_kthreads() invokes stop_kthread()
>for every online CPU. It's better to remove stop_per_cpu_kthreads() from
>start_kthread() and handle the error in `osnoise_hotplug_workfn`
Hi Vishal,
I did this in my first version, something like this:
@@ -2097,7 +2096,7 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
 		return;

 	guard(mutex)(&interface_lock);
-	guard(cpus_read_lock)();
+	cpus_read_lock();

 	if (!cpu_online(cpu)) {
+		cpus_read_unlock();
 		return;
 	}

 	if (!cpumask_test_cpu(cpu, &osnoise_cpumask)) {
+		cpus_read_unlock();
 		return;
 	}

-	start_kthread(cpu);
+	if (start_kthread(cpu)) {
+		cpus_read_unlock();
+		stop_per_cpu_kthreads();
+		return;
+	}
+	cpus_read_unlock();
 }
We have to drop the guard() and call cpus_read_unlock() manually on every
exit path, which makes the code somewhat redundant.
> Vishal
>> + return;
>> + }
>> + cpus_read_unlock();
>> }
>>
>> static DECLARE_WORK(osnoise_hotplug_work, osnoise_hotplug_workfn);
>> --
>> 2.15.2
>>
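
For readers who want to see the ordering problem in isolation, here is a
minimal userspace sketch. It is not kernel code: every model_* name is a
hypothetical stand-in, and the lock is reduced to a nesting counter. It only
illustrates that cleanup which takes the lock itself nests the lock when it
runs inside start_kthread(), and does not nest it when the caller drops the
lock first.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins only; none of this is the kernel implementation. */
static int lock_depth;      /* current read-side nesting of the modeled lock */
static int max_lock_depth;  /* deepest nesting observed during one run */

static void model_cpus_read_lock(void)
{
	if (++lock_depth > max_lock_depth)
		max_lock_depth = lock_depth;
}

static void model_cpus_read_unlock(void)
{
	assert(lock_depth > 0);
	lock_depth--;
}

/* Like the function in the trace, cleanup takes the lock on its own. */
static void model_stop_per_cpu_kthreads(void)
{
	model_cpus_read_lock();
	/* per-CPU thread teardown would happen here */
	model_cpus_read_unlock();
}

/* cleanup_inside models the pre-patch behavior of cleaning up on failure. */
static int model_start_kthread(bool fail, bool cleanup_inside)
{
	if (!fail)
		return 0;
	if (cleanup_inside)
		model_stop_per_cpu_kthreads();  /* lock already held by caller */
	return -1;
}

/* Returns the deepest lock nesting reached on the modeled hotplug path. */
int model_hotplug_path(bool fail, bool cleanup_inside)
{
	lock_depth = 0;
	max_lock_depth = 0;

	model_cpus_read_lock();
	if (model_start_kthread(fail, cleanup_inside)) {
		model_cpus_read_unlock();
		if (!cleanup_inside)
			model_stop_per_cpu_kthreads();  /* lock dropped first */
		return max_lock_depth;
	}
	model_cpus_read_unlock();
	return max_lock_depth;
}
```

A returned depth of 2 corresponds to the nested acquisition that lockdep
flags in the report above; a depth of 1 means the lock was never taken
recursively on that path.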