linux-kernel - Re: [PATCH] tracing/osnoise: Fix possible recursive locking for cpus_read

Message-Id: <20250317125226.2720728-1-ranxiaokai627@163.com>
Date: Mon, 17 Mar 2025 12:52:26 +0000
From: Ran Xiaokai <ranxiaokai627@....com>
To: vishalc@...ux.ibm.com
Cc: linux-kernel@...r.kernel.org,
	linux-trace-kernel@...r.kernel.org,
	mathieu.desnoyers@...icios.com,
	mhiramat@...nel.org,
	ran.xiaokai@....com.cn,
	ranxiaokai627@....com,
	rostedt@...dmis.org
Subject: Re: [PATCH] tracing/osnoise: Fix possible recursive locking for cpus_read_lock()

>On Tue, Feb 25, 2025 at 12:31:32PM +0000, Ran Xiaokai wrote:
>> From: Ran Xiaokai <ran.xiaokai@....com.cn>
>> 
>> Lockdep reports this deadlock log:
>> ============================================
>> WARNING: possible recursive locking detected
>> --------------------------------------------
>> sh/31444 is trying to acquire lock:
>> ffffffff82c51af0 (cpu_hotplug_lock){++++}-{0:0}, at:
>> stop_per_cpu_kthreads+0x7/0x60
>> 
>> but task is already holding lock:
>> ffffffff82c51af0 (cpu_hotplug_lock){++++}-{0:0}, at:
>> start_per_cpu_kthreads+0x28/0x140
>> 
>> other info that might help us debug this:
>>  Possible unsafe locking scenario:
>> 
>>        CPU0
>>        ----
>>   lock(cpu_hotplug_lock);
>>   lock(cpu_hotplug_lock);
>> 
>> Call Trace:
>>  <TASK>
>>  __lock_acquire+0x1612/0x29b0
>>  lock_acquire+0xd0/0x2e0
>>  cpus_read_lock+0x49/0x120
>>  stop_per_cpu_kthreads+0x7/0x60
>>  start_kthread+0x105/0x120
>>  start_per_cpu_kthreads+0xdd/0x140
>>  osnoise_workload_start+0x261/0x2f0
>>  osnoise_tracer_start+0x18/0x4
>> 
>> In start_kthread(), when kthread_run_on_cpu() fails,
>> cpus_read_unlock() should be called before stop_per_cpu_kthreads().
>> Moreover, both start_per_cpu_kthreads() and start_kthread() call the
>> error handling routine stop_per_cpu_kthreads(), which is redundant;
>> only one call is necessary.
>> To fix this, move stop_per_cpu_kthreads() out of start_kthread() and
>> use the return value of start_kthread() to detect a kthread creation
>> error.
>> The same issue exists in osnoise_hotplug_workfn().
>> 
>> Reviewed-by: Yang Guang <yang.guang5@....com.cn>
>> Reviewed-by: Wang Yong <wang.yong12@....com.cn>
>> Signed-off-by: Ran Xiaokai <ran.xiaokai@....com.cn>
>> ---
>>  kernel/trace/trace_osnoise.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>> 
>> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
>> index 92e16f03fa4e..38fb0c655f5b 100644
>> --- a/kernel/trace/trace_osnoise.c
>> +++ b/kernel/trace/trace_osnoise.c
>> @@ -2029,7 +2029,6 @@ static int start_kthread(unsigned int cpu)
>>  
>>  	if (IS_ERR(kthread)) {
>>  		pr_err(BANNER "could not start sampling thread\n");
>> -		stop_per_cpu_kthreads();
>>  		return -ENOMEM;
>>  	}
>>  
>> @@ -2097,7 +2096,7 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
>>  		return;
>>  
>>  	guard(mutex)(&interface_lock);
>> -	guard(cpus_read_lock)();
>> +	cpus_read_lock();
>>  
>>  	if (!cpu_online(cpu))
>>  		return;
>> @@ -2105,7 +2104,12 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
>>  	if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
>>  		return;
>>  
>> -	start_kthread(cpu);
>> +	if (start_kthread(cpu)) {
>> +		cpus_read_unlock();
>> +		stop_per_cpu_kthreads();
>
>Is it right to call stop_per_cpu_kthreads(), which stops the osnoise
>kthreads on every other CPU in the system, if a failure occurs during
>hotplug of a single CPU?

I also suspect that this is not reasonable behavior.

>On another note, since stop_per_cpu_kthreads() invokes stop_kthread()
>for every online CPU, it's better to remove stop_per_cpu_kthreads() from
>start_kthread() and handle the error in `osnoise_hotplug_workfn`.

Hi Vishal,
I did this in my first version, something like this:

@@ -2097,7 +2096,7 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
 		return;
 
 	guard(mutex)(&interface_lock);
-	guard(cpus_read_lock)();
+	cpus_read_lock();
 
-	if (!cpu_online(cpu))
+	if (!cpu_online(cpu)) {
+		cpus_read_unlock();
 		return;
+	}
 
-	if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
+	if (!cpumask_test_cpu(cpu, &osnoise_cpumask)) {
+		cpus_read_unlock();
 		return;
+	}
 
-	start_kthread(cpu);
+	if (start_kthread(cpu)) {
+		cpus_read_unlock();
+		stop_per_cpu_kthreads();
+		return;
+	}
+	cpus_read_unlock();
 }

We have to drop the guard() and call cpus_read_unlock() manually on
every return path, which makes the code somewhat repetitive.
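
To make the trade-off concrete, here is a minimal user-space sketch, not
kernel code. It assumes a plain nesting counter as a stand-in for
cpu_hotplug_lock and uses the GCC/Clang cleanup attribute as a rough
analogue of guard(). All fake_* names, FAKE_GUARD() and the two
hotplug_*() helpers are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for cpu_hotplug_lock: only a nesting counter. */
static int lock_depth;
static int max_depth;	/* highest nesting seen; 2 means a nested acquisition */

static void fake_read_lock(void)
{
	lock_depth++;
	if (lock_depth > max_depth)
		max_depth = lock_depth;
}

static void fake_read_unlock(void)
{
	assert(lock_depth > 0);
	lock_depth--;
}

/* Cleanup handler, run automatically when the guarded variable leaves scope. */
static void fake_guard_exit(int *unused)
{
	(void)unused;
	fake_read_unlock();
}

/* Rough analogue of guard(cpus_read_lock)() built on the cleanup attribute. */
#define FAKE_GUARD() \
	int fake_guard __attribute__((cleanup(fake_guard_exit), unused)) = \
		(fake_read_lock(), 0)

/* Stand-in for stop_per_cpu_kthreads(): it takes the lock itself. */
static void fake_stop_all(void)
{
	fake_read_lock();
	fake_read_unlock();
}

/* Guard variant: the lock is still held when the error path runs. */
static void hotplug_with_guard(bool online, bool start_fails)
{
	FAKE_GUARD();

	if (!online)
		return;			/* the guard unlocks here */
	if (start_fails)
		fake_stop_all();	/* nested acquisition, depth reaches 2 */
}

/* Manual variant: every return path has to unlock by hand. */
static void hotplug_manual(bool online, bool start_fails)
{
	fake_read_lock();

	if (!online) {
		fake_read_unlock();
		return;
	}
	if (start_fails) {
		fake_read_unlock();
		fake_stop_all();	/* lock already dropped, depth stays at 1 */
		return;
	}
	fake_read_unlock();
}
```

In this model the guard variant keeps the early returns short, but it
cannot drop the lock before the error handler runs. That nesting is what
lockdep complains about in the report above. The manual variant avoids
the nesting at the cost of one explicit unlock per return path.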

>	Vishal
>> +		return;
>> +	}
>> +	cpus_read_unlock();
>>  }
>>  
>>  static DECLARE_WORK(osnoise_hotplug_work, osnoise_hotplug_workfn);
>> -- 
>> 2.15.2
>>