linux-kernel - Re: [PATCH] tracing/timerlat: Check tlat_var for NULL in timerlat_fd

Message-ID: <20240822103721.1e2eb074@gandalf.local.home>
Date: Thu, 22 Aug 2024 10:37:21 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Tomas Glozar <tglozar@...hat.com>
Cc: linux-trace-kernel@...r.kernel.org, linux-kernel@...r.kernel.org,
 jkacur@...hat.com, "Luis Claudio R. Goncalves" <lgoncalv@...hat.com>
Subject: Re: [PATCH] tracing/timerlat: Check tlat_var for NULL in
 timerlat_fd_release

On Thu, 22 Aug 2024 10:32:02 -0400
Steven Rostedt <rostedt@...dmis.org> wrote:

> > Yeah, it seems there might be multiple bugs in the user workload
> > handling, the other NULL pointer dereference and refcount warning
> > above might be related (but I have yet to reproduce it on an upstream
> > kernel). I'm also going to look at the code and will post any findings
> > here.  
> 
> Yes, that is the second bug, and it is related to the one that this addresses.

There's nothing protecting the clearing of the kthreads and calling
put_task_struct(). Here's the fix to the second bug:

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index 66a871553d4a..53de719f35cb 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -2106,7 +2106,9 @@ static int osnoise_cpu_init(unsigned int cpu)
  */
 static int osnoise_cpu_die(unsigned int cpu)
 {
+	mutex_lock(&interface_lock);
 	stop_kthread(cpu);
+	mutex_unlock(&interface_lock);
 	return 0;
 }

@@ -2239,8 +2241,11 @@ static ssize_t osnoise_options_write(struct file *filp, const char __user *ubuf,
 	 */
 	mutex_lock(&trace_types_lock);
 	running = osnoise_has_registered_instances();
-	if (running)
+	if (running) {
+		mutex_lock(&interface_lock);
 		stop_per_cpu_kthreads();
+		mutex_unlock(&interface_lock);
+	}

 	mutex_lock(&interface_lock);
 	/*
@@ -2355,8 +2360,11 @@ osnoise_cpus_write(struct file *filp, const char __user *ubuf, size_t count,
 	 */
 	mutex_lock(&trace_types_lock);
 	running = osnoise_has_registered_instances();
-	if (running)
+	if (running) {
+		mutex_lock(&interface_lock);
 		stop_per_cpu_kthreads();
+		mutex_unlock(&interface_lock);
+	}

 	mutex_lock(&interface_lock);
 	/*
@@ -2951,7 +2960,9 @@ static void osnoise_workload_stop(void)
 	 */
 	barrier();

+	mutex_lock(&interface_lock);
 	stop_per_cpu_kthreads();
+	mutex_unlock(&interface_lock);

 	osnoise_unhook_events();
 }

With both of these fixes, the bug goes away.
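
For readers following along outside the kernel tree, here is a minimal userspace sketch of the locking pattern the diff applies: the "clear the pointer and drop the reference" step only ever runs with one mutex held, so two paths cannot both see a non-NULL pointer and both drop the reference. Every name here (fake_task, fake_slot, fake_interface_lock, fake_stop_kthread, fake_stop_locked) is a hypothetical stand-in built on pthreads, not the code in trace_osnoise.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted task. */
struct fake_task {
	int refcount;
};

/* Hypothetical per-CPU slot, loosely modelled on a stored kthread pointer. */
struct fake_slot {
	struct fake_task *kthread;
};

/* Plays the role that interface_lock plays in the diff (an assumption). */
static pthread_mutex_t fake_interface_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Clear the slot and drop its reference.
 * The caller must hold fake_interface_lock; without it, two callers
 * could both read a non-NULL pointer and drop the reference twice.
 */
static void fake_stop_kthread(struct fake_slot *slot)
{
	struct fake_task *t = slot->kthread;

	if (!t)
		return;

	slot->kthread = NULL;
	t->refcount--;
}

/* Wrapper that mirrors the lock/stop/unlock shape added by the diff. */
static void fake_stop_locked(struct fake_slot *slot)
{
	pthread_mutex_lock(&fake_interface_lock);
	fake_stop_kthread(slot);
	pthread_mutex_unlock(&fake_interface_lock);
}
```

Calling fake_stop_locked() twice on the same slot drops the reference only once, because the second caller finds the pointer already cleared under the lock.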

I'll add this fix (after enabling lockdep and making sure I didn't screw up
the locking). Can you resend this patch so that it just skips calling cancel
if kthread is NULL? No need to exit out early. I'd still like to make sure
the cleanup happens, and not assume it will already be done.
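
A rough userspace sketch of the shape being asked for, assuming the release path has a timer to cancel and some state to reset: the cancel is guarded by the NULL check, while the cleanup runs unconditionally instead of being skipped by an early return. All names (demo_state, demo_timer, demo_timer_cancel, demo_release) are made up for illustration and are not the actual timerlat_fd_release() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical timer; "armed" stands in for a pending timer. */
struct demo_timer {
	bool armed;
};

/* Hypothetical per-CPU state touched by the release path. */
struct demo_state {
	void *kthread;		/* NULL if the thread was already stopped */
	struct demo_timer timer;
	int sampling;
};

/* Counts cancel calls so the guarded path can be observed. */
static int demo_cancel_calls;

static void demo_timer_cancel(struct demo_timer *t)
{
	t->armed = false;
	demo_cancel_calls++;
}

static void demo_release(struct demo_state *st)
{
	/* Only the cancel is conditional on kthread being set. */
	if (st->kthread)
		demo_timer_cancel(&st->timer);

	/* Cleanup always runs; there is no early return above. */
	memset(&st->timer, 0, sizeof(st->timer));
	st->sampling = 0;
	st->kthread = NULL;
}
```

The point of this arrangement is that the state is reset even when someone else has already stopped the thread, rather than relying on that other path to have done the cleanup.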

-- Steve