Message-ID: <20240822103721.1e2eb074@gandalf.local.home>
Date: Thu, 22 Aug 2024 10:37:21 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Tomas Glozar <tglozar@...hat.com>
Cc: linux-trace-kernel@...r.kernel.org, linux-kernel@...r.kernel.org,
jkacur@...hat.com, "Luis Claudio R. Goncalves" <lgoncalv@...hat.com>
Subject: Re: [PATCH] tracing/timerlat: Check tlat_var for NULL in
timerlat_fd_release
On Thu, 22 Aug 2024 10:32:02 -0400
Steven Rostedt <rostedt@...dmis.org> wrote:
> > Yeah, it seems there might be multiple bugs in the user workload
> > handling, the other NULL pointer dereference and refcount warning
> > above might be related (but I have yet to reproduce it on an upstream
> > kernel). I'm also going to look at the code and will post any findings
> > here.
>
> Yes, that is the second bug, and it is related to the one that this addresses.
There's nothing protecting the clearing of the kthreads and calling
put_task_struct(). Here's the fix to the second bug:
diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index 66a871553d4a..53de719f35cb 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -2106,7 +2106,9 @@ static int osnoise_cpu_init(unsigned int cpu)
  */
 static int osnoise_cpu_die(unsigned int cpu)
 {
+	mutex_lock(&interface_lock);
 	stop_kthread(cpu);
+	mutex_unlock(&interface_lock);
 	return 0;
 }

@@ -2239,8 +2241,11 @@ static ssize_t osnoise_options_write(struct file *filp, const char __user *ubuf,
 	 */
 	mutex_lock(&trace_types_lock);
 	running = osnoise_has_registered_instances();
-	if (running)
+	if (running) {
+		mutex_lock(&interface_lock);
 		stop_per_cpu_kthreads();
+		mutex_unlock(&interface_lock);
+	}

 	mutex_lock(&interface_lock);
 	/*
@@ -2355,8 +2360,11 @@ osnoise_cpus_write(struct file *filp, const char __user *ubuf, size_t count,
 	 */
 	mutex_lock(&trace_types_lock);
 	running = osnoise_has_registered_instances();
-	if (running)
+	if (running) {
+		mutex_lock(&interface_lock);
 		stop_per_cpu_kthreads();
+		mutex_unlock(&interface_lock);
+	}

 	mutex_lock(&interface_lock);
 	/*
@@ -2951,7 +2960,9 @@ static void osnoise_workload_stop(void)
 	 */
 	barrier();

+	mutex_lock(&interface_lock);
 	stop_per_cpu_kthreads();
+	mutex_unlock(&interface_lock);

 	osnoise_unhook_events();
 }
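
To make the race in the diff above easier to see, here is a rough userspace sketch (pthreads, not kernel code) of why the "clear the pointer, then drop the reference" step wants to sit under one lock. Everything in it is a hypothetical stand-in: struct task, task_put(), kthread_of[], nr_puts and race_two_stoppers() are made up for illustration; only the names interface_lock and stop_kthread() are borrowed from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4

/* Hypothetical stand-in for a refcounted task_struct. */
struct task {
	int refcount;
};

static pthread_mutex_t interface_lock = PTHREAD_MUTEX_INITIALIZER;
static struct task *kthread_of[NR_CPUS];	/* stand-in for the per-CPU kthread pointer */
static int nr_puts;				/* counts reference drops, for the check below */

/* Hypothetical analog of put_task_struct(): drop one reference. */
static void task_put(struct task *t)
{
	nr_puts++;
	if (--t->refcount == 0)
		free(t);
}

/* Caller must hold interface_lock: the read, the clear and the put belong together. */
static void stop_kthread(int cpu)
{
	struct task *t;

	assert(cpu >= 0 && cpu < NR_CPUS);
	t = kthread_of[cpu];
	if (!t)
		return;
	kthread_of[cpu] = NULL;
	task_put(t);
}

/* One of several paths that may try to stop the same thread. */
static void *stopper(void *arg)
{
	int cpu = *(int *)arg;

	pthread_mutex_lock(&interface_lock);
	stop_kthread(cpu);
	pthread_mutex_unlock(&interface_lock);
	return NULL;
}

/* Run two stoppers against one CPU; return how many puts happened, or -1 on allocation failure. */
static int race_two_stoppers(int cpu)
{
	pthread_t a, b;

	nr_puts = 0;
	kthread_of[cpu] = malloc(sizeof(*kthread_of[cpu]));
	if (!kthread_of[cpu])
		return -1;
	kthread_of[cpu]->refcount = 1;

	pthread_create(&a, NULL, stopper, &cpu);
	pthread_create(&b, NULL, stopper, &cpu);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return nr_puts;
}
```

With the lock held, only one of the two stoppers can see a non-NULL pointer, so the reference is dropped once. Without the lock, both could read the same pointer before either clears it, which in this analogy is the double put (and a use after free).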
With both of these fixes, the bug goes away.
I'll add this fix (after enabling lockdep and making sure I didn't screw up
the locking). Can you resend this patch so that it just skips calling cancel
if kthread is NULL? There's no need to exit early. I'd still like to make sure
the cleanup happens, and not assume it will already be done.
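
As a rough illustration of the shape I'm asking for, here is a userspace sketch, not the actual timerlat_fd_release() body. The struct layout, timer_cancel(), nr_cancels and release_state() are all hypothetical stand-ins; the only point is that the NULL check guards the cancel alone, and the cleanup still runs either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a timer that can be cancelled. */
struct timer {
	bool armed;
};

/* Hypothetical stand-in for the per-CPU timerlat state. */
struct tlat_state {
	void *kthread;		/* NULL once the workload thread is gone */
	struct timer timer;
	int count;
};

static int nr_cancels;		/* counts cancel calls, for the check below */

/* Hypothetical analog of cancelling the timer. */
static void timer_cancel(struct timer *t)
{
	nr_cancels++;
	t->armed = false;
}

/* Skip only the cancel when kthread is NULL; always clear the state. */
static void release_state(struct tlat_state *s)
{
	assert(s != NULL);
	if (s->kthread)
		timer_cancel(&s->timer);
	memset(s, 0, sizeof(*s));
}
```

The alternative, returning early when kthread is NULL, would leave the rest of the state untouched and rely on some other path having already cleaned it up.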
-- Steve