Message-ID: <bbb834f13cb1480b2517497481a1b3e0137d089a.camel@redhat.com>
Date: Wed, 14 Jan 2026 13:49:17 -0600
From: Crystal Wood <crwood@...hat.com>
To: Tomas Glozar <tglozar@...hat.com>, Steven Rostedt <rostedt@...dmis.org>,
  Masami Hiramatsu <mhiramat@...nel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, John Kacur
	 <jkacur@...hat.com>, Luis Goncalves <lgoncalv@...hat.com>, LKML
	 <linux-kernel@...r.kernel.org>, Linux Trace Kernel
	 <linux-trace-kernel@...r.kernel.org>
Subject: Re: [PATCH] tracing/osnoise: Fix OSN_WORKLOAD-related crash

On Wed, 2026-01-14 at 13:35 +0100, Tomas Glozar wrote:
> A kernel panic was observed in the timerlat tracer with the following
> reproducer:
> 
>    #!/bin/bash
>    while true; do
>       rtla timerlat hist -u -d 5s & PID=$!
>       sleep 2
>       echo OSNOISE_WORKLOAD > /sys/kernel/tracing/osnoise/options
>       rtla timerlat hist -k -d 1s
>    done
> 
> The kernel first displays several WARN traces with the following pattern:
> 
>    WARNING: CPU: 1 PID: 1822 at kernel/trace/trace_osnoise.c:1959 stop_kthread+0xb7/0xc0

The line number doesn't match up for me; is this the first or second
WARN_ON in that function?

> and finally a null pointer reference BUG:
> 
>    BUG: kernel NULL pointer dereference, address: 0000000000000030
>    ...
>    CPU: 1 UID: 0 PID: 2155 Comm: timerlatu/1
>    ...
>    Call Trace:
>      ...
>      ? timerlat_fd_read+0xf2/0x370
>      ? timerlat_fd_read+0xee/0x370
>      vfs_read+0xe8/0x370
>      ksys_read+0x6d/0xf0
>      do_syscall_64+0x7d/0x160
>      ...
>      entry_SYSCALL_64_after_hwframe+0x76/0x7e

What's the actual fault location?  And those ? lines in the call trace
are "considered to be additional clues" rather than actual unwound
frames; what was in the ... above them?

>  static int osnoise_options_open(struct inode *inode, struct file *file)
>  {
>  	return seq_open(file, &osnoise_options_seq_ops);
> @@ -2229,6 +2254,10 @@ static ssize_t osnoise_options_write(struct file *filp, const char __user *ubuf,
>  	if (option < 0)
>  		return -EINVAL;
>  
> +	retval = osnoise_validate_option(option, enable);
> +	if (retval != 0)
> +		return retval;
> +
>  	/*
>  	 * trace_types_lock is taken to avoid concurrency on start/stop.
>  	 */

Shouldn't this be done under interface_lock to avoid a concurrent
timerlat_fd_open()?  FWIW, your test script doesn't appear to cover the
case of the option being set while timerlat is still starting (because
of the 2 second sleep).

Of course, this is complicated by stop_per_cpu_kthreads() happening
before interface_lock is acquired.  Do we know why that happens outside
the lock?  That might even be the actual cause of this bug.

Though even in the non-race case, we might still want to return -EBUSY
rather than just killing the thread (which might still have races since
we don't wait for the user thread to die).
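
To make the check-under-lock idea concrete, here is a minimal user-space
sketch.  It is not the trace_osnoise.c code: a pthread mutex stands in
for interface_lock, and sim_fd_open(), sim_fd_release(),
sim_set_workload() and the two flags are hypothetical names invented for
illustration.  The only point is that the "is a user thread attached?"
check and the option update sit in the same critical section as the
open path, and that a conflict is reported as -EBUSY rather than by
killing the thread.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Assumption: a plain mutex standing in for the kernel's interface_lock. */
static pthread_mutex_t interface_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical state: true while a user-space timerlat thread is attached. */
static bool user_thread_active;
/* Hypothetical flag mirroring the OSNOISE_WORKLOAD option. */
static bool workload_enabled;

/* Models the open path: refuse to attach while the kernel workload is on. */
int sim_fd_open(void)
{
	int ret = 0;

	pthread_mutex_lock(&interface_lock);
	if (workload_enabled)
		ret = -EBUSY;
	else
		user_thread_active = true;
	pthread_mutex_unlock(&interface_lock);

	return ret;
}

/* Models the release path: the user thread detaches. */
void sim_fd_release(void)
{
	pthread_mutex_lock(&interface_lock);
	user_thread_active = false;
	pthread_mutex_unlock(&interface_lock);
}

/*
 * Models the option write: validation and update happen under the same
 * lock as sim_fd_open(), so an open cannot slip in between the two.
 */
int sim_set_workload(bool enable)
{
	int ret = 0;

	pthread_mutex_lock(&interface_lock);
	if (enable && user_thread_active)
		ret = -EBUSY;
	else
		workload_enabled = enable;
	pthread_mutex_unlock(&interface_lock);

	return ret;
}
```

In this model, sim_set_workload(true) fails with -EBUSY while a user
thread is attached and succeeds after sim_fd_release(); once the option
is set, sim_fd_open() fails with -EBUSY instead.  The sketch does not
model stop_per_cpu_kthreads() running before the lock is taken, which is
the part that would still need to be sorted out in the real code.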

-Crystal