[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP4=nvQeBnAtbQFjAdKhQUP1_Xi2uEH0i+kZC6TZwu7GZJYu3w@mail.gmail.com>
Date: Thu, 15 Jan 2026 14:32:27 +0100
From: Tomas Glozar <tglozar@...hat.com>
To: Crystal Wood <crwood@...hat.com>
Cc: Steven Rostedt <rostedt@...dmis.org>, Masami Hiramatsu <mhiramat@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, John Kacur <jkacur@...hat.com>,
Luis Goncalves <lgoncalv@...hat.com>, LKML <linux-kernel@...r.kernel.org>,
Linux Trace Kernel <linux-trace-kernel@...r.kernel.org>
Subject: Re: [PATCH] tracing/osnoise: Fix OSN_WORKLOAD-related crash
st 14. 1. 2026 v 20:49 odesÃlatel Crystal Wood <crwood@...hat.com> napsal:
> >
> > The kernel first displays several WARN traces with the following pattern:
> >
> > WARNING: CPU: 1 PID: 1822 at kernel/trace/trace_osnoise.c:1959 stop_kthread+0xb7/0xc0
>
> The line number doesn't match up for me; is this the first or second
> WARN_ON in that function?
The first one [1], sorry for the confusion, I used a log from an old
CentOS Stream 10 kernel for the commit description.
[1] https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/blob/7482e1508860dc8009827f87970b2f48fca251d4/kernel/trace/trace_osnoise.c#L1959
> > and finally a null pointer reference BUG:
> >
> > BUG: kernel NULL pointer dereference, address: 0000000000000030
> > ...
> > CPU: 1 UID: 0 PID: 2155 Comm: timerlatu/1
> > ...
> > Call Trace:
> > ...
> > ? timerlat_fd_read+0xf2/0x370
> > ? timerlat_fd_read+0xee/0x370
> > vfs_read+0xe8/0x370
> > ksys_read+0x6d/0xf0
> > do_syscall_64+0x7d/0x160
> > ...
> > entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> What's the actual fault location? And those ? lines in the call trace
> are "considered to be additional clues" rather than actual unwound
> frames; what was in the ... above them?
Ah I dropped one line more than I wanted, the fault is in timerlat_fd_read:
[ 513.050331] BUG: kernel NULL pointer dereference, address: 0000000000000030
[ 513.050345] #PF: supervisor read access in kernel mode
[ 513.050350] #PF: error_code(0x0000) - not-present page
[ 513.050354] PGD 0 P4D 0
[ 513.050358] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 513.050364] CPU: 1 UID: 0 PID: 2155 Comm: timerlatu/1 Kdump: loaded
Tainted: G W ------- --- 6.12.0-35.el10.x86_64 #1
[ 513.050375] Tainted: [W]=WARN
[ 513.050378] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS 1.16.3-3.fc41 04/01/2014
[ 513.050385] RIP: 0010:timerlat_fd_read+0xf2/0x370
[ 513.050392] Code: f7 53 e3 a5 9b c4 20 48 89 e8 48 c1 e8 03 48 f7
e2 48 c1 ea 04 48 39 ca 0f 83 8c 01 00 00 48 89 df e8 a2 f9 ff ff 48
8b 43 38 <48> 8b 40 30 ff d0 0f 1f 00 48 2b 43 50 48 89 c5 78 49 48 8b
43 60
[ 513.050406] RSP: 0018:ffffadd181227df8 EFLAGS: 00010246
[ 513.050411] RAX: 0000000000000000 RBX: ffff93739fd2e680 RCX: 0000000000000000
[ 513.050417] RDX: ffff937365b21560 RSI: ffff93734106b180 RDI: ffff9373422b4280
[ 513.050422] RBP: 000000000000b38b R08: 0000000000000000 R09: 0000000000000001
[ 513.050427] R10: 0000000080170010 R11: ffff93734106b180 R12: ffff937342c186c0
[ 513.050433] R13: 0000000000000001 R14: 00007f6e618dca90 R15: ffffadd181227ed0
[ 513.050547] FS: 00007f6e618dd6c0(0000) GS:ffff93739fd00000(0000)
knlGS:0000000000000000
[ 513.050659] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 513.050766] CR2: 0000000000000030 CR3: 000000000354e006 CR4: 0000000000772ef0
[ 513.050888] PKRU: 55555554
[ 513.051006] Call Trace:
[ 513.051134] <TASK>
[ 513.051249] ? show_trace_log_lvl+0x1b0/0x2f0
[ 513.051363] ? show_trace_log_lvl+0x1b0/0x2f0
[ 513.051476] ? vfs_read+0xe8/0x370
[ 513.051591] ? __die_body.cold+0x8/0x12
[ 513.051706] ? page_fault_oops+0x146/0x160
[ 513.051823] ? exc_page_fault+0x73/0x160
[ 513.051941] ? asm_exc_page_fault+0x26/0x30
[ 513.052060] ? timerlat_fd_read+0xf2/0x370
[ 513.052187] ? timerlat_fd_read+0xee/0x370
[ 513.052300] vfs_read+0xe8/0x370
[ 513.052410] ksys_read+0x6d/0xf0
[ 513.052519] do_syscall_64+0x7d/0x160
[ 513.052630] ? clear_bhb_loop+0x25/0x80
[ 513.052742] ? clear_bhb_loop+0x25/0x80
[ 513.052853] ? clear_bhb_loop+0x25/0x80
[ 513.052963] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 513.053081] RIP: 0033:0x7f6e619d7bbc
[ 513.053217] Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08
e8 b9 8f f9 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31
c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 0f 90 f9
ff 48
[ 513.053470] RSP: 002b:00007f6e618dc910 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[ 513.053599] RAX: ffffffffffffffda RBX: 00007f6e618dca90 RCX: 00007f6e619d7bbc
[ 513.053731] RDX: 0000000000000400 RSI: 00007f6e618dca90 RDI: 0000000000000003
[ 513.053863] RBP: 00007f6e618dced0 R08: 0000000000000000 R09: 00007f6e5c000b90
[ 513.053998] R10: 00007f6e5c0008e0 R11: 0000000000000246 R12: 0000000000000001
[ 513.054145] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000001
[ 513.054288] </TASK>
I didn't look deeply into the mechanism of the crash, I just thought
that this is not a valid state to be setting OSNOISE_WORKLOAD in the
first place, so I can just disallow it and I don't have to dig deeper.
> > static int osnoise_options_open(struct inode *inode, struct file *file)
> > {
> > return seq_open(file, &osnoise_options_seq_ops);
> > @@ -2229,6 +2254,10 @@ static ssize_t osnoise_options_write(struct file *filp, const char __user *ubuf,
> > if (option < 0)
> > return -EINVAL;
> >
> > + retval = osnoise_validate_option(option, enable);
> > + if (retval != 0)
> > + return retval;
> > +
> > /*
> > * trace_types_lock is taken to avoid concurrency on start/stop.
> > */
>
> Shouldn't this be done under interface_lock to avoid concurrent
> timerlat_fd_open()? FWIW, your test script doesn't appear to cover the
> case of option setting racing with timerlat starting (due to the 2
> second delay).
My patch only takes into account the sequential case, which is already
rare to trigger - I don't know about any cases where this happened
during normal use. I would normally put the code behind the
interface_lock, if there weren't the stop_per_cpu_kthreads() thing in
between.
> Of course, this is complicated by stop_per_cpu_kthreads() happening
> before interface_lock is acquired. Do we know why that happens outside
> the lock? That might even be the actual cause of this bug.
Before commit b484a02c9c ("tracing/timerlat: Drop interface_lock in
stop_kthread()"), stop_kthread() took interface_lock, so
stop_per_cpu_kthreads() couldn't be called while holding
interface_lock. As that is no longer the case, the position of taking
interface_lock in osnoise_options_write() could be re-evaluated
(comment in 5bfbcd1ee57b says interface_lock cannot be taken at the
same time as cpu_read_lock).
I don't think it's directly related to this bug. All operations in the
reproducer are expected to be executed in sequence, so the lock should
have no effect on it.
> Though even in the non-race case, we might still want to return -EBUSY
> rather than just killing the thread (which might still have races since
> we don't wait for the user thread to die).
Yes, if we want to change the semantics to kill the user threads and
restart kernel threads on setting OSNOISE_WORKLOAD, we have to take
into account that the user threads might take an indefinite amount of
time to exit. The idea of my patch is to let the user dispose of the
user threads rather than the tracer, that will also avoid possible
deadlocks.
Tomas
Powered by blists - more mailing lists