[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZtES1Iw6K1GY85bS@uudg.org>
Date: Thu, 29 Aug 2024 21:31:16 -0300
From: "Luis Claudio R. Goncalves" <lgoncalv@...hat.com>
To: Tomas Glozar <tglozar@...hat.com>
Cc: Steven Rostedt <rostedt@...dmis.org>,
linux-trace-kernel@...r.kernel.org, linux-kernel@...r.kernel.org,
jkacur@...hat.com
Subject: Re: [PATCH] tracing/timerlat: Check tlat_var for NULL in
timerlat_fd_release
On Thu, Aug 29, 2024 at 08:40:25PM -0300, Luis Claudio R. Goncalves wrote:
> On Tue, Aug 27, 2024 at 04:34:39PM +0200, Tomas Glozar wrote:
> > po 26. 8. 2024 v 19:27 odesÃlatel Steven Rostedt <rostedt@...dmis.org> napsal:
> > >
> > > Yeah, I think I finally found the real issue. I don't think we need the ref
> > > counting. The problem is the creating and killing of the threads via the
> > > start and stop callbacks. That's not their purpose. The purpose of stop
> > > and start callbacks is when tracing_on is set to off and back on again. I
> > > think this is what is racing with the close.
> > >
> >
> > I believe I discovered the exact mechanism of the bug. Since rtla is
> > killed with SIGTERM, it leaves the timerlat tracer active and its
> > timerlatu processes running. The rtla from the next iteration of the
> > while loop in the reproducer deactivates the timerlat tracer,
> > triggering the killing of the timerlatu processes. However, this
> > killing is asynchronous, and while the processes are gradually dying
> > one by one and closing their fds, the rtla from the next iteration is
> > already activating the timerlat tracer again. One of the timerlatu
> > processes does not die fast enough to release the fd before the
> > tlat_var is cleared in osnoise_workload_start, triggering the panic.
>
> With this in mind I tried a simpler approach. I used a kernel without any
> extra patches from Steven or ourselves, just used the code below. Not
> saying it is a definitive solution, but as a proof of concept I have two
> boxes running the loop used to reproduce the problem and so far not a
> single problem. The patch:
Premature commemoration. One of the boxes rebooted a minute ago. Sounds
like I just diminished a bit the window for triggering the problem.
Tomorrow I will run tests with all the suggested patches and work of a
better idea, more comprehensive, to address what Tomas described above.
> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> index 66a871553d4a1..331c69bc7b0d4 100644
> --- a/kernel/trace/trace_osnoise.c
> +++ b/kernel/trace/trace_osnoise.c
> @@ -2239,8 +2239,11 @@ static ssize_t osnoise_options_write(struct file *filp, const char __user *ubuf,
> */
> mutex_lock(&trace_types_lock);
> running = osnoise_has_registered_instances();
> - if (running)
> + if (running) {
> stop_per_cpu_kthreads();
> + mutex_unlock(&trace_types_lock);
> + return -EBUSY;
> + }
>
> mutex_lock(&interface_lock);
> /*
> @@ -2579,7 +2582,8 @@ static int timerlat_fd_release(struct inode *inode, struct file *file)
> osn_var = per_cpu_ptr(&per_cpu_osnoise_var, cpu);
> tlat_var = per_cpu_ptr(&per_cpu_timerlat_var, cpu);
>
> - hrtimer_cancel(&tlat_var->timer);
> + if (tlat_var->kthread)
> + hrtimer_cancel(&tlat_var->timer);
> memset(tlat_var, 0, sizeof(*tlat_var));
>
> osn_var->sampling = 0;
> ---
>
> In short, do not start new instances if the old one still registered. And
> only cancel the timer if the timerlat per cpu var has not been zeroed-out.
>
> Again, not a final patch but a simpler approach to protect start/stop
> races.
>
> Luis
>
> >
> > You can see the bug in action in this detailed trace I got today, with
> > the use of in-kernel trace_printk together with bpftrace and
> > bpf_trace_printk and captured using ftrace_dump_on_oops (running the
> > same reproducer on a 8 CPU KVM system):
> > [ 103.227679] rtla-2325 2..... 98242607us :
> > osnoise_workload_start: tlat_var_reset: cpu=0,
> > tlat_var=0000000034996efa
> > [ 103.227844] rtla-2325 2..... 98242607us :
> > osnoise_workload_start: tlat_var_reset: cpu=1,
> > tlat_var=0000000062d1f520
> > [ 103.228009] rtla-2325 2..... 98242607us :
> > osnoise_workload_start: tlat_var_reset: cpu=2,
> > tlat_var=0000000002e262d8
> > [ 103.228173] rtla-2325 2..... 98242607us :
> > osnoise_workload_start: tlat_var_reset: cpu=3,
> > tlat_var=00000000b9d3a280
> > [ 103.228349] rtla-2325 2..... 98242607us :
> > osnoise_workload_start: tlat_var_reset: cpu=4,
> > tlat_var=000000000fba7a6e
> > [ 103.228510] rtla-2325 2..... 98242608us :
> > osnoise_workload_start: tlat_var_reset: cpu=5,
> > tlat_var=000000008ed0b87f
> > [ 103.228670] rtla-2325 2..... 98242608us :
> > osnoise_workload_start: tlat_var_reset: cpu=6,
> > tlat_var=00000000a02fe267
> > [ 103.228831] rtla-2325 2..... 98242608us :
> > osnoise_workload_start: tlat_var_reset: cpu=7,
> > tlat_var=000000008e1d5095
> > [ 103.228992] rtla-2327 3...11 98256551us :
> > bpf_trace_printk: rtla pid 2325 forks into 2328
> > [ 103.229156] rtla-2327 3...11 98256873us :
> > bpf_trace_printk: rtla pid 2325 forks into 2329
> > [ 103.229325] rtla-2327 3...11 98257124us :
> > bpf_trace_printk: rtla pid 2325 forks into 2330
> > [ 103.229489] rtla-2327 3...11 98257353us :
> > bpf_trace_printk: rtla pid 2325 forks into 2331
> > [ 103.229651] rtla-2327 3...11 98257574us :
> > bpf_trace_printk: rtla pid 2325 forks into 2332
> > [ 103.229810] rtla-2327 3...11 98257801us :
> > bpf_trace_printk: rtla pid 2325 forks into 2333
> > [ 103.229968] rtla-2327 3...11 98258008us :
> > bpf_trace_printk: rtla pid 2325 forks into 2334
> > [ 103.230124] rtla-2327 3...11 98258216us :
> > bpf_trace_printk: rtla pid 2325 forks into 2335
> > [ 103.230293] bash-2094 0d..31 102987914us :
> > bpf_trace_printk: rtla pid 2325 received signal 2
> > [ 103.230452] bash-2094 0d..31 102987915us :
> > bpf_trace_printk: ...from bash pid 2094
> > [ 103.230609] bash-2094 0d..31 102990000us :
> > bpf_trace_printk: rtla pid 2325 received signal 15
> > [ 103.230766] bash-2094 0d..31 102990001us :
> > bpf_trace_printk: ...from bash pid 2094
> > [ 103.230922] rtla-2325 2...21 102990010us :
> > bpf_trace_printk: rtla pid 2325 exiting
> > [ 103.231079] rtla-2327 3...21 102990503us :
> > bpf_trace_printk: rtla pid 2327 exiting
> > [ 103.231242] rtla-2337 3d..31 103232021us :
> > bpf_trace_printk: timerlatu/0 pid 2328 received signal 9
> > [ 103.231399] rtla-2337 3d..31 103232022us :
> > bpf_trace_printk: ...from rtla pid 2337
> > [ 103.231554] rtla-2337 3d..31 103232024us :
> > bpf_trace_printk: timerlatu/1 pid 2329 received signal 9
> > [ 103.231711] rtla-2337 3d..31 103232025us :
> > bpf_trace_printk: ...from rtla pid 2337
> > [ 103.231866] rtla-2337 3d..31 103232029us :
> > bpf_trace_printk: timerlatu/2 pid 2330 received signal 9
> > [ 103.232022] rtla-2337 3d..31 103232030us :
> > bpf_trace_printk: ...from rtla pid 2337
> > [ 103.232178] rtla-2337 3dN.31 103232031us :
> > bpf_trace_printk: timerlatu/3 pid 2331 received signal 9
> > [ 103.232347] rtla-2337 3dN.31 103232031us :
> > bpf_trace_printk: ...from rtla pid 2337
> > [ 103.232506] rtla-2337 3dN.31 103232034us :
> > bpf_trace_printk: timerlatu/4 pid 2332 received signal 9
> > [ 103.232685] rtla-2337 3dN.31 103232034us :
> > bpf_trace_printk: ...from rtla pid 2337
> > [ 103.232846] rtla-2337 3dN.31 103232039us :
> > bpf_trace_printk: timerlatu/5 pid 2333 received signal 9
> > [ 103.233010] rtla-2337 3dN.31 103232039us :
> > bpf_trace_printk: ...from rtla pid 2337
> > [ 103.233171] rtla-2337 3dN.31 103232042us :
> > bpf_trace_printk: timerlatu/6 pid 2334 received signal 9
> > [ 103.233357] rtla-2337 3dN.31 103232042us :
> > bpf_trace_printk: ...from rtla pid 2337
> > [ 103.233517] rtla-2337 3dN.31 103232046us :
> > bpf_trace_printk: timerlatu/7 pid 2335 received signal 9
> > [ 103.233678] rtla-2337 3dN.31 103232046us :
> > bpf_trace_printk: ...from rtla pid 2337
> > [ 103.233835] timerlat-2331 3...21 103232701us :
> > bpf_trace_printk: timerlatu/3 pid 2331 exiting
> > [ 103.233995] timerlat-2329 1...21 103232704us :
> > bpf_trace_printk: timerlatu/1 pid 2329 exiting
> > [ 103.234155] timerlat-2331 3....1 103232705us :
> > timerlat_fd_release: timerlat_fd_release: cpu=3,
> > tlat_var=000000000fba7a6e, osn_var->pid=2331
> > [ 103.234357] timerlat-2329 1....1 103232706us :
> > timerlat_fd_release: timerlat_fd_release: cpu=1,
> > tlat_var=0000000002e262d8, osn_var->pid=2329
> > [ 103.234508] timerlat-2334 6...21 103232821us :
> > bpf_trace_printk: timerlatu/6 pid 2334 exiting
> > [ 103.234657] timerlat-2334 6....1 103232826us :
> > timerlat_fd_release: timerlat_fd_release: cpu=6,
> > tlat_var=000000008e1d5095, osn_var->pid=2334
> > [ 103.234813] timerlat-2330 2...21 103232850us :
> > bpf_trace_printk: timerlatu/2 pid 2330 exiting
> > [ 103.234966] timerlat-2330 2....1 103232853us :
> > timerlat_fd_release: timerlat_fd_release: cpu=2,
> > tlat_var=00000000b9d3a280, osn_var->pid=2330
> > [ 103.235122] timerlat-2333 5...21 103232856us :
> > bpf_trace_printk: timerlatu/5 pid 2333 exiting
> > [ 103.235291] timerlat-2333 5....1 103232858us :
> > timerlat_fd_release: timerlat_fd_release: cpu=5,
> > tlat_var=00000000a02fe267, osn_var->pid=2333
> > [ 103.235452] timerlat-2332 4...21 103232861us :
> > bpf_trace_printk: timerlatu/4 pid 2332 exiting
> > [ 103.235610] timerlat-2332 4....1 103232864us :
> > timerlat_fd_release: timerlat_fd_release: cpu=4,
> > tlat_var=000000008ed0b87f, osn_var->pid=2332
> > [ 103.235769] timerlat-2335 7...21 103232867us :
> > bpf_trace_printk: timerlatu/7 pid 2335 exiting
> > [ 103.235932] timerlat-2335 7....1 103232869us :
> > timerlat_fd_release: timerlat_fd_release: cpu=7,
> > tlat_var=000000003113e297, osn_var->pid=2335
> > [ 103.236103] rtla-2337 3..... 103234109us :
> > osnoise_workload_start: tlat_var_reset: cpu=0,
> > tlat_var=0000000034996efa
> > [ 103.236316] rtla-2337 3..... 103234109us :
> > osnoise_workload_start: tlat_var_reset: cpu=1,
> > tlat_var=0000000062d1f520
> > [ 103.236473] rtla-2337 3..... 103234109us :
> > osnoise_workload_start: tlat_var_reset: cpu=2,
> > tlat_var=0000000002e262d8
> > [ 103.236630] rtla-2337 3..... 103234110us :
> > osnoise_workload_start: tlat_var_reset: cpu=3,
> > tlat_var=00000000b9d3a280
> > [ 103.236797] rtla-2337 3..... 103234110us :
> > osnoise_workload_start: tlat_var_reset: cpu=4,
> > tlat_var=000000000fba7a6e
> > [ 103.236972] rtla-2337 3..... 103234110us :
> > osnoise_workload_start: tlat_var_reset: cpu=5,
> > tlat_var=000000008ed0b87f
> > [ 103.237144] rtla-2337 3..... 103234110us :
> > osnoise_workload_start: tlat_var_reset: cpu=6,
> > tlat_var=00000000a02fe267
> > [ 103.237345] rtla-2337 3..... 103234110us :
> > osnoise_workload_start: tlat_var_reset: cpu=7,
> > tlat_var=000000008e1d5095
> > [ 103.237516] timerlat-2328 0...21 103234149us :
> > bpf_trace_printk: timerlatu/0 pid 2328 exiting
> > [ 103.237690] timerlat-2328 0....1 103234152us :
> > timerlat_fd_release: timerlat_fd_release: cpu=0,
> > tlat_var=0000000062d1f520, osn_var->pid=0
> > [ 103.237871] ---------------------------------
> > [ 103.238053] CR2: 0000000000000010
> >
> > On the kernel side, I'm using similar trace_printk debug prints as you
> > did; the bpftrace program used is this:
> > uretprobe:libc:fork
> > /strncmp(comm, "rtla", 5) == 0 && retval != 0/
> > {
> > debugf("rtla pid %d forks into %d", pid, retval)
> > }
> > tracepoint:signal:signal_generate
> > /strncmp(args->comm, "rtla", 5) == 0 || strncmp(args->comm,
> > "timerlatu", 9) == 0/ {
> > debugf("%s pid %d received signal %d", args->comm, args->pid,
> > args->sig);
> > debugf("...from %s pid %d", comm, pid);
> > }
> > tracepoint:sched:sched_process_exit
> > /strncmp(args->comm, "rtla", 5) == 0 || strncmp(args->comm,
> > "timerlatu", 9) == 0/ {
> > debugf("%s pid %d exiting", args->comm, args->pid)
> > }
> >
> > > Anyway, the start and stop should probably just pause the threads and not
> > > kill them an start them again. That is, the osnoise_workload_start() should
> > > be called by the init callbacks and the osnoise_workload_stop should be
> > > called by reset callback.
> > >
> > > The start and stop callbacks should just pause and restart the the threads.
> > >
> >
> > I'm not sure if that fits into the design of timerlat. At least for
> > timerlat with user workload, each activation of the timerlat tracer
> > implies the creation of a new user workload, not the re-activation of
> > a lingering one (that might be an entirely different program). A
> > timerlat tracer is tightly associated with its threads: if the threads
> > exit, the timer itself ceases to have meaning. Of course also rtla
> > should make sure there are no lingering timerlatu processes.
> >
> > I think waiting on the threads to actually exit in stop_kthread() is
> > the proper solution:
> >
> > /*
> > * stop_kthread - stop a workload thread
> > */
> > static void stop_kthread(unsigned int cpu){
> > struct task_struct *kthread;
> >
> > kthread = per_cpu(per_cpu_osnoise_var, cpu).kthread;
> > if (kthread) {
> > if (test_bit(OSN_WORKLOAD, &osnoise_options)) {
> > kthread_stop(kthread);
> > } else {
> > /*
> > * This is a user thread waiting on the timerlat_fd. We need
> > * to close all users, and the best way to guarantee this is
> > * by killing the thread. NOTE: this is a purpose specific file.
> > */
> > kill_pid(kthread->thread_pid, SIGKILL, 1);
> > /* ^^ here wait until kthread (actually the user workload) exits */
> > put_task_struct(kthread);
> > }
> > per_cpu(per_cpu_osnoise_var, cpu).kthread = NULL;
> > }
> >
> > There might even be another bug, not sure if calling put_task_struct
> > on a task that is still exiting is sane.
> >
> > Tomas
> >
> ---end quoted text---
---end quoted text---
Powered by blists - more mailing lists