Message-ID: <CAP4=nvRfkZ2CEbFv+MFBXikZ_p2K-1uucgkdgp27JeTxe58qhw@mail.gmail.com>
Date: Tue, 3 Sep 2024 14:47:42 +0200
From: Tomas Glozar <tglozar@...hat.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-trace-kernel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	jkacur@...hat.com, "Luis Claudio R. Goncalves" <lgoncalv@...hat.com>
Subject: Re: [PATCH] tracing/timerlat: Check tlat_var for NULL in timerlat_fd_release

On Tue, 27 Aug 2024 at 16:34, Tomas Glozar <tglozar@...hat.com> wrote:
>
> I think waiting on the threads to actually exit in stop_kthread() is
> the proper solution:
>
> /*
>  * stop_kthread - stop a workload thread
>  */
> static void stop_kthread(unsigned int cpu)
> {
>     struct task_struct *kthread;
>
>     kthread = per_cpu(per_cpu_osnoise_var, cpu).kthread;
>     if (kthread) {
>         if (test_bit(OSN_WORKLOAD, &osnoise_options)) {
>             kthread_stop(kthread);
>         } else {
>             /*
>              * This is a user thread waiting on the timerlat_fd. We need
>              * to close all users, and the best way to guarantee this is
>              * by killing the thread. NOTE: this is a purpose specific file.
>              */
>             kill_pid(kthread->thread_pid, SIGKILL, 1);
>             /* ^^ here wait until kthread (actually the user workload) exits */
>             put_task_struct(kthread);
>         }
>         per_cpu(per_cpu_osnoise_var, cpu).kthread = NULL;
>     }
>

I tried a patch that makes osnoise_workload_stop() wait for the user
workload to exit, similar to what we already do for the kernel
workload. I added a wait_for_completion() in osnoise_workload_stop()
that is completed in timerlat_fd_release(), mimicking what
kthread_stop() does. But that deadlocks if the exiting user workload
process releases a file descriptor whose release path takes
trace_types_lock before it releases the timerlat one, since the caller
of osnoise_workload_stop() is also holding trace_types_lock. There
seems to be no way to prevent this deadlock, since the user workload,
unlike the kthread, which runs a fixed program, is free to open
anything it wants.

I suggest using the workaround proposed in
https://lore.kernel.org/linux-trace-kernel/20240823125426.404f2705@gandalf.local.home
for the time being. Together with the patch that adds locks around
stopping the threads
(https://patchwork.kernel.org/project/linux-trace-kernel/patch/20240823102816.5e55753b@gandalf.local.home/),
this should prevent the kernel panic at least until we have a solution
for the race itself.

Tomas



