lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZscfFFUM6pBuwpGZ@uudg.org>
Date: Thu, 22 Aug 2024 08:20:52 -0300
From: "Luis Claudio R. Goncalves" <lgoncalv@...hat.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: tglozar@...hat.com, linux-trace-kernel@...r.kernel.org,
	linux-kernel@...r.kernel.org, jkacur@...hat.com
Subject: Re: [PATCH] tracing/timerlat: Check tlat_var for NULL in
 timerlat_fd_release

On Wed, Aug 21, 2024 at 04:03:16PM -0400, Steven Rostedt wrote:
> On Tue, 20 Aug 2024 15:00:01 +0200
> tglozar@...hat.com wrote:
> 
> > From: Tomas Glozar <tglozar@...hat.com>
> > 
> > When running timerlat with a userspace workload (NO_OSNOISE_WORKLOAD),
> > NULL pointer dereference can be triggered by sending consequent SIGINT
> > and SIGTERM signals to the workload process. That then causes
> > timerlat_fd_release to be called twice in a row, and the second time,
> > hrtimer_cancel is called on a zeroed hrtimer struct, causing the NULL
> > dereference.
> > 
> > This can be reproduced using rtla:
> > ```
> > $ while true; do rtla timerlat top -u -q & PID=$!; sleep 5; \
> >  kill -INT $PID; sleep 0.001; kill -TERM $PID; wait $PID; done
> > [1] 1675
> > [1]+  Aborted (SIGTERM)      rtla timerlat top -u -q
> > [1] 1688
> > client_loop: send disconnect: Broken pipe
> > ```
> > triggering the bug:
> 
> I'm able to reproduce this with the above. Unfortunately, I can still
> reproduce it after applying this patch :-(

We were able to mitigate the problem (triggered by that command line) simply
by handling SIGTERM in the userspace tool the same way it handles SIGINT. That
was one of the factors that helped the "closing the file descriptor twice"
theory.

Would you mind sharing the backtrace you got? That would also help us
investigating.

> Looking at the code, the logic for handling the kthread seems off. I'll
> spend a little time to see if I can figure it out.

You mean the 

+	if (!tlat_var->kthread) {
+		/* the fd has been closed already */

bit or the kthread handling in rtla itself?

As Tomas already said, thank you for testing and reviewing the suggested fix!

Luis


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ