lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250523165425.0275c9a1@gandalf.local.home>
Date: Fri, 23 May 2025 16:54:25 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/2] tracing: ring_buffer: Rewind persistent ring
 buffer when reboot

On Fri, 23 May 2025 00:54:44 +0900
"Masami Hiramatsu (Google)" <mhiramat@...nel.org> wrote:

> From: Masami Hiramatsu (Google) <mhiramat@...nel.org>

Unfortunately, I don't think this is going to make the merge window.

This failed one of my tests and it took me a while to figure it out. And I
think there may be some more subtle issues that we need further testing on.

> 
> Rewind persistent ring buffer pages which have been read in the
> previous boot. Those pages are highly possible to be lost before
> writing it to the disk if the previous kernel crashed. In this
> case, the trace data is kept on the persistent ring buffer, but
> it can not be read because its commit size has been reset after
> read.
> This skips clearing the commit size of each sub-buffer and
> recover it after reboot.
> 
> Note: If you read the previous boot data via trace_pipe, that
> is not accessible in that time. But reboot without clearing (or
> reusing) the read data, the read data is recovered again in the
> next boot.
> Thus, when you read the previous boot data, clear it by
> `echo > trace`.
> 
> @@ -5348,7 +5446,6 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
>  	 */
>  	local_set(&cpu_buffer->reader_page->write, 0);
>  	local_set(&cpu_buffer->reader_page->entries, 0);
> -	local_set(&cpu_buffer->reader_page->page->commit, 0);

We need to make sure the above doesn't cause any issues. It wasn't the
reason for my test failure, but it's something we need to look more closely
at.


>  	cpu_buffer->reader_page->real_end = 0;
>  
>   spin:
> @@ -6642,7 +6739,6 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
>  		cpu_buffer->read_bytes += rb_page_size(reader);
>  
>  		/* swap the pages */
> -		rb_init_page(bpage);

This isn't needed, because the persistent ring buffer is never swapped. And
this is what caused my test to fail. For some reason (and I'm still trying
to figure out exactly why), it causes my stress test that runs:

  perf record -o perf-test.dat -a -- trace-cmd record -e all -p function /work/c/hackbench 10 || exit -1

To never end after tracing is stopped, and it fills up all the disk space.

But again, this part isn't needed because the persistent ring buffer
doesn't do the swapping. This replaces what the user passed in with the
current page.

>  		bpage = reader->page;
>  		reader->page = data_page->data;
>  		local_set(&reader->write, 0);

I think we also need this:

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7d837963dd1e..456efebc396a 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3638,6 +3638,9 @@ rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
 
 	rb_reset_tail(cpu_buffer, tail, info);
 
+	/* The new page should have zero committed */
+	rb_init_page(next_page->page);
+
 	/* Commit what we have for now. */
 	rb_end_commit(cpu_buffer);
 	/* rb_end_commit() decs committing */

-- Steve

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ