linux-kernel - Re: [PATCH v3 1/2] tracing: ring_buffer: Rewind persistent ring buffer when reboot

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250523165425.0275c9a1@gandalf.local.home>
Date: Fri, 23 May 2025 16:54:25 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/2] tracing: ring_buffer: Rewind persistent ring
 buffer when reboot

On Fri, 23 May 2025 00:54:44 +0900
"Masami Hiramatsu (Google)" <mhiramat@...nel.org> wrote:

> From: Masami Hiramatsu (Google) <mhiramat@...nel.org>

Unfortunately, I don't think this is going to make the merge window.

This failed one of my tests and it took me a while to figure it out. And I
think there may be some more subtle issues that we need further testing on.

> 
> Rewind persistent ring buffer pages which have been read in the
> previous boot. Those pages are highly possible to be lost before
> writing it to the disk if the previous kernel crashed. In this
> case, the trace data is kept on the persistent ring buffer, but
> it can not be read because its commit size has been reset after
> read.
> This skips clearing the commit size of each sub-buffer and
> recover it after reboot.
> 
> Note: If you read the previous boot data via trace_pipe, that
> is not accessible in that time. But reboot without clearing (or
> reusing) the read data, the read data is recovered again in the
> next boot.
> Thus, when you read the previous boot data, clear it by
> `echo > trace`.
> 
> @@ -5348,7 +5446,6 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
>  	 */
>  	local_set(&cpu_buffer->reader_page->write, 0);
>  	local_set(&cpu_buffer->reader_page->entries, 0);
> -	local_set(&cpu_buffer->reader_page->page->commit, 0);

We need to make sure the above doesn't cause any issues. It wasn't the
reason for my test failure, but it's something we need to look more closely
at.


>  	cpu_buffer->reader_page->real_end = 0;
>  
>   spin:
> @@ -6642,7 +6739,6 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
>  		cpu_buffer->read_bytes += rb_page_size(reader);
>  
>  		/* swap the pages */
> -		rb_init_page(bpage);

This isn't needed, because the persistent ring buffer is never swapped. And
this is what caused my test to fail. For some reason (and I'm still trying
to figure out exactly why), it causes my stress test that runs:

  perf record -o perf-test.dat -a -- trace-cmd record -e all -p function /work/c/hackbench 10 || exit -1

To never end after tracing is stopped, and it fills up all the disk space.

But again, this part isn't needed because the persistent ring buffer
doesn't do the swapping. This replaces what the user passed in with the
current page.

>  		bpage = reader->page;
>  		reader->page = data_page->data;
>  		local_set(&reader->write, 0);

I think we also need this:

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7d837963dd1e..456efebc396a 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3638,6 +3638,9 @@ rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
 
 	rb_reset_tail(cpu_buffer, tail, info);
 
+	/* The new page should have zero committed */
+	rb_init_page(next_page->page);
+
 	/* Commit what we have for now. */
 	rb_end_commit(cpu_buffer);
 	/* rb_end_commit() decs committing */

-- Steve