[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250523192053.47054e6e@gandalf.local.home>
Date: Fri, 23 May 2025 19:20:53 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/2] tracing: ring_buffer: Rewind persistent ring
buffer when reboot
On Fri, 23 May 2025 16:54:25 -0400
Steven Rostedt <rostedt@...dmis.org> wrote:
> > spin:
> > @@ -6642,7 +6739,6 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
> > cpu_buffer->read_bytes += rb_page_size(reader);
> >
> > /* swap the pages */
> > - rb_init_page(bpage);
>
> This isn't needed, because the persistent ring buffer is never swapped. And
> this is what caused my test to fail. For some reason (and I'm still trying
> to figure out exactly why), it causes my stress test that runs:
>
> perf record -o perf-test.dat -a -- trace-cmd record -e all -p function /work/c/hackbench 10 || exit -1
>
> To never end after tracing is stopped, and it fills up all the disk space.
>
> But again, this part isn't needed because the persistent ring buffer
> doesn't do the swapping. This replaces what the user passed in with the
> current page.
>
> > bpage = reader->page;
> > reader->page = data_page->data;
> > local_set(&reader->write, 0);
So I analyzed why this fails and we need to reset the commit here.
Adding a bunch of trace_printk() (and using the new trace_printk_dest
option where I can have trace_printk go into an instance) I was able to see
why this was an issue.
This part of the code swaps the reader page with what was passed in by the
caller. The page doesn't go back into the write part of the ring buffer.
The "commit" field is used to know if there's more data or not.
By not resetting the "commit" field, we have:
reader = rb_get_reader_page(cpu_buffer)
if (cpu_buffer->reader_page->read < rb_page_size(reader))
return reader;
// swap passed in page with reader
Without resetting "commit", the caller consumes the page and then uses that
same page to pass back to this function. Since the "commit" field is never
zero'd, it the above if statement always returns true! And this function
just keeps swapping the reader each time and goes into an infinite loop
(this loop requires user space to do a read or splice on this page so it's
not a kernel infinite loop).
Now the question is, can this affect the persistent ring buffer too? I'll
memory map the buffer and see if it causes the same issue.
-- Steve
Powered by blists - more mailing lists