linux-kernel - Re: [PATCH] ring-buffer: Do not trigger WARN_ON() due to a commit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250529000342.5218ac7b90c99f3636edd5ab@kernel.org>
Date: Thu, 29 May 2025 00:03:42 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Linux Trace Kernel
 <linux-trace-kernel@...r.kernel.org>, Mathieu Desnoyers
 <mathieu.desnoyers@...icios.com>, Vincent Donnefort <vdonnefort@...gle.com>
Subject: Re: [PATCH] ring-buffer: Do not trigger WARN_ON() due to a
 commit_overrun

On Tue, 27 May 2025 22:17:35 -0400
Steven Rostedt <rostedt@...dmis.org> wrote:

> On Wed, 28 May 2025 10:42:03 +0900
> Masami Hiramatsu (Google) <mhiramat@...nel.org> wrote:
> 
> > > The way to differentiate this case from the normal case of there
> > > only being one page written to where the swap of the reader page
> > > received that one page (which is the commit page), check if the
> > > tail page is on the reader page. The difference between the commit
> > > page and the tail page is that the tail page is where new writes go
> > > to, and the commit page holds the first write that hasn't been
> > > committed yet. In the case of an interrupt preempting the write of
> > > an event and filling the buffer, it would move the tail page but
> > > not the commit page.  
> > 
> > (BTW, what happen if the interrupted process commits the event? That
> >  event will be lost, or commit and just move commit_page?)
> 
> No, the first event to be created is the "commit" event. If it gets
> interrupted and the interrupt adds a bunch of events that wraps the
> ring buffer, it can't touch the commit event, it will just start
> dropping events. Then when the commit event finishes, it can either be
> read by the reader, or overwritten by the next events coming in.

Hmm, rb_end_commit() seems to try commit all nested events (committing > 1)
if committing == 1, and the rb_set_commit_to_write() pushes commit pointer
to the latest write pointer. So when it "start dropping events"? (is that
write side?)

> > 
> > Thus the reader_page == commit_page && reader_page == tail_page,
> > missed_events should be 0?
> > 
> > Possible cases if missed_events != 0:
> > 
> >  - reader_page != commit_page
> > 	(writer's commit overtook the reader)
> 
> The reader is never in the write buffer. Just the head page will move.
> When a new reader page is taken it will swap out the old reader page
> with the head page. If the head page is the commit page, then the
> commit page becomes the reader page too.

Hmm, so this is the case that the writer is too fast and it moves
head page to the next page. Thus some events on old head page are
lost.

> 
> > 
> >  - reader_page == commit_page but reader_page != tail_page 
> > 	(writer overtook the reader, but commit is not completed yet.)
> 
> No, "writer overtook the reader" doesn't make sense as the reader is
> not on the write buffer, so the writer can not catch up to it. What the
> write buffer has is the "head" page which is where the next reader will
> come to.
> 
> The only way reader_page == commit_page and reader_page != tail_page is
> if the commit was interrupted and the interrupt added events and moved
> forward off the commit_page. The only way there would be missed events
> in that case is if the interrupt added so many events it wrapped the
> buffer and then started dropping events.

OK, so "wrapped the buffer" means "tail (write) page caught up the commit"?

> 
> > 
> > if 
> >  - reader_page == commit_page == tail_page
> > in this case, missed_events should be 0.
> > 
> > Since the reader_page is out of the ring buffer, writer should not
> > use reader_page while reading the same reader_page, is that right?
> 
> Correct. But the writer could end up on the reader page after the swap,
> if the head page happened to be the commit page.

OK.

> 
> > 
> 
> 
> > > cpu_buffer->tail_page,
> > > +				       "Reader on commit with %ld
> > > missed events",
> > > +				       missed_events)) {
> > > +				/*
> > > +				 * If the tail page is not on the
> > > reader page but
> > > +				 * the commit_page is, that would
> > > mean that there's
> > > +				 * a commit_overrun (an interrupt
> > > preempted an
> > > +				 * addition of an event and then
> > > filled the buffer
> > > +				 * with new events). In this case
> > > it's not an
> > > +				 * error, but it should still be
> > > reported.
> > > +				 */
> > > +				pr_info("Ring buffer commit
> > > overrun lost %ld events at timestamp:%lld\n",
> > > +					missed_events,
> > > cpu_buffer->reader_page->page->time_stamp);  
> > 
> > Do we need this pr_info() for each commit overrun?
> 
> Yes. When doing this stress test, it printed at most 4 times. It
> happens once per time the interrupt fills the buffer while interrupting
> the buffer.

Ah, I meant we should report it in printk buffer, since the event
lost will happen in some case and reader can notice that.

> 
> I seldom ever get commit overruns. It's one of the fields in the status
> file located in: /sys/kernel/tracing/per_cpu/cpu*/stats
> 
> > 
> > > +			}
> > > +		}
> > >  	}  
> > 
> > Just for cleanup the code idea, with above change, this code is
> > something like;
> > 
> > ----------------
> > 
> > missed_events = cpu_buffer->lost_events;
> > 
> > if (cpu_buffer->reader_page != cpu_buffer->commit_page) {
> > 	if (missed_event) {
> > 
> > 	}
> > } else {
> > 	if (missed_event) {
> > 		if (!WARN_ONCE(cpu_buffer->reader_page ==
> > cpu_buffer->tail_page,"...")) { pr_info("...")
> > 		}
> > 	}
> > }
> > 
> > ----------------
> > 
> > Can we make it as below?
> > 
> > ----------------
> > missed_events = cpu_buffer->lost_events;
> > 
> > if (missed_event) {
> > 	if (cpu_buffer->reader_page != cpu_buffer->commit_page) {
> > 
> > 	} else if (!WARN_ONCE(cpu_buffer->reader_page ==
> > cpu_buffer->tail_page, "...") { /**/
> > 		pr_info("..."); 
> > 	}
> > }
> 
> Hmm, OK, I'll look at that.


Thanks! Anyway, the logic looks good to me.


> 
> Thanks,
> 
> -- Steve


-- 
Masami Hiramatsu (Google) <mhiramat@...nel.org>