[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20250529110355.98fecf6d98765e794e0ba345@kernel.org>
Date: Thu, 29 May 2025 11:03:55 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Linux Trace Kernel
<linux-trace-kernel@...r.kernel.org>, Masami Hiramatsu
<mhiramat@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Vincent Donnefort <vdonnefort@...gle.com>
Subject: Re: [PATCH v2] ring-buffer: Do not trigger WARN_ON() due to a
commit_overrun
On Wed, 28 May 2025 12:15:55 -0400
Steven Rostedt <rostedt@...dmis.org> wrote:
> From: Steven Rostedt <rostedt@...dmis.org>
>
> When reading a memory mapped buffer the reader page is just swapped out
> with the last page written in the write buffer. If the reader page is the
> same as the commit buffer (the buffer that is currently being written to)
> it was assumed that it should never have missed events. If it does, it
> triggers a WARN_ON_ONCE().
>
> But there just happens to be one scenario where this can legitimately
> happen. That is on a commit_overrun. A commit overrun is when an interrupt
> preempts an event being written to the buffer and then the interrupt adds
> so many new events that it fills and wraps the buffer back to the commit.
> Any new events would then be dropped and be reported as "missed_events".
>
> In this case, the next page to read is the commit buffer and after the
> swap of the reader page, the reader page will be the commit buffer, but
> this time there will be missed events and this triggers the following
> warning:
>
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 1127 at kernel/trace/ring_buffer.c:7357 ring_buffer_map_get_reader+0x49a/0x780
> Modules linked in: kvm_intel kvm irqbypass
> CPU: 2 UID: 0 PID: 1127 Comm: trace-cmd Not tainted 6.15.0-rc7-test-00004-g478bc2824b45-dirty #564 PREEMPT
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> RIP: 0010:ring_buffer_map_get_reader+0x49a/0x780
> Code: 00 00 00 48 89 fe 48 c1 ee 03 80 3c 2e 00 0f 85 ec 01 00 00 4d 3b a6 a8 00 00 00 0f 85 8a fd ff ff 48 85 c0 0f 84 55 fe ff ff <0f> 0b e9 4e fe ff ff be 08 00 00 00 4c 89 54 24 58 48 89 54 24 50
> RSP: 0018:ffff888121787dc0 EFLAGS: 00010002
> RAX: 00000000000006a2 RBX: ffff888100062800 RCX: ffffffff8190cb49
> RDX: ffff888126934c00 RSI: 1ffff11020200a15 RDI: ffff8881010050a8
> RBP: dffffc0000000000 R08: 0000000000000000 R09: ffffed1024d26982
> R10: ffff888126934c17 R11: ffff8881010050a8 R12: ffff888126934c00
> R13: ffff8881010050b8 R14: ffff888101005000 R15: ffff888126930008
> FS: 00007f95c8cd7540(0000) GS:ffff8882b576e000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f95c8de4dc0 CR3: 0000000128452002 CR4: 0000000000172ef0
> Call Trace:
> <TASK>
> ? __pfx_ring_buffer_map_get_reader+0x10/0x10
> tracing_buffers_ioctl+0x283/0x370
> __x64_sys_ioctl+0x134/0x190
> do_syscall_64+0x79/0x1c0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7f95c8de48db
> Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> RSP: 002b:00007ffe037ba110 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007ffe037bb2b0 RCX: 00007f95c8de48db
> RDX: 0000000000000000 RSI: 0000000000005220 RDI: 0000000000000006
> RBP: 00007ffe037ba180 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ffe037bb6f8 R14: 00007f95c9065000 R15: 00005575c7492c90
> </TASK>
> irq event stamp: 5080
> hardirqs last enabled at (5079): [<ffffffff83e0adb0>] _raw_spin_unlock_irqrestore+0x50/0x70
> hardirqs last disabled at (5080): [<ffffffff83e0aa83>] _raw_spin_lock_irqsave+0x63/0x70
> softirqs last enabled at (4182): [<ffffffff81516122>] handle_softirqs+0x552/0x710
> softirqs last disabled at (4159): [<ffffffff815163f7>] __irq_exit_rcu+0x107/0x210
> ---[ end trace 0000000000000000 ]---
>
> The above was triggered by running on a kernel with both lockdep and KASAN
> as well as kmemleak enabled and executing the following command:
>
> # perf record -o perf-test.dat -a -- trace-cmd record --nosplice -e all -p function hackbench 50
>
> With perf interjecting a lot of interrupts and trace-cmd enabling all
> events as well as function tracing, with lockdep, KASAN and kmemleak
> enabled, it could cause an interrupt preempting an event being written to
> add enough events to wrap the buffer. trace-cmd was modified to have
> --nosplice use mmap instead of reading the buffer.
>
> The way to differentiate this case from the normal case of there only
> being one page written to where the swap of the reader page received that
> one page (which is the commit page), check if the tail page is on the
> reader page. The difference between the commit page and the tail page is
> that the tail page is where new writes go to, and the commit page holds
> the first write that hasn't been committed yet. In the case of an
> interrupt preempting the write of an event and filling the buffer, it
> would move the tail page but not the commit page.
>
> Have the warning only trigger if the tail page is also on the reader page,
> and also print out the number of events dropped by a commit overrun as
> that can not yet be safely added to the page so that the reader can see
> there were events dropped.
>
> Cc: stable@...r.kernel.org
> Fixes: fe832be05a8ee ("ring-buffer: Have mmapped ring buffer keep track of missed events")
> Signed-off-by: Steven Rostedt (Google) <rostedt@...dmis.org>
This looks good to me.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@...nel.org>
Thanks!
> ---
> Changes since v1: https://lore.kernel.org/20250527121140.0e7f0565@gandalf.local.home
>
> - Added to the pr_info() the CPU that overflowed and the timestamp of the
> page that overflowed, to make it easier for user space to know where it
> happened.
>
> - Restructured to have if (missed_events) be the main condition, as the sub
> conditions only did something when missed_events was non-zero.
> (Masami Hiramatsu)
>
> kernel/trace/ring_buffer.c | 26 ++++++++++++++++++--------
> 1 file changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index ca1a8e706004..683aa57870fe 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -7285,8 +7285,8 @@ int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu)
> /* Check if any events were dropped */
> missed_events = cpu_buffer->lost_events;
>
> - if (cpu_buffer->reader_page != cpu_buffer->commit_page) {
> - if (missed_events) {
> + if (missed_events) {
> + if (cpu_buffer->reader_page != cpu_buffer->commit_page) {
> struct buffer_data_page *bpage = reader->page;
> unsigned int commit;
> /*
> @@ -7307,13 +7307,23 @@ int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu)
> local_add(RB_MISSED_STORED, &bpage->commit);
> }
> local_add(RB_MISSED_EVENTS, &bpage->commit);
> + } else if (!WARN_ONCE(cpu_buffer->reader_page == cpu_buffer->tail_page,
> + "Reader on commit with %ld missed events",
> + missed_events)) {
> + /*
> + * There shouldn't be any missed events if the tail_page
> + * is on the reader page. But if the tail page is not on the
> + * reader page and the commit_page is, that would mean that
> + * there's a commit_overrun (an interrupt preempted an
> + * addition of an event and then filled the buffer
> + * with new events). In this case it's not an
> + * error, but it should still be reported.
> + *
> + * TODO: Add missed events to the page for user space to know.
> + */
> + pr_info("Ring buffer [%d] commit overrun lost %ld events at timestamp:%lld\n",
> + cpu, missed_events, cpu_buffer->reader_page->page->time_stamp);
> }
> - } else {
> - /*
> - * There really shouldn't be any missed events if the commit
> - * is on the reader page.
> - */
> - WARN_ON_ONCE(missed_events);
> }
>
> cpu_buffer->lost_events = 0;
> --
> 2.47.2
>
--
Masami Hiramatsu (Google) <mhiramat@...nel.org>
Powered by blists - more mailing lists