linux-kernel - races when reserving an event in the unified trace buffer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <5df78e1d0812121626k367043c6hbb0232dc20b1db78@mail.gmail.com>
Date:	Fri, 12 Dec 2008 16:26:56 -0800
From:	Jiaying Zhang <jiayingz@...gle.com>
To:	Steven Rostedt <srostedt@...hat.com>
Cc:	linux-kernel@...r.kernel.org, Michael Rubin <mrubin@...gle.com>,
	Michael Davidson <md@...gle.com>,
	Martin Bligh <mbligh@...gle.com>
Subject: races when reserving an event in the unified trace buffer

Hi Steve,

I am doing some load testing with our kernel tracing prototype
that uses the unified trace buffer for managing its data. I sometimes
saw kernel stack dump caused by the following checking in
function  __rb_reserve_next:
        if (unlikely(next_page == cpu_buffer->commit_page)) {
                 WARN_ON_ONCE(1);
                 goto out_unlock;
       }
The comments above the code say the problem is caused by
"an interrupt storm that made it all the way around the buffer".
But I think there is race here that a single interrupt can cause
the check to fail. Suppose this is what happens:
An event is traced and calls __rb_reserve_next. Right after it
gets the current tail_page (line tail_page = cpu_buffer->tail_page;),
an interrupt happens that is also traced. The interrupt also takes
the same tail_page. The interrupt event moves the tail_page
forward if the tail_page is full. Note that the interrupt event gets
the old 'write' value because the first event has not updated that yet.
So the interrupt event may also update the commit_page if it is
the same as the tail_page. As a result, the above check would
fail after the interrupt finishes and the first event resumes its execution.

I have seen the problem happens frequently under heavy loads
on a multi-core machine. Interestingly, I also saw the above
warning that might actually be caused by an interrupt storm.
I was using 64k buffer size and am not sure whether it is possible
for so many interrupts to happen in a short time window.

I think we can use the time_stamp to distinguish the two cases.
Also, in either case, it seems bad to leave the tail_page->write with
an invalid value because it can cause problem when a reader
reads the page. Here is my proposed fix for the problem:

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7f69cfe..1500f78 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -982,8 +982,11 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
                 * it all the way around the buffer, bail, and warn
                 * about it.
                 */
-               if (unlikely(next_page == cpu_buffer->commit_page)) {
+               if (unlikely(next_page == cpu_buffer->commit_page) &&
+                               tail_page->time_stamp > next_page->time_stamp) {
                        WARN_ON_ONCE(1);
+                       if (tail <= BUF_PAGE_SIZE)
+                               local_set(&tail_page->write, tail);
                        goto out_unlock;
                }


Jiaying
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/