[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1229989368.30177.87.camel@localhost.localdomain>
Date: Mon, 22 Dec 2008 18:42:48 -0500
From: Steven Rostedt <srostedt@...hat.com>
To: Jiaying Zhang <jiayingz@...gle.com>
Cc: linux-kernel@...r.kernel.org, Michael Rubin <mrubin@...gle.com>,
Michael Davidson <md@...gle.com>,
Martin Bligh <mbligh@...gle.com>,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: races when reserving an event in the unified trace buffer
Hi Jiaying,
Again, sorry for taking so long to get back to you on this.
On Wed, 2008-12-17 at 16:00 -0800, Jiaying Zhang wrote:
> Hi Steve,
>
> I mentioned in my last email that I saw the warning about the trace buffer
> became full because of an interrupt storm even with my posted patch applied.
> After adding more debugging messages, I found that the problem was actually
> caused by another race in the code.
Actually, the other email did find a bug.
> I saw the events in the buffer were
> not just interrupts but include other kernel events as well. Looks
> like the commit
> page failed to advance because of a race between the update of tail_page in
> __rb_reserve_next_event and the following lines in the rb_set_commit_to_write:
> while (cpu_buffer->commit_page != cpu_buffer->tail_page) {
> cpu_buffer->commit_page->page->commit =
> cpu_buffer->commit_page->write;
> rb_inc_page(cpu_buffer, &cpu_buffer->commit_page);
> ...
> }
You are saying an interrupt happens here that pushes the tail page one.
> while (rb_commit_index(cpu_buffer) !=
> rb_page_write(cpu_buffer->commit_page)) {
> cpu_buffer->commit_page->page->commit =
> cpu_buffer->commit_page->write;
> barrier();
> }
Now here we are left with the tail page moved one forward but the commit
left back one, and thus we do not increment the commit.
>
> The problem is that an interrupt can happen right after a kernel event finishes
> the condition check "cpu_buffer->commit_page != cpu_buffer->tail_page" but
> before it updates the commit value of the commit_page. If we were at the tail
> of the tail_page and the commit_page was the same as the tail_page when
> the first kernel event checked that, neither of the events would advance the
> commit_page pointer because the interrupt event was NOT the commit event
> before the kernel event updated the commit pointer and the kernel event did
> NOT see the change of the tail_page made by the interrupt event. Once we
> got into this situation, the trace buffer would soon become full and reject
> any further reservation requests.
>
> A possible fix I think is to update the commit pointer of the commit_page
> both before and after updating the commit_page in rb_set_commit_to_write().
> Here is the proposed fix. Please let me know if my analysis makes sense
> to you. Thanks a lot!
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 7f69cfe..b345ba7 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -839,6 +839,12 @@ rb_set_commit_to_write(struct ring_buffer_per_cpu
> *cpu_buffer)
> * back to us). This allows us to do a simple loop to
> * assign the commit to the tail.
> */
> + while (rb_commit_index(cpu_buffer) !=
> + rb_page_write(cpu_buffer->commit_page)) {
> + cpu_buffer->commit_page->page->commit =
> + cpu_buffer->commit_page->write;
> + barrier();
> + }
I do not think this solves it for us. Because the interrupt comes in
after the next check, and this is where we are left with the dangling
commit.
> while (cpu_buffer->commit_page != cpu_buffer->tail_page) {
> cpu_buffer->commit_page->page->commit =
> cpu_buffer->commit_page->write;
What we need is to add do while over the entire code with:
do {
[...]
} while (cpu_buffer->commit_page != cpu_buffer->tail_page);
Thanks!
-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists