linux-kernel - Re: [PATCH] tracing - fix function graph trace to properly display failed entries

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1259166080.21397.28.camel@gandalf.stny.rr.com>
Date:	Wed, 25 Nov 2009 11:21:20 -0500
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Jiri Olsa <jolsa@...hat.com>
Cc:	Frederic Weisbecker <fweisbec@...il.com>, mingo@...e.hu,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] tracing - fix function graph trace to properly display
 failed entries

On Wed, 2009-11-25 at 09:45 +0100, Jiri Olsa wrote:
> On Tue, Nov 24, 2009 at 10:14:36PM +0100, Frederic Weisbecker wrote:

> > 
> > Hmm, yeah in trace_pipe case we repeat until we have no space
> > left in trace_seq (TRACE_TYPE_PARTIAL_LINE), or until we
> > the user buffer is filled.
> > 
> > But indeed in case of trace file, we are using a seq file
> > so the buffer gets flushed after each entries.
> > 
> > Assuming the trace_seq is 4096 bytes long this is probably
> > enough for every function graph entries (even with headers and
> > interrupts), this is not something we are supposed to see in trace_seq.
> > 
> > I did not realized that when Jiri sent the first version of this patch.
> 
> sure, but it is not the trace_seq structure that overflows, it is the bare 
> seq_file underneath. The s_show calls "trace_print_seq":
> 
> void trace_print_seq(struct seq_file *m, struct trace_seq *s)
> {               
>         int len = s->len >= PAGE_SIZE ? PAGE_SIZE - 1 : s->len;
>         
>         seq_write(m, s->buffer, len);
> 
>         trace_seq_init(s);
> }               
> 
> it flushes the trace_seq to the seq_file and reinits the trace_seq.
> But the seq_file will get filled in the end, ending up with
> not displayed output in that case I described in the first email.
> 

First note, the filling of the trace_seq is only an issue with
trace_pipe. The trace file never has this issue because it uses the
trace_seq once per event, and there exists no event that is bigger than
a page.

The trace_pipe fills as much as it can because it is doing it live and
wants to be as fast as possible, and it is passing information between
user and kernel. Thus it will try to pass as much to userspace as
possible.

Now any issue with the trace file is not caused by trace_seq. But lets
look at what is happening with trace_pipe.

In tracing_read_pipe, we do the following:

	len = iter->seq.len;

	ret = print_trace_line(iter);
	if (ret == TRACE_TYPE_PARTIAL_LINE) {
		iter->seq.len = len;  /* this removes any partial write */
		break;
	}

	if (ret != TRACE_TYPE_NO_CONSUME)
		trace_consume(iter);

	if (iter->seq.len >= cnt)
		break;

The above is contained in a loop. It stops when either there are no more
events, the trace_seq buffer is full or we fill the users buffer.

The print_trace_line is what eventually goes into the function graph
tracer calling print_graph_function.

The call get_return_for_leaf figures out if this is a leaf function or
not.

	ring_iter = iter->buffer_iter[iter->cpu];

	if (ring_iter)
		event = ring_buffer_iter_peak(ring_iter, NULL);
	else {
		ring_buffer_consume(iter->tr->buffer, iter->cpu, NULL);
		event = ring_buffer_peak(iter->tr->buffer, iter->cpu,
					NULL);
	}

ring_iter is NULL on trace_pipe, so we consume the event. If the next
item is not a graph item, it will be consumed in the loop described
above and lost.

In the case of print_graph_entry_nested, where we don't want to discard
the consumed event, on success it returns: TRACE_TYPE_NO_CONSUME, so
that the event does not get consumed.

But if the buffer fills up, and we exit with TRACE_TYPE_PARTIAL_LINE, we
lost the current event :-(  That is, it was consumed and not read. The
next read by userspace that calls the find_next_entry_inc() will not
have the correct entry.

Looking at this deeper, I'm not sure the find_next_entry_inc is correct
here. If we break out of the loop by filling up the seq buffer, we lose
that event all together. I'll look deeper into this.

Thanks,

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/