linux-kernel - Re: [PATCH 2/3] ring-buffer: try to discard unneeded timestamps

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4A26E358.4030508@am.sony.com>
Date:	Wed, 3 Jun 2009 13:55:52 -0700
From:	Tim Bird <tim.bird@...sony.com>
To:	Steven Rostedt <rostedt@...dmis.org>
CC:	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Frederic Weisbecker <fweisbec@...il.com>
Subject: Re: [PATCH 2/3] ring-buffer: try to discard unneeded timestamps

Steve,

See my last comment for an important fix.

Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@...hat.com>
> 
> There are times that a race may happen that we add a timestamp in a
> nested write. This timestamp would just contain a zero delta and serves
> no purpose.
> 
> Now that we have a way to discard events, this patch will try to discard
> the timestamp instead of just wasting the space in the ring buffer.
> 
> Signed-off-by: Steven Rostedt <rostedt@...dmis.org>
> ---
>  kernel/trace/ring_buffer.c |   67 +++++++++++++++++++++++++++-----------------
>  1 files changed, 41 insertions(+), 26 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 9453023..5092660 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -1335,6 +1335,38 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
>  	return event;
>  }
>  
> +static inline int
> +rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer,
> +		  struct ring_buffer_event *event)

It might be worth using the terminology 'rb_try_to_free' (or something else)
to distinguish the case of backing up the write pointer from the
case of marking an event as discarded (pad) in the log.

> +{
> +	unsigned long new_index, old_index;
> +	struct buffer_page *bpage;
> +	unsigned long index;
> +	unsigned long addr;
> +
> +	new_index = rb_event_index(event);
> +	old_index = new_index + rb_event_length(event);
> +	addr = (unsigned long)event;
> +	addr &= PAGE_MASK;
> +
> +	bpage = cpu_buffer->tail_page;
> +
> +	if (bpage->page == (void *)addr && rb_page_write(bpage) == old_index) {
> +		/*
> +		 * This is on the tail page. It is possible that
> +		 * a write could come in and move the tail page
> +		 * and write to the next page. That is fine
> +		 * because we just shorten what is on this page.
> +		 */
> +		index = local_cmpxchg(&bpage->write, old_index, new_index);
> +		if (index == old_index)
> +			return 1;
> +	}
> +
> +	/* could not discard */
> +	return 0;
> +}
> +
>  static int
>  rb_add_time_stamp(struct ring_buffer_per_cpu *cpu_buffer,
>  		  u64 *ts, u64 *delta)
> @@ -1384,10 +1416,13 @@ rb_add_time_stamp(struct ring_buffer_per_cpu *cpu_buffer,
>  		/* let the caller know this was the commit */
>  		ret = 1;
>  	} else {
> -		/* Darn, this is just wasted space */
> -		event->time_delta = 0;
> -		event->array[0] = 0;
> -		ret = 0;
> +		/* Try to discard the event */
> +		if (!rb_try_to_discard(cpu_buffer, event)) {
> +			/* Darn, this is just wasted space */
> +			event->time_delta = 0;
> +			event->array[0] = 0;
> +			ret = 0;
> +		}
>  	}
>  
>  	*delta = 0;
> @@ -1682,10 +1717,6 @@ void ring_buffer_discard_commit(struct ring_buffer *buffer,
>  				struct ring_buffer_event *event)
>  {
>  	struct ring_buffer_per_cpu *cpu_buffer;
> -	unsigned long new_index, old_index;
> -	struct buffer_page *bpage;
> -	unsigned long index;
> -	unsigned long addr;
>  	int cpu;
>  
>  	/* The event is discarded regardless */
> @@ -1701,24 +1732,8 @@ void ring_buffer_discard_commit(struct ring_buffer *buffer,
>  	cpu = smp_processor_id();
>  	cpu_buffer = buffer->buffers[cpu];
>  
> -	new_index = rb_event_index(event);
> -	old_index = new_index + rb_event_length(event);
> -	addr = (unsigned long)event;
> -	addr &= PAGE_MASK;
> -
> -	bpage = cpu_buffer->tail_page;
> -
> -	if (bpage->page == (void *)addr && rb_page_write(bpage) == old_index) {
> -		/*
> -		 * This is on the tail page. It is possible that
> -		 * a write could come in and move the tail page
> -		 * and write to the next page. That is fine
> -		 * because we just shorten what is on this page.
> -		 */
> -		index = local_cmpxchg(&bpage->write, old_index, new_index);
> -		if (index == old_index)
> -			goto out;
> -	}
> +	if (!rb_try_to_discard(cpu_buffer, event))
> +		goto out;

The sense of this test is wrong.  It should be:

	if (rb_try_to_discard(cpu_buffer, event))
		goto out;

You want to go to 'out' (skipping the increment of entries)
if you successfully freed the event.

In my testing, rb_try_to_discard was almost alway successful.
Since only one statement is jumped over by the goto, the
following might be better:

	if (unlikely(!rb_try_to_discard(cpu_buffer, event)))
		/*
		 * The commit is still visible by the reader, so we
		 * must increment entries.
		 */
		local_inc(&cpu_buffer->entries);
	

>  	/*
>  	 * The commit is still visible by the reader, so we
> -- 1.6.3.1
> -- 


-- 
=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/