[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140626095831.667f834f@gandalf.local.home>
Date: Thu, 26 Jun 2014 09:58:31 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Petr Mladek <pmladek@...e.cz>
Cc: Ingo Molnar <mingo@...e.hu>,
Frederic Weisbecker <fweisbec@...il.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Jiri Kosina <jkosina@...e.cz>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ring-buffer: Race when writing and swapping cpu buffer
in parallel
On Thu, 26 Jun 2014 15:22:38 +0200
Petr Mladek <pmladek@...e.cz> wrote:
> The trace/ring_buffer allows to swap the entire ring buffer. Everything has to
> be done lockless. I think that I have found a race when trying to understand
> the code. The problematic situation is the following:
>
> CPU 1 (write/reserve event) CPU 2 (swap the cpu buffer)
> -------------------------------------------------------------------------
> ring_buffer_write()
> if (cpu_buffer->record_disabled)
> ^^^ test fails and write continues
>
> ring_buffer_swap_cpu()
This is a per cpu swap, not a full ring buffer swap.
>
> inc(&cpu_buffer_a->record_disabled);
> inc(&cpu_buffer_b->record_disabled);
>
> if (cpu_buffer_a->committing)
> ^^^ test fails and swap continues
> if (cpu_buffer_b->committing)
> ^^^ test fails and swap continues
Grumble, this was originally written such that the swap can only be
done on the CPU that it is running on.
>
> rb_reserve_next_event()
> rb_start_commit()
> local_inc(&cpu_buffer->committing);
> local_inc(&cpu_buffer->commits);
>
> if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
> ^^^ test fails and reserve_next continues
>
> buffer_a->buffers[cpu] = cpu_buffer_b;
> buffer_b->buffers[cpu] = cpu_buffer_a;
> cpu_buffer_b->buffer = buffer_a;
> cpu_buffer_a->buffer = buffer_b;
>
> Pheeee, reservation continues in the removed buffer.
>
> This can be solved by a better check in rb_reserve_next_event(). The reservation
> could continue only when "committing" is enabled and there is no swap in
> progress or when any swap was not finished in the meantime.
>
> Note that the solution is not perfect. It stops writing also in this situation:
>
> CPU 1 (write/reserve event) CPU 2 (swap the cpu buffer)
> ----------------------------------------------------------------------------
> ring_buffer_write()
> if (cpu_buffer->record_disabled)
> ^^^ test fails and write continues
>
> ring_buffer_swap_cpu()
>
> inc(&cpu_buffer_a->record_disabled);
> inc(&cpu_buffer_b->record_disabled);
>
> rb_reserve_next_event()
> rb_start_commit()
> local_inc(&cpu_buffer->committing);
> local_inc(&cpu_buffer->commits);
>
> if (cpu_buffer_a->committing)
> ^^^ test passes and swap is canceled
>
> if (cpu_buffer->record_disabled)
> ^^^ test passes and reserve is canceled
>
> dec(&cpu_buffer_a->record_disabled);
> dec(&cpu_buffer_b->record_disabled);
>
> Pheeee, both actions were canceled. The write/reserve could have continued
> if the recording was enabled in time.
>
> Well, it is the price for using lockless algorithms. I think that it happens
> in more situations here, it is not worth more complicated code, and we could
> live with it.
>
> The patch also adds some missing memory barriers. Note that compiler barriers
> are not enough because the data can be accessed by different CPUs.
>
> Signed-off-by: Petr Mladek <pmladek@...e.cz>
> ---
> kernel/trace/ring_buffer.c | 27 +++++++++++++++++++++------
> 1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 7c56c3d06943..3020060ded7e 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -2529,13 +2529,15 @@ rb_reserve_next_event(struct ring_buffer *buffer,
>
> #ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
> /*
> - * Due to the ability to swap a cpu buffer from a buffer
> - * it is possible it was swapped before we committed.
> - * (committing stops a swap). We check for it here and
> - * if it happened, we have to fail the write.
> + * The cpu buffer swap could have started before we managed to stop it
> + * by incrementing the "committing" values. If the swap is in progress,
> + * we see disabled recording. If the swap has finished, we see the new
> + * cpu buffer. In both cases, we should go back and stop committing
> + * to the old buffer. See also ring_buffer_swap_cpu()
> */
> - barrier();
> - if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
> + smp_mb();
> + if (unlikely(atomic_read(&cpu_buffer->record_disabled) ||
> + ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
> local_dec(&cpu_buffer->committing);
> local_dec(&cpu_buffer->commits);
> return NULL;
> @@ -4334,6 +4336,13 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
> atomic_inc(&cpu_buffer_a->record_disabled);
> atomic_inc(&cpu_buffer_b->record_disabled);
>
> + /*
> + * We could not swap if a commit is in progress. Also any commit could
> + * not start after we have stopped recording. Keep both checks in sync.
> + * The counter part is in rb_reserve_next_event()
> + */
> + smp_mb();
This will kill ring buffer performance. So I will not allow this fix.
What we can do is force ring_buffer_swap_cpu() to only work for the CPU
that it is on. As we have snapshot in per_cpu buffers, to make that
work, we will need to change the per_cpu version of snapshot to do a
smp_call_function_single() to the CPU that it wants to take a snapshot
of, and run the swap there.
To force this, we can remove the cpu parameter from the
ring_buffer_swap_cpu(). By doing this, we may be able to remove some of
the CONFIG_RING_BUFFER_ALLOW_SWAP hacks too!
I'm not going to sacrifice the general performance of the ring buffer
for a feature that is seldom (if ever) used.
-- Steve
> +
> ret = -EBUSY;
> if (local_read(&cpu_buffer_a->committing))
> goto out_dec;
> @@ -4348,6 +4357,12 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
>
> ret = 0;
>
> + /*
> + * Mare sure that rb_reserve_next_event() see the right
> + * buffer before we enable recording.
> + */
> + smp_wmb();
> +
> out_dec:
> atomic_dec(&cpu_buffer_a->record_disabled);
> atomic_dec(&cpu_buffer_b->record_disabled);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists