[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200625155850.13be7bfa@oasis.local.home>
Date: Thu, 25 Jun 2020 15:58:50 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: linux-kernel <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Masami Hiramatsu <mhiramat@...nel.org>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Jiri Olsa <jolsa@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
Yordan Karadzhov <y.karadz@...il.com>,
Tzvetomir Stoyanov <tz.stoyanov@...il.com>,
Tom Zanussi <zanussi@...nel.org>,
Jason Behmer <jbehmer@...gle.com>,
Julia Lawall <julia.lawall@...ia.fr>,
Clark Williams <williams@...hat.com>,
bristot <bristot@...hat.com>, Daniel Wagner <wagi@...om.org>,
Darren Hart <dvhart@...are.com>,
Jonathan Corbet <corbet@....net>,
"Suresh E. Warrier" <warrier@...ux.vnet.ibm.com>
Subject: Re: [RFC][PATCH] ring-buffer: Have nested events still record
running time stamp
On Thu, 25 Jun 2020 15:35:02 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
> >
> > Well, write_stamp is updated via local64, which I believe handles this
> > for us. I probably should make before_stamp handle it as well.
>
> By looking at local64 headers, it appears that 32-bit rely on atomic64,
> which on x86 is implemented with LOCK; cmpxchg8b for 586+ (which is AFAIK
> painfully slow) and with cli/sti for 386/486 (which is not nmi-safe).
>
> For all other 32-bit architectures, the generic atomic64.h implements 64-bit
> atomics using spinlocks with irqs off, which seems to also bring considerable
> overhead, in addition to be non-reentrant with respect to NMI-like interrupts,
> e.g. FIQ on ARM32.
>
> That seems at odds with the performance constraints of ftrace's ring buffer.
>
> Those performance and reentrancy concerns are why I always stick to local_t
> (long), and never use a full 64-bit type for anything that has to
> do with concurrent store/load between execution contexts in lttng.
If this is an issue, I'm sure I can make my own wrappers for
"time_local()", and implement something that you probably do. Because,
we only need to worry about wrapping the 32 bit lower number, as that
only happens every 4 seconds. But that is an implementation detail, it
doesn't affect the overall design correctness.
But it is something I should definitely look in to.
>
> >
> >
> >>
> >> > * a full time stamp (this can turn into a time extend which
> >> > is
> >> > * just an extended time delta but fill up the extra space).
> >> > */
> >> > if (after != before)
> >> > abs = true;
> >> >
> >> > ts = clock();
> >> >
> >> > /* Now update the before_stamp (everyone does this!) */
> >> > [B] WRITE_ONCE(before_stamp, ts);
> >> >
> >> > /* Read the current next_write and update it to what we want
> >> > write
> >> > * to be after we reserve space. */
> >> > next = READ_ONCE(next_write);
> >> > WRITE_ONCE(next_write, w + len);
> >> >
> >> > /* Now reserve space on the buffer */
> >> > [C] write = local_add_return(len, write_tail);
> >>
> >> So the reservation is not "just" an add instruction, it's actually an
> >> xadd on x86. Is that really faster than a cmpxchg ?
> >
> > I believe the answer is still yes. But I can run some benchmarks to
> > make sure.
>
> This would be interesting to see, because if xadd and cmpxchg have
> similar overhead, then going for a cmpxchg-loop for the space
> reservation could vastly decrease the overall complexity of this
> timestamp+space reservation algorithm.
It would most likely cause userspace breakage, and that would be a show
stopper.
But still good to see.
Thanks for the review.
-- Steve
Powered by blists - more mailing lists