[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1273769178.27703.1119.camel@gandalf.stny.rr.com>
Date: Thu, 13 May 2010 12:46:18 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Pierre Tardy <tardyp@...il.com>, Ingo Molnar <mingo@...e.hu>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Tom Zanussi <tzanussi@...il.com>,
Paul Mackerras <paulus@...ba.org>,
linux-kernel@...r.kernel.org, arjan@...radead.org,
ziga.mahkovec@...il.com, davem <davem@...emloft.net>
Subject: Re: Perf and ftrace [was Re: PyTimechart]
On Thu, 2010-05-13 at 12:31 -0400, Mathieu Desnoyers wrote:
> * Steven Rostedt (rostedt@...dmis.org) wrote:
> > On Thu, 2010-05-13 at 09:20 -0400, Mathieu Desnoyers wrote:
> [...]
> > > > >
> > > > > ...
> > > > >
> > > > > 97 /**
> > > > > 98 * ring_buffer_clear_noref_flag - Clear the noref subbuffer flag, for writer.
> > > > > 99 */
> > > > > 100 static __inline__
> > > > > 101 void ring_buffer_clear_noref_flag(struct ring_buffer_backend *bufb,
> > > > > 102 unsigned long idx)
> > > > > 103 {
> > > > > 104 struct ring_buffer_backend_page *sb_pages, *new_sb_pages;
> > > > > 105
> > > > > 106 sb_pages = bufb->buf_wsb[idx].pages;
> > > > > 107 for (;;) {
> > > > > 108 if (!RCHAN_SB_IS_NOREF(sb_pages))
> > > > > 109 return; /* Already writing to this buffer */
> > > > > 110 new_sb_pages = sb_pages;
> > > > > 111 RCHAN_SB_CLEAR_NOREF(new_sb_pages);
> > > > > 112 new_sb_pages = cmpxchg(&bufb->buf_wsb[idx].pages,
> > > > > 113 sb_pages, new_sb_pages);
> > > > > 114 if (likely(new_sb_pages == sb_pages))
> > > > > 115 break;
> > > > > 116 sb_pages = new_sb_pages;
> > > >
> > > > The writer calls this??
> > >
> > > Yes. But the common case (for each event) is simply a
> > > "if (!RCHAN_SB_IS_NOREF(sb_pages))" test that returns. The cmpxchg() is only
> > > performed at subbuffer boundary.
> >
> > Is the cmpxchg only contending with other writers?
>
> No. Would have this been the case, I would have used a cmpxchg_local(). This
> cmpxchg used to deal with subbuffer swap is touching the subbuffer "pages"
> pointer, which can be updated concurrently by other writers as well as readers.
>
> The writer clears the noref flags when starting to write in a subbuffers, and
> sets it when delivering the subbuffer (when it is fully committed).
>
> The reader can only ever swap the subbuffer with the one it owns if the noref
> flag is set. The reader uses a cmpxchg() too to perform the swap.
>
So the writer can loop because of a reader? And theoretically, a
aggressive reader that is constantly swapping pages can halt a writer?
I thought you said you did not have loops in the writers?
> [...]
>
> > > >
> > > > This looks just like the swap with reader_page that I do, except you use
> > > > a table and I use the list. How do you replenish the buf_rsb.pages if
> > > > the splice keeps the page you just received active?
> > >
> > > I don't allow other reads to proceed as long as splice is holding pages that
> > > belong to the reader-owned subbuffer. The read semantic is basically:
> > >
> > > ring_buffer_open_read() /* only one reader at a time can open a ring buffer */
> > > get_subbuf_size()
> > > while (buffer is not finalized and empty) {
> > > poll()
> > > ret = ring_buffer_get_subbuf()
> > > if (!ret)
> > > continue;
> > > /* The splice ops below can be performed in multiple calls, e.g. first splice
> > > * only a portion of a subbuffer to a pipe, then splice to the disk/network,
> > > * and move to the next subbuffer portion until all the subbuffer is sent.
> > > */
> > > splice one subbuffer worth of data to a pipe
> > > splice the data from pipe to disk/network
> > > ring_buffer_put_subbuf()
> > > }
> > > ring_buffer_close_read()
> > >
> > > The reader code above works both with flight recorder and non-overwrite mode.
> > >
> > > The code above assumes that upon return from the splice() to disk/network,
> > > splice() is not using the pages anymore (I assume that splice() performs the
> > > transfer synchronously with the call).
> > >
> > > The VFS interface I use for get_subbuf_size(), ring_buffer_get_subbuf() and
> > > ring_buffer_put_subbuf() are new ioctls. Note that these can be used for both
> > > splice() and mmap() types of backend access, as they only call into the
> > > frontend.
> >
> > Hmm, so basically you lose pages until they are returned. I guess I can
> > trivially add the same thing now to the current ring buffer.
>
> Yep. Having the ability to keep an array of pages (rather that just a single
> page at a time) allows splice() to move many pages at once efficiently, while
> permitting this "pages owned by the readers, lend to splice() until it returns"
> simplification. I also never have to allocate pages while tracing: all the pages
> I need are allocated when the buffer is created (and at the special case of cpu
> hotplug, but this is expected for per-cpu buffers).
Yep, I just wrote the changes for my buffer. It was rather trivial ;-)
Although, I just set a hard coded 16 pages max of cached pages.
-- Steve
>
> In addition, this would play well with mmap() too: we can simply add a
> ring_buffer_get_mmap_offset() method to the backend (exported through another
> ioctl) that would let user-space know the start of the mmap'd buffer range
> currently owned by the reader. So we can inform user-space of the currently
> owned page range without even changing the underlying memory map.
>
> Thanks,
>
> Mathieu
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists