Message-ID: <20100512202745.GK21432@Krystal>
Date: Wed, 12 May 2010 16:27:46 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Pierre Tardy <tardyp@...il.com>, Ingo Molnar <mingo@...e.hu>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Tom Zanussi <tzanussi@...il.com>,
Paul Mackerras <paulus@...ba.org>,
linux-kernel@...r.kernel.org, arjan@...radead.org,
ziga.mahkovec@...il.com, davem <davem@...emloft.net>
Subject: Re: Perf and ftrace [was Re: PyTimechart]
* Steven Rostedt (rostedt@...dmis.org) wrote:
> On Wed, 2010-05-12 at 14:37 -0400, Mathieu Desnoyers wrote:
>
> > OK, I see. In LTTng, I dropped the mmap() support when I integrated splice(). In
> > both case, I can share the pages between the "output" (mmap or splice) and the
> > ring buffer because my ring buffer does not care about
> > page->mapping/->index/etc, so I never have to swap them.
>
> I'm curious, how do you handle the overwrite mode without swapping?
Explanation extracted from:
http://www.lttng.org/pub/thesis/desnoyers-dissertation-2009-12.pdf
5.4 Atomic Buffering Scheme
5.4.3 Algorithms
"This is achieved by adding a supplementary sub-buffer, owned by the reader. A
table with pointers to the sub-buffers being used by the writer allows the
reader to change the reference to each sub-buffer atomically. The
ReadGetSubbuf() algorithm is responsible for atomically exchanging the reference
to the sub-buffer about to be read with the sub-buffer currently owned by the
reader. If the CAS operation fails, the reader does not get access to the buffer
for reading."
I know your mother tongue is C, not English, so I just prepared a git repo with
the current state of my work (please note that I'm currently in the process of
cleaning up this code).
http://git.kernel.org/?p=linux/kernel/git/compudj/linux-2.6-ringbuffer.git
Interesting bits below.
Thanks,
Mathieu
Note: The "frontend" refers to the buffer writer/reader synchronization
algorithm. The "backend" deals with allocation of the memory buffers. This
frontend/backend separation makes it possible to use the same ring buffer
synchronization code to write data to kernel pages, video memory, serial
ports, etc., without having to deal with different synchronization schemes.
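The frontend/backend split described in the note can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the code from the repository: every `demo_*` name is hypothetical, the frontend is reduced to a single-writer offset counter with no wrap-around, and the only backend shown is a memory array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend interface: it only knows how to store bytes. */
struct demo_backend_ops {
	void (*write)(void *priv, size_t offset, const void *src, size_t len);
};

/* Hypothetical frontend state: owns the offsets, not the storage. */
struct demo_channel {
	const struct demo_backend_ops *ops;
	void *priv;		/* backend private data */
	size_t buf_size;	/* total buffer size, in bytes */
	size_t write_offset;	/* next free byte (single writer here) */
};

/* One possible backend: a plain memory array. */
static void demo_mem_write(void *priv, size_t offset,
			   const void *src, size_t len)
{
	memcpy((char *)priv + offset, src, len);
}

static const struct demo_backend_ops demo_mem_ops = {
	.write = demo_mem_write,
};

/*
 * Frontend write: reserve space, then delegate the copy to the backend.
 * Returns 0 on success, -1 if the record does not fit in the space left
 * (this sketch never wraps around).
 */
static int demo_channel_write(struct demo_channel *chan,
			      const void *src, size_t len)
{
	assert(chan->write_offset <= chan->buf_size);
	if (len > chan->buf_size - chan->write_offset)
		return -1;
	chan->ops->write(chan->priv, chan->write_offset, src, len);
	chan->write_offset += len;
	return 0;
}
```

Swapping `demo_mem_ops` for another ops table would change where the bytes land without touching `demo_channel_write()`, which is the point of the separation.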
Where the reader grabs the sub-buffer:
kernel/trace/ring_buffer_frontend.c: ring_buffer_get_subbuf()
	ret = update_read_sb_index(&buf->backend, &chan->backend, consumed_idx);
	if (ret)
		return ret;
and releases it:
kernel/trace/ring_buffer_frontend.c: ring_buffer_put_subbuf()
	RCHAN_SB_SET_NOREF(buf->backend.buf_rsb.pages);
The writer clears the "noref" flag when it starts writing to a subbuffer, and
sets that flag again when it has fully committed a subbuffer.
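The writer-side flag protocol (flag cleared while the writer holds the subbuffer, set once the subbuffer is fully committed) can be modelled in userspace with C11 atomics. This is a hedged sketch, not the kernel code: `DEMO_NOREF_FLAG` and `demo_update_noref()` are hypothetical names, and the flag is assumed to live in the low bit of a suitably aligned pointer stored as a `uintptr_t`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: pointers are at least 2-byte aligned, so bit 0 is free. */
#define DEMO_NOREF_FLAG	((uintptr_t)1)

/*
 * Set (set != 0) or clear (set == 0) the flag with a CAS retry loop.
 * Returns 1 if this call changed the flag, 0 if the slot was already
 * in the requested state.
 */
static int demo_update_noref(_Atomic uintptr_t *slot, int set)
{
	uintptr_t old, updated;

	assert(slot != NULL);
	old = atomic_load(slot);
	for (;;) {
		updated = set ? (old | DEMO_NOREF_FLAG)
			      : (old & ~DEMO_NOREF_FLAG);
		if (updated == old)
			return 0;	/* nothing to do */
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(slot, &old, updated))
			return 1;
	}
}
```

In this model a writer would call `demo_update_noref(slot, 0)` before its first write into a subbuffer and `demo_update_noref(slot, 1)` after the last commit. The retry loop matters because a reader may replace the pointer in the same slot concurrently.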
The primitives used by the "synchronization frontend" are declared in the
backend here:
kernel/trace/ring_buffer_page_backend_internal.h:
Interesting definitions and data structures for our current discussion:
#define RCHAN_SB_IS_NOREF(x)	((unsigned long)(x) & RCHAN_NOREF_FLAG)
#define RCHAN_SB_SET_NOREF(x)	\
	(x = (struct ring_buffer_backend_page *)	\
		((unsigned long)(x) | RCHAN_NOREF_FLAG))
#define RCHAN_SB_CLEAR_NOREF(x)	\
	(x = (struct ring_buffer_backend_page *)	\
		((unsigned long)(x) & ~RCHAN_NOREF_FLAG))

struct ring_buffer_backend_page {
	void *virt;		/* page virtual address (cached) */
	struct page *page;	/* pointer to page structure */
};

struct ring_buffer_backend_subbuffer {
	/* Pointer to backend pages for subbuf */
	struct ring_buffer_backend_page *pages;
};
...
struct ring_buffer_backend {
	/* Array of chanbuf_sb for writer */
	struct ring_buffer_backend_subbuffer *buf_wsb;
	/* chanbuf_sb for reader */
	struct ring_buffer_backend_subbuffer buf_rsb;
...
/**
 * ring_buffer_clear_noref_flag - Clear the noref subbuffer flag, for writer.
 */
static __inline__
void ring_buffer_clear_noref_flag(struct ring_buffer_backend *bufb,
				  unsigned long idx)
{
	struct ring_buffer_backend_page *sb_pages, *new_sb_pages;

	sb_pages = bufb->buf_wsb[idx].pages;
	for (;;) {
		if (!RCHAN_SB_IS_NOREF(sb_pages))
			return;	/* Already writing to this buffer */
		new_sb_pages = sb_pages;
		RCHAN_SB_CLEAR_NOREF(new_sb_pages);
		new_sb_pages = cmpxchg(&bufb->buf_wsb[idx].pages,
				       sb_pages, new_sb_pages);
		if (likely(new_sb_pages == sb_pages))
			break;
		sb_pages = new_sb_pages;
	}
}

/**
 * ring_buffer_set_noref_flag - Set the noref subbuffer flag, for writer.
 */
static __inline__
void ring_buffer_set_noref_flag(struct ring_buffer_backend *bufb,
				unsigned long idx)
{
	struct ring_buffer_backend_page *sb_pages, *new_sb_pages;

	sb_pages = bufb->buf_wsb[idx].pages;
	for (;;) {
		if (RCHAN_SB_IS_NOREF(sb_pages))
			return;	/* Already set */
		new_sb_pages = sb_pages;
		RCHAN_SB_SET_NOREF(new_sb_pages);
		new_sb_pages = cmpxchg(&bufb->buf_wsb[idx].pages,
				       sb_pages, new_sb_pages);
		if (likely(new_sb_pages == sb_pages))
			break;
		sb_pages = new_sb_pages;
	}
}

/**
 * update_read_sb_index - Read-side subbuffer index update.
 */
static __inline__
int update_read_sb_index(struct ring_buffer_backend *bufb,
			 struct channel_backend *chanb,
			 unsigned long consumed_idx)
{
	struct ring_buffer_backend_page *old_wpage, *new_wpage;

	if (unlikely(chanb->extra_reader_sb)) {
		/*
		 * Exchange the target writer subbuffer with our own unused
		 * subbuffer.
		 */
		old_wpage = bufb->buf_wsb[consumed_idx].pages;
		if (unlikely(!RCHAN_SB_IS_NOREF(old_wpage)))
			return -EAGAIN;
		WARN_ON_ONCE(!RCHAN_SB_IS_NOREF(bufb->buf_rsb.pages));
		new_wpage = cmpxchg(&bufb->buf_wsb[consumed_idx].pages,
				    old_wpage,
				    bufb->buf_rsb.pages);
		if (unlikely(old_wpage != new_wpage))
			return -EAGAIN;
		bufb->buf_rsb.pages = new_wpage;
		RCHAN_SB_CLEAR_NOREF(bufb->buf_rsb.pages);
	} else {
		/* No page exchange, use the writer page directly */
		bufb->buf_rsb.pages = bufb->buf_wsb[consumed_idx].pages;
		RCHAN_SB_CLEAR_NOREF(bufb->buf_rsb.pages);
	}
	return 0;
}
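
For readers who want to experiment outside the kernel, the reader-side exchange can be approximated in userspace with C11 atomics. This is a simplified model under stated assumptions, not the repository code: the `demo_*` names and `DEMO_NOREF` are hypothetical, pointers are stored as `uintptr_t` with the flag assumed to sit in bit 0, and only the "extra reader subbuffer" path is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_NOREF	((uintptr_t)1)	/* assumed: low bit of aligned pointer */

struct demo_ring {
	_Atomic uintptr_t wsb[2];	/* writer table of tagged pointers */
	uintptr_t rsb;			/* reader-owned spare (tagged) */
};

/*
 * Try to take writer slot 'idx' for reading, handing the reader's spare
 * subbuffer to the writer in exchange. Returns 0 on success, -1 if the
 * writer still references the slot or the compare-and-swap lost a race.
 */
static int demo_reader_get(struct demo_ring *ring, unsigned int idx)
{
	uintptr_t old = atomic_load(&ring->wsb[idx]);

	if (!(old & DEMO_NOREF))
		return -1;		/* writer is using this subbuffer */
	assert(ring->rsb & DEMO_NOREF);	/* spare must be unreferenced */
	if (!atomic_compare_exchange_strong(&ring->wsb[idx], &old, ring->rsb))
		return -1;		/* slot changed; caller may retry */
	ring->rsb = old & ~DEMO_NOREF;	/* reader now holds a reference */
	return 0;
}

/* Done reading: mark the reader-owned subbuffer unreferenced again. */
static void demo_reader_put(struct demo_ring *ring)
{
	ring->rsb |= DEMO_NOREF;
}
```

Because the spare handed to the writer still carries the flag, the writer sees it as an ordinary unreferenced subbuffer and can start overwriting it immediately, while the reader keeps exclusive access to the one it took. No page contents are copied or swapped, only the table entry changes.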
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com