Date:	Wed, 12 May 2010 16:27:46 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Pierre Tardy <tardyp@...il.com>, Ingo Molnar <mingo@...e.hu>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Tom Zanussi <tzanussi@...il.com>,
	Paul Mackerras <paulus@...ba.org>,
	linux-kernel@...r.kernel.org, arjan@...radead.org,
	ziga.mahkovec@...il.com, davem <davem@...emloft.net>
Subject: Re: Perf and ftrace [was Re: PyTimechart]

* Steven Rostedt (rostedt@...dmis.org) wrote:
> On Wed, 2010-05-12 at 14:37 -0400, Mathieu Desnoyers wrote:
> 
> > OK, I see. In LTTng, I dropped the mmap() support when I integrated splice(). In
> > both cases, I can share the pages between the "output" (mmap or splice) and the
> > ring buffer because my ring buffer does not care about
> > page->mapping/->index/etc, so I never have to swap them.
> 
> I'm curious, how do you handle the overwrite mode without swapping?

Explanation extracted from:

http://www.lttng.org/pub/thesis/desnoyers-dissertation-2009-12.pdf

5.4 Atomic Buffering Scheme
5.4.3 Algorithms

"This is achieved by adding a supplementary sub-buffer, owned by the reader. A
table with pointers to the sub-buffers being used by the writer allows the
reader to change the reference to each sub-buffer atomically. The
ReadGetSubbuf() algorithm is responsible for atomically exchanging the reference
to the sub-buffer about to be read with the sub-buffer currently owned by the
reader. If the CAS operation fails, the reader does not get access to the buffer
for reading."

I know your mother tongue is C, not English, so I just prepared a git repo with
the current state of my work (please note that I am still cleaning up this
code).

http://git.kernel.org/?p=linux/kernel/git/compudj/linux-2.6-ringbuffer.git

Interesting bits below.

Thanks,

Mathieu

Note: The "frontend" refers to the buffer writer/reader synchronization
algorithm. The "backend" deals with allocation of the memory buffers. This
frontend/backend separation permits to use the same ring buffer synchronization
code to write data to kernel pages, to video memory, to serial ports, etc etc,
without having to deal with different synchronization schemes.
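
As an illustration of that split (this interface is my own sketch for the
discussion, not the code in the tree), the frontend could drive any backend
through a small ops table:

#include <stddef.h>

/* Hypothetical sketch of the frontend/backend boundary, not the tree's API. */
struct ring_buffer_backend_ops {
	/* Hand out space for a record at the current write position. */
	void *(*reserve)(void *backend_priv, size_t len);
	/* Called once the record has been fully written. */
	void (*commit)(void *backend_priv, void *slot, size_t len);
};

The frontend performs only the reserve/commit synchronization and can be
handed a different ops table per output target.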

Where the reader grabs the sub-buffer:

kernel/trace/ring_buffer_frontend.c: ring_buffer_get_subbuf()

        ret = update_read_sb_index(&buf->backend, &chan->backend, consumed_idx);
        if (ret)
                return ret;

and releases it:

kernel/trace/ring_buffer_frontend.c: ring_buffer_put_subbuf()

        RCHAN_SB_SET_NOREF(buf->backend.buf_rsb.pages);

The writer clears the "noref" flag when it starts writing to a subbuffer, and
sets it again once it has fully committed the subbuffer.
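
The flag is stored in otherwise-unused bits of the "pages" pointer itself; the
RCHAN_SB_* macros below OR it in and mask it out. A standalone demonstration
of this pointer-tagging trick, outside the kernel (the choice of bit 0 here is
mine; the excerpt below does not show RCHAN_NOREF_FLAG's actual value):

#include <assert.h>
#include <stdint.h>

#define NOREF_FLAG	0x1UL	/* a low pointer bit, unused due to alignment */

int main(void)
{
	long backing = 42;	/* stand-in for the backend pages array */
	long *p = &backing;

	assert(((uintptr_t)p & NOREF_FLAG) == 0);	/* aligned: bit is free */

	p = (long *)((uintptr_t)p | NOREF_FLAG);	/* set: writer holds no reference */
	assert((uintptr_t)p & NOREF_FLAG);

	p = (long *)((uintptr_t)p & ~NOREF_FLAG);	/* clear before dereferencing */
	assert(*p == 42);
	return 0;
}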

The primitives used by the "synchronization frontend" are declared in the
backend here:

kernel/trace/ring_buffer_page_backend_internal.h:

Interesting definitions and data structures for our current discussions:

#define RCHAN_SB_IS_NOREF(x)    ((unsigned long)(x) & RCHAN_NOREF_FLAG)
#define RCHAN_SB_SET_NOREF(x)   \
        (x = (struct ring_buffer_backend_page *) \
                ((unsigned long)(x) | RCHAN_NOREF_FLAG))
#define RCHAN_SB_CLEAR_NOREF(x) \
        (x = (struct ring_buffer_backend_page *) \
                ((unsigned long)(x) & ~RCHAN_NOREF_FLAG))

struct ring_buffer_backend_page {
        void *virt;                     /* page virtual address (cached) */
        struct page *page;              /* pointer to page structure */
};

struct ring_buffer_backend_subbuffer {
        /* Pointer to backend pages for subbuf */
        struct ring_buffer_backend_page *pages;
};

...

struct ring_buffer_backend {
        /* Array of chanbuf_sb for writer */
        struct ring_buffer_backend_subbuffer *buf_wsb;
        /* chanbuf_sb for reader */
        struct ring_buffer_backend_subbuffer buf_rsb;

...

/**
 * ring_buffer_clear_noref_flag - Clear the noref subbuffer flag, for writer.
 */
static __inline__
void ring_buffer_clear_noref_flag(struct ring_buffer_backend *bufb,
                                  unsigned long idx)
{
        struct ring_buffer_backend_page *sb_pages, *new_sb_pages;

        sb_pages = bufb->buf_wsb[idx].pages;
        for (;;) {
                if (!RCHAN_SB_IS_NOREF(sb_pages))
                        return; /* Already writing to this buffer */
                new_sb_pages = sb_pages;
                RCHAN_SB_CLEAR_NOREF(new_sb_pages);
                new_sb_pages = cmpxchg(&bufb->buf_wsb[idx].pages,
                        sb_pages, new_sb_pages);
                if (likely(new_sb_pages == sb_pages))
                        break;
                sb_pages = new_sb_pages;
        }
}

/**
 * ring_buffer_set_noref_flag - Set the noref subbuffer flag, for writer.
 */
static __inline__
void ring_buffer_set_noref_flag(struct ring_buffer_backend *bufb,
                                unsigned long idx)
{
        struct ring_buffer_backend_page *sb_pages, *new_sb_pages;

        sb_pages = bufb->buf_wsb[idx].pages;
        for (;;) {
                if (RCHAN_SB_IS_NOREF(sb_pages))
                        return; /* Already set */
                new_sb_pages = sb_pages;
                RCHAN_SB_SET_NOREF(new_sb_pages);
                new_sb_pages = cmpxchg(&bufb->buf_wsb[idx].pages,
                        sb_pages, new_sb_pages);
                if (likely(new_sb_pages == sb_pages))
                        break;
                sb_pages = new_sb_pages;
        }
}

/**
 * update_read_sb_index - Read-side subbuffer index update.
 */
static __inline__
int update_read_sb_index(struct ring_buffer_backend *bufb,
                         struct channel_backend *chanb,
                         unsigned long consumed_idx)
{
        struct ring_buffer_backend_page *old_wpage, *new_wpage;

        if (unlikely(chanb->extra_reader_sb)) {
                /*
                 * Exchange the target writer subbuffer with our own unused
                 * subbuffer.
                 */
                old_wpage = bufb->buf_wsb[consumed_idx].pages;
                if (unlikely(!RCHAN_SB_IS_NOREF(old_wpage)))
                        return -EAGAIN;
                WARN_ON_ONCE(!RCHAN_SB_IS_NOREF(bufb->buf_rsb.pages));
                new_wpage = cmpxchg(&bufb->buf_wsb[consumed_idx].pages,
                                old_wpage,
                                bufb->buf_rsb.pages);
                if (unlikely(old_wpage != new_wpage))
                        return -EAGAIN;
                bufb->buf_rsb.pages = new_wpage;
                RCHAN_SB_CLEAR_NOREF(bufb->buf_rsb.pages);
        } else {
                /* No page exchange, use the writer page directly */
                bufb->buf_rsb.pages = bufb->buf_wsb[consumed_idx].pages;
                RCHAN_SB_CLEAR_NOREF(bufb->buf_rsb.pages);
        }
        return 0;
}
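
Tying the pieces together, here is what a read cycle looks like from the
reader's side, using the user-space sketch introduced earlier in this mail
(hypothetical names, not the tree's API):

/* Drain every sub-buffer the writer is not actively using. */
static void drain(struct ring *r)
{
	struct subbuf *sb;
	size_t idx;

	for (idx = 0; idx < 8; idx++) {
		if (reader_get_subbuf(r, idx, &sb) < 0)
			continue;	/* writer raced us, retry later */
		/* deliver sb->data to the output (mmap, splice, ...) */
	}
}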


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
