Changelog since v1:
* Update documentation to match the new basic API.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 Documentation/ring-buffer/ring-buffer-design.txt |   78 +++++++
 Documentation/ring-buffer/ring-buffer-usage.txt  |  254 +++++++++++++++++++++++
 2 files changed, 332 insertions(+)

Index: linux.trees.git/Documentation/ring-buffer/ring-buffer-design.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/Documentation/ring-buffer/ring-buffer-design.txt	2010-07-15 13:36:17.000000000 -0400
@@ -0,0 +1,78 @@
+                        Ring Buffer Library Design
+
+                            Mathieu Desnoyers
+
+
+This document explains the Linux Kernel Ring Buffer library.
+
+
+* Purpose of the ring buffer library
+
+Tracing: the main purpose of the ring buffer library is to support tracing by
+providing an efficient ring buffer to transport trace data.
+
+Fast FIFO queue for drivers: this library is meant to be generic enough to meet
+the requirements of audio, video and other drivers to provide an easy-to-use,
+yet efficient, buffering API.
+
+Lock-free write-side: the main advantage of this ring buffer implementation is
+that it provides non-blocking synchronization for the writer context. It
+furthermore provides a bounded write-side execution time for real-time
+applications. The per-CPU buffer configuration is wait-free. The global buffer
+configuration is lock-free. (Wait-free is a stronger progress guarantee than
+lock-free.)
+
+
+* Semantics
+
+The execution context writing to the ring buffer is called the "producer" (or
+writer) and the thread reading the ring buffer content is called the "consumer"
+(or reader). Each instance of either per-cpu or global ring buffers is called a
+"channel". A buffer is divided into subbuffers, which are synchronization points
+in the buffers (sometimes referred to as periods in the audio world). Each item
+stored in the ring buffer is called a "record". Both subbuffers and records
+may start with a "header". Records can also contain a variable-sized payload.
+
+The ring buffer supports two write modes. The "discard" mode drops data when the
+ring buffer is full. The "overwrite" (a.k.a. flight recorder) mode overwrites
+the oldest information when the ring buffer is full.
+
+Iterators are one way to consume data from the ring buffer. They allow a reader
+thread to read records one by one in the order they were written, either on a
+per-buffer or per-channel basis. Other ways to consume data are by using file
+descriptors which provide access to raw subbuffer content through, e.g.,
+splice() or mmap().
+
+
+* Programmer Interfaces
+
+The library presents a high-level interface that allows programmers to easily
+create and use a ring buffer instance. It also provides a more advanced client
+configuration API for clients with more elaborate needs (e.g. tracers).
+
+
+* Advanced client configuration options
+
+The options listed in the linux/ringbuffer/config.h header are tailored for ring
+buffer "clients" (a kernel object using the ring buffer library through its
+advanced options API) with more specific needs. The clients must set up a
+"static const" ring_buffer_config structure in which all options are spelled
+out. Given that this structure is known to be immutable, the compiler can
+optimize away all the unneeded code from the library inline fast paths. The
+slow paths, however, dynamically select the correct code depending on the
+ring_buffer_config structure received as parameter. This saves space by sharing
+the slow path code between all ring buffer clients.
+
+
+* Frontend/backend layered design
+
+The ring buffer is made of two main layers: a frontend and a backend. The
+"frontend" locklessly manages space reservation within the buffer. It also
+manages timers, CPU idle and CPU hotplug. The "backend" manages the memory
+used to hold the buffers. It deals with subbuffer exchanges between the
+consumer and the producer in overwrite mode. Currently, only a page-based
+backend is implemented (RING_BUFFER_PAGE), but other backends are planned for
+the future: statically allocated backends (RING_BUFFER_STATIC) and vmap-based
+backends (RING_BUFFER_VMAP). These will allow, for instance, tracers to write
+trace data in a physically contiguous memory region allocated at boot time, or
+to write data in video card memory for crash reports.
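Illustration only, not part of the patch: the two write modes described in the
"Semantic" section of ring-buffer-design.txt can be sketched in user-space C as
below. All demo_* names are hypothetical and are not the library's API. The
sketch is single-threaded and works on individual records, whereas the text
above describes lockless space reservation and sub-buffer granularity, so it
only shows which data survives when the buffer is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names, for illustration only; not the library's API. */
enum demo_mode { DEMO_DISCARD, DEMO_OVERWRITE };

#define DEMO_SLOTS 4	/* number of record slots in the toy buffer */

struct demo_ring {
	int slot[DEMO_SLOTS];
	size_t head;		/* total records stored so far */
	size_t tail;		/* total records consumed or overwritten */
	size_t lost;		/* records discarded or overwritten */
	enum demo_mode mode;
};

static void demo_init(struct demo_ring *r, enum demo_mode mode)
{
	assert(r);
	r->head = r->tail = r->lost = 0;
	r->mode = mode;
}

/* Returns true if the record was stored, false if it was discarded. */
static bool demo_write(struct demo_ring *r, int value)
{
	if (r->head - r->tail == DEMO_SLOTS) {	/* buffer is full */
		r->lost++;
		if (r->mode == DEMO_DISCARD)
			return false;	/* discard mode: drop the new record */
		r->tail++;		/* overwrite mode: give up the oldest */
	}
	r->slot[r->head++ % DEMO_SLOTS] = value;
	return true;
}

/* Reads the oldest record; returns false when the buffer is empty. */
static bool demo_read(struct demo_ring *r, int *value)
{
	if (r->head == r->tail)
		return false;
	*value = r->slot[r->tail++ % DEMO_SLOTS];
	return true;
}
```

Writing six records into a four-slot buffer leaves the first four readable in
discard mode and the last four readable in overwrite (flight recorder) mode.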
Index: linux.trees.git/Documentation/ring-buffer/ring-buffer-usage.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux.trees.git/Documentation/ring-buffer/ring-buffer-usage.txt	2010-08-16 16:08:56.000000000 -0400
@@ -0,0 +1,254 @@
+		        Ring Buffer Library Usage
+
+			    Mathieu Desnoyers
+
+
+This document explains how to use the Linux Kernel Ring Buffer Library.
+
+The library presents a high-level interface that allows programmers to easily
+create and use a ring buffer instance. It also provides a more advanced client
+configuration API for clients with more elaborate needs (e.g. tracers).
+
+
+* Basic ring buffer configurations
+
+  The basic high-level configurations offered are pre-built clients with the
+following configuration selections under include/linux/ringbuffer/basic_api.h:
+
+  * The write-side (data producer) configurations:
+
+      * global buffer, overwrite mode, channel-wide record iterator
+      * global buffer, discard mode, channel-wide record iterator
+      * per-cpu buffers, overwrite mode, channel-wide record iterator
+      * per-cpu buffers, discard mode, channel-wide record iterator
+      * per-cpu buffers, overwrite mode, per-cpu buffer record iterator
+      * per-cpu buffers, discard mode, per-cpu buffer record iterator
+
+  Typical use-case of the ring buffer write-side:
+
+    1) create
+    2) multiple calls to the write primitive.
+    3) destroy
+
+
+  * The read-side (data consumer) iterator APIs are available in:
+
+  - iterator.h
+
+    These iterators allow iterating over records either on a per-cpu buffer or
+    channel-wide basis.
+
+    Typical life-span of a reader using the file descriptor read() iterator:
+
+    (in user-space)
+    # cat /path_to_file/filename
+
+    The ring buffer iterator can be associated with a dentry by passing the
+    channel returned by ring_buffer_basic_get_channel() as file private data and
+    by using channel_payload_file_operations (from iterator.h) for the file.
+
+
+    Typical life-span of a reader using the in-kernel API:
+
+    1) iterator_open()
+    2) get_next_record and read_current_record until get_next_record returns
+       -ENODATA. -EAGAIN means there is currently no data, but there might be
+       more data coming in the future.
+    3) iterator_close()
+
+
+* Advanced client configurations
+
+  * Advanced client configuration options
+
+  More options are available for clients with more advanced needs. These options
+are listed in the linux/ringbuffer/config.h header. A ring buffer "client" (a
+kernel object using the ring buffer library through its advanced options API)
+must set up a "static const" ring_buffer_config structure in which all options
+are spelled out.
+
+The pre-built basic configurations presented above set these advanced
+configuration options to values typically correct for driver use.
+
+A client using the advanced configuration options must first include
+linux/ringbuffer/config.h, declare its configuration structure, declare the
+required static inline functions used by the fast-paths, and then include
+linux/ringbuffer/api.h.
+
+The struct ring_buffer_config options are:
+
+  * alloc: RING_BUFFER_ALLOC_PER_CPU / RING_BUFFER_ALLOC_GLOBAL
+
+    Selects either global buffer or per-cpu ring buffers.
+
+  * sync: RING_BUFFER_SYNC_PER_CPU / RING_BUFFER_SYNC_GLOBAL
+
+    Selects which synchronization primitives must be used: either expect
+    concurrency from other processors, or expect concurrency only with the
+    local processor. This is separate from the "alloc" option because per-thread
+    buffers would fit in the "global alloc, per-cpu sync" category. Similarly,
+    per-cpu buffers written to with preemption enabled would fit in the "per-cpu
+    alloc, global sync" category, because migration could lead to a concurrent
+    write into a remote cpu buffer.
+
+  * mode: RING_BUFFER_OVERWRITE / RING_BUFFER_DISCARD
+
+    Either overwrite oldest subbuffers when buffer is full, or discard events.
+
+  * align: RING_BUFFER_NATURAL / RING_BUFFER_PACKED
+
+    Natural alignment aligns record headers on their natural alignment on the
+    architecture. It also aligns record payload on their natural alignment
+    (similarly to a C structure). The packed option does not perform any
+    alignment for record header and payloads. It corresponds to the "packed" gcc
+    type attribute.
+
+  * output:
+
+      RING_BUFFER_SPLICE:   Output raw subbuffers through per-buffer file
+                            descriptors with splice(). The read-side
+                            synchronization needed to select the current
+                            subbuffer is performed with ioctl().
+
+      RING_BUFFER_MMAP:     Output raw subbuffers through per-buffer memory
+                            mapped file descriptors. Read-side synchronization
+                            to select the current subbuffer is performed with
+                            ioctl().
+
+      RING_BUFFER_READ:     Output raw subbuffers through per-buffer file
+                            descriptors with read(). The read-side
+                            synchronization needed to select the current
+                            subbuffer is performed with ioctl().
+                            (unimplemented)
+
+      RING_BUFFER_ITERATOR: Iterators allow a reader thread to read records one
+                            by one in the order they were written, either on a
+                            per-buffer or per-channel basis.
+
+      RING_BUFFER_NONE:     No output provided by the library is used.
+
+  * backend:
+
+      RING_BUFFER_PAGE:     The memory backend used to hold the ring buffers is
+                            made of non-contiguous pages. A software-controlled
+                            "subbuffer table" indexes the pages. It allows
+                            sub-buffer exchange between the producer and
+                            consumer in overwrite mode.
+
+      RING_BUFFER_VMAP:     A vmap'd virtually contiguous memory area is used as
+                            memory backend. (unimplemented)
+
+      RING_BUFFER_STATIC:   A physically contiguous memory area is used as
+                            memory backend. e.g. memory allocated at early boot,
+                            or video card memory. (unimplemented)
+
+  * oops:
+        Select "oops" consistency if you plan to read from the ring buffer
+        after a kernel oops occurred. This is useful if you plan to use the
+        ring buffer data in a crash report. Adds a slight performance overhead
+        to keep track of how much contiguous data has been written in the
+        current subbuffer.
+
+  * ipi:
+        The IPI_BARRIER scheme issues IPIs when the consumer needs to grab a
+        sub-buffer. It issues the appropriate memory barriers on the writer
+        CPU(s). It is therefore possible to turn the memory barrier in the
+        commit fast-path into a simple compiler barrier, thus improving
+        performance. This scheme is recommended when both per-cpu allocation
+        and synchronization are used. This scheme is not recommended for
+        "global" buffers, because it would involve sending IPIs to all
+        processors.
+
+  * wakeup:
+        The option "RING_BUFFER_WAKEUP_BY_TIMER" reduces intrusiveness in
+        the writer code and guarantees wait-free/lock-free write primitives
+        by performing lazy reader wakeups in a periodic deferrable timer and
+        hooking into cpu idle notifiers. This option makes tracer code more
+        robust at the expense of additional data delivery delay.
+        Use in combination with "read_timer_interval" channel_create()
+        argument.
+                - Note: CPU idle notifiers are not implemented for all
+                  architectures at the moment. The deferrable timer delays can
+                  only be expected to hold on architectures with idle notifiers.
+        The RING_BUFFER_WAKEUP_BY_WRITER option specifies that the ring buffer
+        write-side must perform reader wakeups at each sub-buffer boundary.
+        RING_BUFFER_WAKEUP_NONE does not perform any wakeup whatsoever. The
+        client has the responsibility to perform wakeups.
+
+  * tsc_bits:
+        Timestamp compression scheme setting. 0 means that no timestamps
+        are used; 64 means that full 64-bit timestamps are written with
+        each record. For any value between 1 and 63, the ring buffer
+        library will set the RING_BUFFER_RFLAG_FULL_TSC bit in the
+        "rflags" ring_buffer_ctx field, which is also passed as a parameter
+        to the "record_header_size()" callback, to inform the client
+        that a full 64-bit timestamp is needed due to a "tsc_bits"
+        overflow since the last record.
+
+Some options are passed as parameter to channel_create():
+
+  * subbuf_size:
+        Size of a sub-buffer within a ring buffer. Extra synchronization is
+        performed when the data producer crosses sub-buffer boundaries. This
+        corresponds to "periods" in audio buffers. The maximum record size is
+        limited by the sub-buffer size. The minimum sub-buffer size is 1 page.
+
+  * num_subbuf:
+        Number of sub-buffers per buffer. Typically, using at least 2
+        sub-buffers is recommended to minimize record discards.
+
+  * switch_timer_interval:
+        The switch timer interval configures the periodic deferrable
+        timer which handles periodic buffer switches. It is used to make
+        data readily available for consumption periodically for live data
+        streaming. A buffer switch is a synchronization point between the data
+        producers and consumer.
+
+  * read_timer_interval:
+        The read timer interval is the time interval (in microseconds) at which
+        pending readers are woken up.
+
+* Advanced client callbacks
+
+  These callbacks are configured by the cb field of the ring_buffer_config
+structure. They are provided to the ring buffer by the client. For both
+ring_buffer_clock_read() and record_header_size(), inline versions must also be
+provided before inclusion of linux/ringbuffer/api.h.
+
+  * ring_buffer_clock_read():
+        Returns the current ring buffer clock source time (64-bit value).
+
+  * record_header_size():
+        Returns the size of the current record, including the record header
+        size. It uses the "rflags" parameter to determine if a full 64-bit
+        timestamp is required or if "tsc_bits" bits are enough to represent the
+        current time and detect "tsc_bits"-bit overflow. The offset received as
+        parameter is relative to a page boundary, which allows alignment
+        calculation. data_size is the size of the event payload.
+        "pre_header_padding" can be set by record_header_size() to the amount of
+        padding required to align the record header (considered to be 0 if
+        unset).
+
+  * subbuffer_header_size():
+        Returns the size of the subbuffer header.
+
+  * buffer_begin():
+        Callback executed when crossing a sub-buffer boundary, when starting to
+        write into the sub-buffer.
+
+  * buffer_end():
+        Callback executed when crossing a sub-buffer boundary, before delivering
+        a sub-buffer. Has exclusive sub-buffer access when called, meaning that
+        no concurrent commits are left, no reader can access the sub-buffer, and
+        no concurrent writers are allowed to overwrite the sub-buffer.
+
+  * buffer_create():
+        This callback is executed upon creation of a buffer, either at channel
+        creation, or at CPU hotplug.
+
+  * buffer_finalize():
+        Callback executed upon channel finalize, performed by channel_destroy().
+
+  * record_get():
+        Reader helper provided by the client, which can be used to extract the
+        record header from a record in the buffer.
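Illustration only, not part of the patch: the interaction between "tsc_bits",
"rflags" and record_header_size() described in ring-buffer-usage.txt can be
sketched in user-space C as below. Everything here is an assumption made for
the example: the demo_* names, the flag value, the overflow test (elapsed time
not fitting in tsc_bits bits), and the header layouts (a 32-bit compact header
or a 64-bit full-timestamp header). It is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the flag named in the text; value is arbitrary. */
#define DEMO_RFLAG_FULL_TSC	(1U << 0)

/*
 * Assumed overflow test: if the time elapsed since the previous record does
 * not fit in tsc_bits bits, a truncated timestamp would be ambiguous, so a
 * full 64-bit timestamp is requested.
 */
static unsigned int demo_tsc_rflags(uint64_t last, uint64_t now,
				    unsigned int tsc_bits)
{
	assert(tsc_bits >= 1 && tsc_bits <= 63);
	return ((now - last) >> tsc_bits) ? DEMO_RFLAG_FULL_TSC : 0;
}

/* Padding needed to bring offset up to a multiple of align (power of two). */
static size_t demo_align_pad(size_t offset, size_t align)
{
	assert(align && !(align & (align - 1)));
	return (align - offset) & (align - 1);
}

/*
 * Toy record_header_size(): picks the assumed header layout from rflags,
 * reports the padding needed to naturally align that header at offset, and
 * returns the header size plus the payload size.
 */
static size_t demo_record_header_size(size_t offset, size_t data_size,
				      unsigned int rflags,
				      size_t *pre_header_padding)
{
	size_t header = (rflags & DEMO_RFLAG_FULL_TSC) ? sizeof(uint64_t)
						       : sizeof(uint32_t);

	*pre_header_padding = demo_align_pad(offset, header);
	return header + data_size;
}
```

With tsc_bits set to 4, a record written 5 clock units after the previous one
keeps the compact header, while one written 16 units later needs the full
timestamp.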
