Date:	Sat, 7 May 2011 09:11:51 +0200
From:	"Henrik Rydberg" <rydberg@...omail.se>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	john stultz <johnstul@...ibm.com>,
	Andi Kleen <andi@...stfloor.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	lkml <linux-kernel@...r.kernel.org>,
	Paul Mackerras <paulus@...ba.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Anton Blanchard <anton@...ba.org>, Ingo Molnar <mingo@...e.hu>
Subject: Re: [RFC] time: xtime_lock is held too long

Hi Eric,

> > Defeating seqlock power? My thoughts are that seqlocks are nice
> > lightweight reader/writer locks that avoid writer starvation. You seem
> > to be trying to redefine or extend them to be something else which is
> > more subtle. 
> > 
> 
> All I am trying to explain is that a seqlock is a compound of two
> things: one spinlock to synchronize writers among themselves, and one
> seqcount to synchronize readers with the writer.
> 
> But the API only provides the compound operation. The writer takes the
> whole lock, while it would be nice to be able to perform the two steps
> separately. Note that because write_seqlock() and write_sequnlock() are
> inlined, this would not increase text size.
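
If I understand you right, the writer side of that split would look
roughly like the sketch below: the spinlock covers the whole (possibly
long) update, while the seqcount is only bumped around the short window
where the new values are published, so readers retry across much less.
Just an illustration using the existing seqcount_t API; the xt_* names
are made up and the code is untested:

	#include <linux/seqlock.h>
	#include <linux/spinlock.h>

	static seqcount_t xt_seq;		/* readers vs. the writer */
	static DEFINE_SPINLOCK(xt_lock);	/* writers vs. each other */

	static void writer_example(void)
	{
		spin_lock(&xt_lock);		/* only other writers wait here */
		/* ... long recomputation, readers do not spin on this ... */
		write_seqcount_begin(&xt_seq);	/* readers retry only across this */
		/* ... copy the results into the shared state ... */
		write_seqcount_end(&xt_seq);
		spin_unlock(&xt_lock);
	}

	static void reader_example(void)
	{
		unsigned seq;

		do {
			seq = read_seqcount_begin(&xt_seq);
			/* ... snapshot the shared state ... */
		} while (read_seqcount_retry(&xt_seq, seq));
	}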

The idea of separating the writer and reader sides also came up in the
input subsystem about a year ago, to address the one-writer-many-readers
situation. The patch went a few rounds on lkml
(https://lkml.org/lkml/2010/6/20/87), but the final version, included
below, got dropped because of an unrelated problem with the (evdev)
use case. Perhaps it is useful in this setting?
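
To give a feel for the intended use, here is a minimal (hypothetical,
untested) writer/reader pair on top of the API in the patch below;
EVBUF_SIZE, struct dev_like and struct client_like are made up for the
example:

	#include <linux/input.h>
	#include <linux/mrbuf.h>

	#define EVBUF_SIZE 64	/* must be a power of two */

	struct dev_like {
		struct input_event buffer[EVBUF_SIZE];
		struct mrbuf_writer writer;	/* mrbuf_write_init(&writer, EVBUF_SIZE) at setup */
	};

	struct client_like {
		struct mrbuf_reader reader;	/* mrbuf_read_init() this before use */
	};

	/* the single writer, e.g. the event interrupt path */
	static void post_event(struct dev_like *dev, struct client_like **clients,
			       int nclients, const struct input_event *ev)
	{
		int i;

		mrbuf_write_lock(&dev->writer);
		dev->buffer[mrbuf_write_at(&dev->writer)] = *ev;
		mrbuf_write_unlock(&dev->writer);

		for (i = 0; i < nclients; i++)
			mrbuf_write_sync(&dev->writer, &clients[i]->reader);
	}

	/* each reader drains its own view of the same buffer */
	static void fetch_events(struct dev_like *dev, struct client_like *client)
	{
		struct input_event ev;

		while (!mrbuf_read_empty(&client->reader)) {
			do {
				mrbuf_read_try(&client->reader, &dev->writer);
				ev = dev->buffer[mrbuf_read_at(&client->reader, &dev->writer)];
			} while (mrbuf_read_retry(&client->reader, &dev->writer));
			/* ... hand ev to the consumer ... */
		}
	}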

Cheers,
Henrik

From bd26d8972130978deb9056a7a6da30b5c049914f Mon Sep 17 00:00:00 2001
From: Henrik Rydberg <rydberg@...omail.se>
Date: Fri, 25 Jun 2010 01:59:16 +0200
Subject: [PATCH] Introduce mrbuf, a multi-reader buffer mechanism

In spite of the many lock patterns and fifo helpers in the kernel, the
case of a single writer feeding many readers via a circular event
buffer does not seem to be covered. This patch adds mrbuf, a mechanism
for handling multiple concurrent read positions in a shared circular
buffer.  Under normal operation, given an adequate buffer size, the
mechanism is lock-less.
---
 include/linux/mrbuf.h |  228 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 228 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/mrbuf.h

diff --git a/include/linux/mrbuf.h b/include/linux/mrbuf.h
new file mode 100644
index 0000000..86ab656
--- /dev/null
+++ b/include/linux/mrbuf.h
@@ -0,0 +1,228 @@
+#ifndef _MRBUF_H
+#define _MRBUF_H
+/*
+ * Multi-reader buffer lock for single writer, concurrent readers
+ *
+ * Copyright (c) 2010 Henrik Rydberg
+ *
+ * The mrbuf is a mechanism for handling multiple concurrent read
+ * positions in a shared circular buffer. The mrbuf does not handle
+ * the actual buffer, which can therefore be of any type.
+ *
+ * The writer does not block, but overwrites older events as the
+ * buffer gets full. Writing is performed in two stages; writing the
+ * data and telling the readers about it (syncing).
+ *
+ * The readers all read from the same shared buffer, but at individual
+ * positions. Reading from the position currently being written to
+ * will cause the reader to retry until the read is coherent. During
+ * normal operation, with sufficient buffer size, the mechanism is
+ * practically lock-less.
+ *
+ * The mrbuf is particularly suitable for input event buffers, where
+ * the single writer must not be starved, and where there are several
+ * threads reading the same data.
+ *
+ * Multi-reader buffer locks are similar to seqlocks, but while
+ * seqlocks have a finite retry frequency, mrbufs virtually never
+ * retry. Like seqlocks, mrbufs are not very cache friendly, and
+ * require the buffer to be valid in all threads.
+ *
+ * Multiple writers and re-entrant readers require additional locking.
+ *
+ * Event queue writer example.  Readers are synchronized individually.
+ *
+ *    mrbuf_write_lock(&writer);
+ *
+ *    buffer[mrbuf_write_at(&writer)] = event_to_write;
+ *
+ *    mrbuf_write_unlock(&writer);
+ *
+ *    for (i = 0; i < NUM_READERS; i++)
+ *            mrbuf_write_sync(&writer, &client[i]->reader);
+ *
+ * Event queue reader example, reading all available events.
+ *
+ *    while (!mrbuf_read_empty(&reader)) {
+ *            do {
+ *                    mrbuf_read_try(&reader, &writer);
+ *                    event_to_read = buffer[mrbuf_read_at(&reader, &writer)];
+ *            } while (mrbuf_read_retry(&reader, &writer));
+ *    }
+ *
+ */
+
+/**
+ * struct mrbuf_writer - mrbuf writer
+ * @page: the bits of the circular buffer (page = size - 1)
+ * @head: the current writer head
+ * @next: the head to take effect after the lock
+ *
+ * The buffer size must be a power of two, such that page is the set
+ * of bits in which the buffer positions live.
+ *
+ */
+struct mrbuf_writer {
+	unsigned int page;
+	unsigned int head;
+	unsigned int next;
+};
+
+/**
+ * struct mrbuf_reader - mrbuf reader
+ * @last: the last write head seen before trying
+ * @head: the current reader head
+ * @tail: the current reader tail
+ */
+struct mrbuf_reader {
+	unsigned int last;
+	unsigned int head;
+	unsigned int tail;
+};
+
+/**
+ * mrbuf_write_init - initialize writer
+ * @bw: the mrbuf_writer to initialize
+ * @size: the size of the buffer (must be power of two)
+ */
+static inline void mrbuf_write_init(struct mrbuf_writer *bw, unsigned int size)
+{
+	bw->page = size - 1;
+	bw->head = 0;
+	bw->next = 0;
+}
+
+/**
+ * mrbuf_write_lock - lock to write one event
+ * @bw: the mrbuf_writer to lock
+ *
+ * This method prepares to write one event.  The write lock does not
+ * sleep and is thus suitable for use in atomic contexts.
+ *
+ */
+static inline void mrbuf_write_lock(struct mrbuf_writer *bw)
+{
+	++bw->next;
+	smp_wmb();
+}
+
+/**
+ * mrbuf_write_at - return position to write to
+ * @bw: the mrbuf_writer keeping track of the write position
+ *
+ * Returns the buffer position to write to. Must be called from within
+ * the write lock.
+ *
+ */
+static inline unsigned int mrbuf_write_at(const struct mrbuf_writer *bw)
+{
+	return bw->head & bw->page;
+}
+
+/**
+ * mrbuf_write_unlock - unlock writing
+ * @bw: the mrbuf_writer to unlock
+ */
+static inline void mrbuf_write_unlock(struct mrbuf_writer *bw)
+{
+	smp_wmb();
+	++bw->head;
+}
+
+/**
+ * mrbuf_write_sync - synchronize reader with current writer
+ * @bw: the mrbuf_writer to sync with
+ * @br: the mrbuf_reader to sync
+ *
+ * Synchronize the reader head with the writer head, effectively
+ * telling the reader thread that there is new data to read.
+ *
+ */
+static inline void mrbuf_write_sync(const struct mrbuf_writer *bw,
+				    struct mrbuf_reader *br)
+{
+	br->head = bw->head;
+}
+
+/**
+ * mrbuf_read_init - initialize reader
+ * @br: the mrbuf_reader to initialize
+ * @head: the initial reader head
+ * @tail: the initial reader tail
+ *
+ * This function must only be called while mrbuf_write_sync() and
+ * mrbuf_read_empty() cannot run concurrently, since the reader state is
+ * updated without coherence guarantees.
+ */
+static inline void mrbuf_read_init(struct mrbuf_reader *br,
+				   unsigned int head, unsigned int tail)
+{
+	br->head = head;
+	br->tail = tail;
+	smp_wmb();
+}
+
+/**
+ * mrbuf_read_empty - true if reader is empty
+ * @br: the mrbuf_reader to check
+ */
+static inline bool mrbuf_read_empty(const struct mrbuf_reader *br)
+{
+	return br->head == br->tail;
+}
+
+/**
+ * mrbuf_read_try - try to read data
+ * @br: the mrbuf_reader to try to read from
+ * @bw: the mrbuf_writer keeping track of the write position
+ *
+ * Prepare to read from buffer. The reader must be non-empty before
+ * trying to read from it. If the reader has fallen behind, it catches
+ * up by jumping to the last page being written to.
+ *
+ */
+static inline void mrbuf_read_try(struct mrbuf_reader *br,
+				  const struct mrbuf_writer *bw)
+{
+	br->last = bw->head;
+	smp_rmb();
+	br->tail += (br->head - 1 - br->tail) & ~bw->page;
+}
+
+/**
+ * mrbuf_read_at - return position to read from
+ * @br: the mrbuf_reader keeping track of the read position
+ * @bw: the mrbuf_writer keeping track of the buffer size
+ *
+ * Returns the buffer position to read from. Must be called from
+ * within the read try block.
+ *
+ */
+static inline unsigned int mrbuf_read_at(const struct mrbuf_reader *br,
+					 const struct mrbuf_writer *bw)
+{
+	return br->tail & bw->page;
+}
+
+/**
+ * mrbuf_read_retry - try to finish read
+ * @br: the mrbuf_reader to try to finish
+ * @bw: the mrbuf_writer keeping track of the write position
+ *
+ * Try to finish the read. Returns false if successful. Otherwise, the
+ * read was incoherent and one must mrbuf_read_try() again.  During
+ * normal operation, with adequate buffer size, this method will
+ * always return false.
+ *
+ */
+static inline bool __must_check mrbuf_read_retry(struct mrbuf_reader *br,
+						 const struct mrbuf_writer *bw)
+{
+	smp_rmb();
+	if (unlikely(((br->tail - br->last) & bw->page) < bw->next - br->last))
+		return true;
+	++br->tail;
+	return false;
+}
+
+#endif /* _MRBUF_H */
-- 
1.6.3.3
