lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 30 Oct 2013 12:25:26 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	Victor Kaplansky <VICTORK@...ibm.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Anton Blanchard <anton@...ba.org>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Linux PPC dev <linuxppc-dev@...abs.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
	Michael Ellerman <michael@...erman.id.au>,
	Michael Neuling <mikey@...ling.org>
Subject: Re: perf events ring buffer memory barrier on powerpc

On Wed, Oct 30, 2013 at 02:27:25AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 28, 2013 at 10:58:58PM +0200, Victor Kaplansky wrote:
> > Oleg Nesterov <oleg@...hat.com> wrote on 10/28/2013 10:17:35 PM:
> > 
> > >       mb();   // XXXXXXXX: do we really need it? I think yes.
> > 
> > Oh, it is hard to argue with feelings. Also, it is easy to be on
> > conservative side and put the barrier here just in case.
> > But I still insist that the barrier is redundant in your example.
> 
> If you were to back up that insistence with a description of the orderings
> you are relying on, why other orderings are not important, and how the
> important orderings are enforced, I might be tempted to pay attention
> to your opinion.

OK, so let me try.. a slightly less convoluted version of the code in
kernel/events/ring_buffer.c coupled with a userspace consumer would look
something like the below.

One important detail is that the kbuf part and the kbuf_writer() are
strictly per cpu and we can thus rely on implicit ordering for those.

Only the userspace consumer can possibly run on another cpu, and thus we
need to ensure data consistency for those. 

struct buffer {
	u64 size;
	u64 tail;
	u64 head;
	void *data;
};

struct buffer *kbuf, *ubuf;

/*
 * Determine there's space in the buffer to store data at @offset to
 * @head without overwriting data at @tail.
 */
bool space(u64 tail, u64 offset, u64 head)
{
	offset = (offset - tail) % kbuf->size;
	head   = (head   - tail) % kbuf->size;

	return (s64)(head - offset) >= 0;
}

/*
 * If there's space in the buffer; store the data @buf; otherwise
 * discard it.
 */
void kbuf_write(int sz, void *buf)
{
	u64 tail = ACCESS_ONCE(ubuf->tail); /* last location userspace read */
	u64 offset = kbuf->head; /* we already know where we last wrote */
	u64 head = offset + sz;

	if (!space(tail, offset, head)) {
		/* discard @buf */
		return;
	}

	/*
	 * Ensure that if we see the userspace tail (ubuf->tail) such
	 * that there is space to write @buf without overwriting data
	 * userspace hasn't seen yet, we won't in fact store data before
	 * that read completes.
	 */

	smp_mb(); /* A, matches with D */

	write(kbuf->data + offset, buf, sz);
	kbuf->head = head % kbuf->size;

	/*
	 * Ensure that we write all the @buf data before we update the
	 * userspace visible ubuf->head pointer.
	 */
	smp_wmb(); /* B, matches with C */

	ubuf->head = kbuf->head;
}

/*
 * Consume the buffer data and update the tail pointer to indicate to
 * kernel space there's 'free' space.
 */
void ubuf_read(void)
{
	u64 head, tail;

	tail = ACCESS_ONCE(ubuf->tail);
	head = ACCESS_ONCE(ubuf->head);

	/*
	 * Ensure we read the buffer boundaries before the actual buffer
	 * data...
	 */
	smp_rmb(); /* C, matches with B */

	while (tail != head) {
		obj = ubuf->data + tail;
		/* process obj */
		tail += obj->size;
		tail %= ubuf->size;
	}

	/*
	 * Ensure all data reads are complete before we issue the
	 * ubuf->tail update; once that update hits, kbuf_write() can
	 * observe and overwrite data.
	 */
	smp_mb(); /* D, matches with A */

	ubuf->tail = tail;
}


Now the whole crux of the question is if we need barrier A at all, since
the STORES issued by the @buf writes are dependent on the ubuf->tail
read.

If the read shows no available space, we simply will not issue those
writes -- therefore we could argue we can avoid the memory barrier.

However, that leaves D unpaired and me confused. We must have D because
otherwise the CPU could reorder that write into the reads previous and
the kernel could start overwriting data we're still reading.. which
seems like a bad deal.

Also, I'm not entirely sure on C, that too seems like a dependency, we
simply cannot read the buffer @tail before we've read the tail itself,
now can we? Similarly we cannot compare tail to head without having the
head read completed.


Could we replace A and C with an smp_read_barrier_depends()?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ