lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100513171015.GA13698@Krystal>
Date:	Thu, 13 May 2010 13:10:15 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Pierre Tardy <tardyp@...il.com>, Ingo Molnar <mingo@...e.hu>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Tom Zanussi <tzanussi@...il.com>,
	Paul Mackerras <paulus@...ba.org>,
	linux-kernel@...r.kernel.org, arjan@...radead.org,
	ziga.mahkovec@...il.com, davem <davem@...emloft.net>
Subject: Re: Perf and ftrace [was Re: PyTimechart]

* Steven Rostedt (rostedt@...dmis.org) wrote:
> On Thu, 2010-05-13 at 12:31 -0400, Mathieu Desnoyers wrote:
> > * Steven Rostedt (rostedt@...dmis.org) wrote:
> > > On Thu, 2010-05-13 at 09:20 -0400, Mathieu Desnoyers wrote:
> > [...]
> > > > > > 
> > > > > > ...
> > > > > > 
> > > > > > 97 /**
> > > > > > 98  * ring_buffer_clear_noref_flag - Clear the noref subbuffer flag, for writer.
> > > > > > 99  */
> > > > > > 100 static __inline__
> > > > > > 101 void ring_buffer_clear_noref_flag(struct ring_buffer_backend *bufb,
> > > > > > 102                                   unsigned long idx)
> > > > > > 103 {
> > > > > > 104         struct ring_buffer_backend_page *sb_pages, *new_sb_pages;
> > > > > > 105
> > > > > > 106         sb_pages = bufb->buf_wsb[idx].pages;
> > > > > > 107         for (;;) {
> > > > > > 108                 if (!RCHAN_SB_IS_NOREF(sb_pages))
> > > > > > 109                         return; /* Already writing to this buffer */
> > > > > > 110                 new_sb_pages = sb_pages;
> > > > > > 111                 RCHAN_SB_CLEAR_NOREF(new_sb_pages);
> > > > > > 112                 new_sb_pages = cmpxchg(&bufb->buf_wsb[idx].pages,
> > > > > > 113                         sb_pages, new_sb_pages);
> > > > > > 114                 if (likely(new_sb_pages == sb_pages))
> > > > > > 115                         break;
> > > > > > 116                 sb_pages = new_sb_pages;
> > > > > 
> > > > > The writer calls this??
> > > > 
> > > > Yes. But the common case (for each event) is simply a
> > > > "if (!RCHAN_SB_IS_NOREF(sb_pages))" test that returns. The cmpxchg() is only
> > > > performed at subbuffer boundary.
> > > 
> > > Is the cmpxchg only contending with other writers?
> > 
> > No. Would have this been the case, I would have used a cmpxchg_local(). This
> > cmpxchg used to deal with subbuffer swap is touching the subbuffer "pages"
> > pointer, which can be updated concurrently by other writers as well as readers.
> > 
> > The writer clears the noref flags when starting to write in a subbuffers, and
> > sets it when delivering the subbuffer (when it is fully committed).
> > 
> > The reader can only ever swap the subbuffer with the one it owns if the noref
> > flag is set. The reader uses a cmpxchg() too to perform the swap.
> > 
> 
> So the writer can loop because of a reader? And theoretically, a
> aggressive reader that is constantly swapping pages can halt a writer?

Hehe, I'm glad you ask :) This has been one of my major concerns with this
algorithm.

Taken from my thesis:

8.6 Formal Verification of LTTng
8.6.3 Real-Time Impact

"The second synchronization added by the overwrite mode involves letting the
reader thread exchange individual sub-buffers with its own extra sub-buffer
before reading them. This exchange is performed with a CAS instruction which
verified if the use flag is set by the writer. If the sub-buffer is in use, the
reader will retry later.  On the writer side, a CAS instruction is used in a
busy-loop to set and clear the use flag (for portability, because an atomic
instruction to set a mask is not available for all architecture in the Linux
kernel). Given that the reader can only ever exchange a sub-buffer and cause the
writer to restart once (further accesses to that sub-buffers are impossible,
because the reader will block on the writer position), the reader thread cannot
starve the writer."

IOW, if the reader is trying to get subbuffers like crazy, it will maybe cause
one loop restart when taking the subbuffer the writer was about to write to, but
after that, the reader will be forced to exchange the next subbuffer, and so on
until it wraps-around and comes back to the subbuffer preceding the writer
position, as stop there. Note that during all this execution, the reader could
only cause a single retry of the writer. In summary, the frontend range
restrictions guarantee that the writer will retry at most once.

Note that we could also use an atomic "set bit/clear bit" for the writer, along
with write memory barriers, when available on the architecture, and that would
remove all need for loops. However, architectures that need to emulate set
bit/clear bit with a cmpxchg() would behave pretty much in the same way as my
current code does.

> 
> I thought you said you did not have loops in the writers?

Well, while technically a loop, it is a bounded single-retry, so it's OK as far
as wait-free guarantees are concerned. We could even put a WARN_ON() there to
ensure that we catch double retry that should never happen, but that would be at
the cost of a runtime veritication. A DEBUG_TRACER_RT config option could select
this though.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ