linux-kernel - Re: [ltt-dev] [RFC git tree] Userspace RCU (urcu) for Linux (repost)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20090212161805.GB6759@linux.vnet.ibm.com>
Date:	Thu, 12 Feb 2009 08:18:05 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Mathieu Desnoyers <compudj@...stal.dyndns.org>
Cc:	ltt-dev@...ts.casi.polymtl.ca, linux-kernel@...r.kernel.org
Subject: Re: [ltt-dev] [RFC git tree] Userspace RCU (urcu) for Linux
	(repost)

On Thu, Feb 12, 2009 at 12:47:07AM -0500, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck@...ux.vnet.ibm.com) wrote:
> > On Wed, Feb 11, 2009 at 11:10:44PM -0500, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney (paulmck@...ux.vnet.ibm.com) wrote:
> > > > On Wed, Feb 11, 2009 at 06:33:08PM -0800, Paul E. McKenney wrote:
> > > > > On Wed, Feb 11, 2009 at 04:35:49PM -0800, Paul E. McKenney wrote:
> > > > > > On Wed, Feb 11, 2009 at 04:42:58PM -0500, Mathieu Desnoyers wrote:
> > > > > > > * Paul E. McKenney (paulmck@...ux.vnet.ibm.com) wrote:
> > > > 
> > > > [ . . . ]
> > > > 
> > > > > > > > (BTW, I do not trust my model yet, as it currently cannot detect the
> > > > > > > > failure case I pointed out earlier.  :-/  Here and I thought that the
> > > > > > > > point of such models was to detect additional failure cases!!!)
> > > > > > > > 
> > > > > > > 
> > > > > > > Yes, I'll have to dig deeper into it.
> > > > > > 
> > > > > > Well, as I said, I attached the current model and the error trail.
> > > > > 
> > > > > And I had bugs in my model that allowed the rcu_read_lock() model
> > > > > to nest indefinitely, which overflowed into the top bit, messing
> > > > > things up.  :-/
> > > > > 
> > > > > Attached is a fixed model.  This model validates correctly (woo-hoo!).
> > > > > Even better, gives the expected error if you comment out line 180 and
> > > > > uncomment line 213, this latter corresponding to the error case I called
> > > > > out a few days ago.
> > > > > 
> > > > > I will play with removing models of mb...
> > > > 
> > > > And commenting out the models of mb between the counter flips and the
> > > > test for readers still passes validation, as expected, and as shown in
> > > > the attached Promela code.
> > > > 
> > > 
> > > Hrm, in the email I sent you about the memory barrier, I said that it
> > > would not make the algorithm incorrect, but that it would cause
> > > situations where it would be impossible for the writer to do any
> > > progress as long as there are readers active. I think we would have to
> > > enhance the model or at least express this through some LTL statement to
> > > validate this specific behavior.
> > 
> > But if the writer fails to make progress, then the counter remains at a
> > given value, which causes readers to drain, which allows the writer to
> > eventually make progress again.  Right?
> > 
> 
> Not necessarily. If we don't have the proper memory barriers, we can
> have the writer waiting on, say, parity 0 *before* it has performed the
> parity switch. Therefore, even newly coming readers will add up to
> parity 0.

But the write that changes the parity will eventually make it out.
OK, so your argument is that we at least need a compiler barrier?

Regardless, please see attached for a modified version of the Promela
model that fully models omitting out the memory barrier that my
rcu_nest32.[hc] implementation omits.  (It is possible to partially
model removal of other memory barriers via #if 0, but to fully model
would need to enumerate the permutations as shown on lines 231-257.)

> In your model, this is not detected, because eventually all readers will
> execute, and only then the writer will be able to update the data. But
> in reality, if we run a very busy 4096-cores machines where there is
> always at least one reader active, the the writer will be stuck forever,
> and that's really bad.

Assuming that the reordering is done by the CPU, the write will
eventually get out -- it is stuck in (say) the store buffer, and the
cache line will eventually arrive, and then the value will eventually
be seen by the readers.

We might need a -compiler- barrier, but then again, I am not sure that
we are talking about the same memory barrier -- again, please see
attached lines 231-257 to see which one that I eliminated.

Also, the original model I sent out has a minor bug that prevents it
from fully modeling the nested-read-side case.  The patch below fixes this.

Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
---

 urcu.spin |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/formal-model/urcu.spin b/formal-model/urcu.spin
index e5bfff3..611464b 100644
--- a/formal-model/urcu.spin
+++ b/formal-model/urcu.spin
@@ -124,9 +124,13 @@ proctype urcu_reader()
 				break;
 			:: tmp < 4 && reader_progress[tmp] != 0 ->
 				tmp = tmp + 1;
-			:: tmp >= 4 ->
+			:: tmp >= 4 &&
+			   reader_progress[0] == reader_progress[3] ->
 				done = 1;
 				break;
+			:: tmp >= 4 &&
+			   reader_progress[0] != reader_progress[3] ->
+			   	break;
 			od;
 			do
 			:: tmp < 4 && reader_progress[tmp] == 0 ->

View attachment "urcu_mbmin.spin" of type "text/plain" (7514 bytes)

Download attachment "urcu_mbmin.sh" of type "application/x-sh" (57 bytes)