linux-kernel - Re: [PATCH 3/7] barrier: a scalable synchonisation barrier

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070201160010.GA121@tv-sign.ru>
Date:	Thu, 1 Feb 2007 19:00:10 +0300
From:	Oleg Nesterov <oleg@...sign.ru>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Ingo Molnar <mingo@...e.hu>,
	Christoph Hellwig <hch@...radead.org>,
	Andrew Morton <akpm@...l.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/7] barrier: a scalable synchonisation barrier

On 01/31, Paul E. McKenney wrote:
> 
> QRCU as currently written (http://lkml.org/lkml/2006/11/29/330) doesn't
> do what you want, as it acquires the lock unconditionally.  I am proposing
> that synchronize_qrcu() change to something like the following:
> 
> 	void synchronize_qrcu(struct qrcu_struct *qp)
> 	{
> 		int idx;
> 	
> 		smp_mb();
> 	
> 		if (atomic_read(qp->ctr[0]) + atomic_read(qp->ctr[1]) <= 1) {
> 			smp_rmb();
> 			if (atomic_read(qp->ctr[0]) +
> 			    atomic_read(qp->ctr[1]) <= 1)
> 				goto out;
> 		}
> 	
> 		mutex_lock(&qp->mutex);
> 		idx = qp->completed & 0x1;
> 		atomic_inc(qp->ctr + (idx ^ 0x1));
> 		/* Reduce the likelihood that qrcu_read_lock() will loop */
> 		smp_mb__after_atomic_inc();
> 		qp->completed++;
> 	
> 		atomic_dec(qp->ctr + idx);
> 		__wait_event(qp->wq, !atomic_read(qp->ctr + idx));
> 		mutex_unlock(&qp->mutex);
> 	out:
> 		smp_mb();
> 	}
> 
> For the first "if" to give a false positive, a concurrent switch had
> to have happened.  For example, qp->ctr[0] was zero and qp->ctr[1]
> was two at the time of the first atomic_read(), but then qp->completed
> switched so that both qp->ctr[0] and qp->ctr[1] were one at the time
> of the second atomic_read.  The only way the second "if" can give us a
> false positive is if there was another change to qp->completed in the
> meantime -- but that means that all of the pre-existing qrcu_read_lock()
> holders must have gotten done, otherwise the second switch could not
> have happened.  Yes, you do incur three memory barriers on the fast
> path, but the best you could hope for with your approach was two of them
> (unless I am confused about how you were using barrier_sync()).

While doing qrcu, somehow I convinced myself we can't optimize out taking
qp->mutex. Now I think I was wrong. Good!

Q: you deleted "if (atomic_read(qp->ctr + idx) == 1)" fastpath under ->mutex,
was this needed for this optimization to work? I am asking because I can't
understand how it can make any difference.

> Oleg, does this look safe?

Yes. But let me think more about this later, I've got a fever, and can't
think properly today :)

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/