linux-kernel - Re: [PATCH RFC ticketlock] Auto-queued ticketlock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130611150032.GA1762@linux.vnet.ibm.com>
Date:	Tue, 11 Jun 2013 08:00:32 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	linux-kernel@...r.kernel.org, mingo@...e.hu, laijs@...fujitsu.com,
	dipankar@...ibm.com, akpm@...ux-foundation.org,
	mathieu.desnoyers@...icios.com, josh@...htriplett.org,
	niv@...ibm.com, tglx@...utronix.de, peterz@...radead.org,
	Valdis.Kletnieks@...edu, dhowells@...hat.com, edumazet@...gle.com,
	darren@...art.com, fweisbec@...il.com, sbw@....edu,
	torvalds@...ux-foundation.org
Subject: Re: [PATCH RFC ticketlock] Auto-queued ticketlock

On Tue, Jun 11, 2013 at 02:56:55AM -0700, Paul E. McKenney wrote:
> On Mon, Jun 10, 2013 at 08:44:40PM -0400, Steven Rostedt wrote:
> > On Sun, 2013-06-09 at 12:36 -0700, Paul E. McKenney wrote:

[ . . . ]

> > > +bool tkt_q_do_spin(arch_spinlock_t *asp, struct __raw_tickets inc)
> > > +{
> > > +	struct tkt_q **oldtail;
> > > +	struct tkt_q tq;
> > > +	struct tkt_q_head *tqhp;
> > > +
> > > +	/*
> > > +	 * Ensure that accesses to queue header happen after sensing
> > > +	 * the lock's have-queue bit.
> > > +	 */
> > > +	smp_mb();  /* See above block comment. */
> > > +
> > > +	/* If there no longer is a queue, leave. */
> > > +	tqhp = tkt_q_find_head(asp);
> > > +	if (tqhp == NULL)
> > > +		return false;
> > > +
> > > +	/* Initialize our queue element. */
> > > +	tq.cpu = raw_smp_processor_id();
> > > +	tq.tail = inc.tail;
> > > +	tq.next = NULL;
> > > +
> > > +	/* Check to see if we already hold the lock. */
> > > +	if (ACCESS_ONCE(tqhp->head_tkt) == inc.tail) {
> > > +		/* The last holder left before queue formed, we hold lock. */
> > > +		tqhp->head_tkt = -1;
> > > +		return true;
> > > +	}
> > > +
> > > +	/* Add our element to the tail of the queue. */
> > > +	oldtail = xchg(&tqhp->spin_tail, &tq.next);
> > 
> > Boy this is tricky code! I thought I found a race window here, but as I
> > went to write my email saying "Gotcha!" I found that it wasn't a race
> > after all. But as I went though the effort of writing this, I figured I
> > would send this out as documentation for others to see. Hmm, I wonder if
> > we can use this email to add more comments. Anyway, here's what I
> > thought was wrong ;-)
> 
> If you didn't know any better, you might even think that I had done
> something like this before.  ;-)
> 
> > OK, I originally thought there was a race window here. Let's say that an
> > NMI hit right here, and it happens to be a big one, where lots of things
> > can happen on other CPUs right now.
> > 
> > The scenario is that there's just one item on the queue, which is
> > waiting for the lock to be released, and is spinning below in the:
> > 
> > 	while (ACCESS_ONCE(tq.cpu) != -1)
> > 		cpu_relax();
> > 
> > And then the lock is released, where in tkt_q_do_wake() the following is
> > called:
> > 
> > 	ACCESS_ONCE(tqp->cpu) = -1;
> > 
> > Now the old queued task is released. But it's tq->next hasn't been set
> > yet, and is still NULL. It leaves by doing:
> > 
> > 	ACCESS_ONCE(tqhp->spin) = tq.next;
> > 	return true;
> > 
> > All before this task gets to set *oldtail to &tq. But, I then looked
> > below...
> > 
> > 
> > > +	ACCESS_ONCE(*oldtail) = &tq;
> > > +
> > > +	/* Spin until handoff. */
> > > +	while (ACCESS_ONCE(tq.cpu) != -1)
> > > +		cpu_relax();
> > > +
> > > +	/*
> > > +	 * Remove our element from the queue.  If the queue is now empty,
> > > +	 * update carefully so that the next acquisition will queue itself
> > > +	 * at the head of the list.
> > > +	 */
> > > +	if (tq.next == NULL) {
> > 
> > This checks for that scenario.
> 
> Yep!
> 
> >                                As if the old task were to come out
> > spinning, the problem would only be if it was the last one on the list,
> > and its tq.next was NULL. But if that was the case, then we set spin to
> > NULL and do the next trick, where I thought I gotcha again...
> > 
> > 
> > > +
> > > +		/* Mark the queue empty. */
> > > +		tqhp->spin = NULL;
> > > +
> > > +		/* Try to point the tail back at the head. */
> > > +		if (cmpxchg(&tqhp->spin_tail,
> > > +			    &tq.next,
> > > +			    &tqhp->spin) == &tq.next)
> > 
> > Here, I was thinking, oh wait, what happens if this is called right
> > before the xchg() above. Then we would set spin_tail but not update the
> > old tq.next. But wait! look at what we assign spin_tail to. It's the
> > address of spin, which would be what oldtail would point to above, and
> > then above would set spin to the new tq!
> 
> Yep again!
> 
> > OK, I haven't found a issue here yet, but youss are beiing trickssy! We
> > don't like trickssy, and we must find precccciouss!!!
> > 
> > 
> > This code is starting to make me look like Gollum :-p
> 
> Hmmm...  The time and effort to do this might almost have been worthwhile
> just to accomplish that!  ;-)
> 
> But yes, this would need better comments, design documentation, or
> maybe both.

And for whatever it might be worth, here is an attempted upgrade for
comments.

First, I upgrade the comment for the xchg() that does the enqueue:

	/*
	 * Add our element to the tail of the queue.  Note that if the
	 * queue is empty, the ->spin_tail pointer will reference
	 * the queue's head pointer, namely ->spin.
	 */
	oldtail = xchg(&tqhp->spin_tail, &tq.next);
	ACCESS_ONCE(*oldtail) = &tq;

Next, I upgrade the comment for the dequeue operation:

	/*
	 * Remove our element from the queue.  If the queue is now empty,
	 * update carefully so that the next acquisition will enqueue itself
	 * at the head of the list.  Of course, the next enqueue operation
	 * might be happening concurrently, and this code needs to handle all
	 * of the possible combinations, keeping in mind that the enqueue
	 * operation happens in two stages: (1) update the tail pointer and
	 * (2) update the predecessor's ->next pointer.  With this in mind,
	 * the following code needs to deal with three scenarios:
	 *
	 * 1.	tq is the last entry.  In this case, we use cmpxchg to
	 *	point the list tail back to the list head (->spin).  If
	 * 	the cmpxchg fails, that indicates that we are instead
	 *	in scenario 2 below.  If the cmpxchg succeeds, the next
	 *	enqueue operation's tail-pointer exchange will enqueue
	 *	the next element at the queue head, because the ->spin_tail
	 *	pointer now references the queue head.
	 *
	 * 2.	tq is the last entry, and the next entry has updated the
	 *	tail pointer but has not yet updated tq.next.  In this
	 *	case, tq.next is NULL, the cmpxchg will fail, and the
	 *	code will wait for the enqueue to complete before completing
	 *	removal of tq from the list.
	 *
	 * 3.	tq is not the last pointer.  In this case, tq.next is non-NULL,
	 *	so the following code simply removes tq from the list.
	 */
	if (tq.next == NULL) {

		/* Mark the queue empty. */
		tqhp->spin = NULL;

		/* Try to point the tail back at the head. */
		if (cmpxchg(&tqhp->spin_tail,
			    &tq.next,
			    &tqhp->spin) == &tq.next)
			return true; /* Succeeded, queue is now empty. */

		/* Failed, if needed, wait for the enqueue to complete. */
		while (tq.next == NULL)
			cpu_relax();

		/* The following code will repair the head. */
	}
	smp_mb(); /* Force ordering between handoff and critical section. */

	/*
	 * Advance list-head pointer.  This same task will be the next to
	 * access this when releasing the lock, so no need for a memory
	 * barrier after the following assignment.
	 */
	ACCESS_ONCE(tqhp->spin) = tq.next;
	return true;
}

							Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/