Message-ID: <20171012112718.GA31036@arm.com>
Date: Thu, 12 Oct 2017 12:27:19 +0100
From: Will Deacon <will.deacon@....com>
To: Boqun Feng <boqun.feng@...il.com>
Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
stern@...land.harvard.edu, parri.andrea@...il.com,
peterz@...radead.org, npiggin@...il.com, dhowells@...hat.com,
j.alglave@....ac.uk, luc.maranget@...ia.fr,
linux-kernel@...r.kernel.org
Subject: Re: Linux-kernel examples for LKMM recipes

On Thu, Oct 12, 2017 at 09:23:59AM +0800, Boqun Feng wrote:
> On Wed, Oct 11, 2017 at 10:32:30PM +0000, Paul E. McKenney wrote:
> > I am not aware of any three-CPU release-acquire chains in the
> > Linux kernel. There are three-CPU lock-based chains in RCU,
> > but these are not at all simple, either.
> >
>
> The "Program-Order guarantees" case in scheduler? See the comments
> written by Peter above try_to_wake_up():
>
> * The basic program-order guarantee on SMP systems is that when a task [t]
> * migrates, all its activity on its old CPU [c0] happens-before any subsequent
> * execution on its new CPU [c1].
> ...
> * For blocking we (obviously) need to provide the same guarantee as for
> * migration. However the means are completely different as there is no lock
> * chain to provide order. Instead we do:
> *
> * 1) smp_store_release(X->on_cpu, 0)
> * 2) smp_cond_load_acquire(!X->on_cpu)
> *
> * Example:
> *
> *   CPU0 (schedule)  CPU1 (try_to_wake_up) CPU2 (schedule)
> *
> *   LOCK rq(0)->lock LOCK X->pi_lock
> *   dequeue X
> *   sched-out X
> *   smp_store_release(X->on_cpu, 0);
> *
> *                    smp_cond_load_acquire(&X->on_cpu, !VAL);
> *                    X->state = WAKING
> *                    set_task_cpu(X,2)
> *
> *                    LOCK rq(2)->lock
> *                    enqueue X
> *                    X->state = RUNNING
> *                    UNLOCK rq(2)->lock
> *
> *                                          LOCK rq(2)->lock // orders against CPU1
> *                                          sched-out Z
> *                                          sched-in X
> *                                          UNLOCK rq(2)->lock
> *
> *                    UNLOCK X->pi_lock
> *   UNLOCK rq(0)->lock
>
> This is a chain mixing locks and acquire-release (maybe even a better example?).
>
>
> And another example would be osq_{lock,unlock}() on multiple (more than
> three) CPUs.
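
If it helps for the recipes document, the release-acquire part of that
chain is basically the ISA2 shape: a store before a release on CPU0,
picked up by an acquire on CPU1 which then does its own release, picked
up in turn by an acquire on CPU2. A rough litmus sketch (made-up
variable names, obviously not the real scheduler code) would be:

C ISA2+rel+acq-rel+acq

(*
 * Sketch of a three-CPU release-acquire chain: the exists clause
 * should never be satisfied.
 *)

{}

P0(int *x, int *y)
{
        WRITE_ONCE(*x, 1);
        smp_store_release(y, 1);
}

P1(int *y, int *z)
{
        int r0;

        r0 = smp_load_acquire(y);
        smp_store_release(z, 1);
}

P2(int *x, int *z)
{
        int r1;
        int r2;

        r1 = smp_load_acquire(z);
        r2 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 2:r1=1 /\ 2:r2=0)

The second link in Peter's example is actually the UNLOCK+LOCK handoff
on rq(2)->lock rather than another release-acquire pair, but that gives
at least the same ordering.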
I think the qrwlock also has something similar, with the writer fairness
issue fixed:

CPU0: (writer doing an unlock)

        smp_store_release(&lock->wlocked, 0);   // Bottom byte of lock->cnts

CPU1: (waiting writer on slowpath)

        atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
        ...
        arch_spin_unlock(&lock->wait_lock);

CPU2: (reader on slowpath)

        arch_spin_lock(&lock->wait_lock);

and there are mixed-size accesses here too. Fun stuff!
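
Abstracting the qrwlock internals away, I think that's the same ISA2
shape again, with the second link replaced by the wait_lock handoff.
Roughly (again only a sketch; z stands in for whatever the waiting
writer publishes inside its critical section):

C ISA2+rel+acq-lock+lock

(*
 * Sketch of a release-acquire link followed by an unlock-lock link:
 * the exists clause should never be satisfied.
 *)

{}

P0(int *x, int *y)
{
        WRITE_ONCE(*x, 1);
        smp_store_release(y, 1);
}

P1(int *y, int *z, spinlock_t *s)
{
        int r0;

        r0 = smp_load_acquire(y);
        spin_lock(s);
        WRITE_ONCE(*z, 1);
        spin_unlock(s);
}

P2(int *x, int *z, spinlock_t *s)
{
        int r1;
        int r2;

        spin_lock(s);
        r1 = READ_ONCE(*z);
        r2 = READ_ONCE(*x);
        spin_unlock(s);
}

exists (1:r0=1 /\ 2:r1=1 /\ 2:r2=0)

The mixed-size part obviously isn't captured by that, but the chain
itself should be the same recipe.
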
Will