Message-Id: <20171011223229.GA31650@linux.vnet.ibm.com>
Date: Wed, 11 Oct 2017 15:32:30 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: stern@...land.harvard.edu, parri.andrea@...il.com,
will.deacon@....com, peterz@...radead.org, boqun.feng@...il.com,
npiggin@...il.com, dhowells@...hat.com, j.alglave@....ac.uk,
luc.maranget@...ia.fr
Cc: linux-kernel@...r.kernel.org
Subject: Linux-kernel examples for LKMM recipes
Hello!
At Linux Plumbers Conference, we got requests for a recipes document,
and a further request to point to actual code in the Linux kernel.
I have pulled together some examples for various litmus-test families,
as shown below. The decoder ring for the abbreviations (ISA2, LB, SB,
MP, ...) is here:
https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test6.pdf
This document is also checked into the memory-models git archive:
https://github.com/aparri/memory-model.git
I would be especially interested in simpler examples in general, and
of course any example at all for the cases where I was unable to find
any. Thoughts?
Thanx, Paul
------------------------------------------------------------------------
This document lists the litmus-test patterns that we have been discussing,
along with examples from the Linux kernel. This is intended to feed into
the recipes document. All examples are from v4.13.
0. Single-variable SC.
a. Within a single CPU, the use of the ->dynticks_nmi_nesting
counter by rcu_nmi_enter() and rcu_nmi_exit() qualifies
(see kernel/rcu/tree.c). The counter is accessed by
interrupts and NMIs as well as by process-level code.
This counter can be accessed by other CPUs, but only
for debug output.
b. Between CPUs, I would put forward the ->dflags
updates, but this is anything but simple. But maybe
OK for an illustration?
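For readers who want something smaller than the kernel examples, here is a minimal userspace sketch of the single-variable property. It assumes C11 relaxed atomics as rough stand-ins for READ_ONCE() and WRITE_ONCE(). All names (co_counter, co_writer, co_reader, coherence_demo) are hypothetical and do not appear in the kernel. The writer stores increasing values to one variable, and the reader checks that successive loads never go backwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical single shared variable; not taken from the kernel. */
static atomic_int co_counter;

/* Writer: stores 1, 2, ..., 1000 to the same variable, in that order. */
static void *co_writer(void *arg)
{
	(void)arg;
	for (int i = 1; i <= 1000; i++)
		atomic_store_explicit(&co_counter, i, memory_order_relaxed);
	return NULL;
}

/* Reader: successive loads of one variable must never go backwards. */
static void *co_reader(void *arg)
{
	int *ok = arg;
	int prev = 0;

	*ok = 1;
	for (int i = 0; i < 1000; i++) {
		int cur = atomic_load_explicit(&co_counter, memory_order_relaxed);

		if (cur < prev)
			*ok = 0;	/* would violate per-variable ordering */
		prev = cur;
	}
	return NULL;
}

/* Returns 1 if the reader never saw the variable move backwards. */
int coherence_demo(void)
{
	pthread_t w, r;
	int ok = 0;

	atomic_store(&co_counter, 0);
	pthread_create(&r, NULL, co_reader, &ok);
	pthread_create(&w, NULL, co_writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return ok;
}
```

No barriers are needed here because only one variable is involved. Ordering across two or more variables is what the remaining patterns address.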
1. MP (see test6.pdf for nickname translation)
a. smp_store_release() / smp_load_acquire()
init_stack_slab() in lib/stackdepot.c uses release-acquire
to handle initialization of a slab of the stack. Working
out the mutual-exclusion design is left as an exercise for
the reader.
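The bare release-acquire MP pattern can be sketched in userspace as follows. This is not the stackdepot code. It assumes C11 memory_order_release and memory_order_acquire as approximations of smp_store_release() and smp_load_acquire(), and the names mp_payload, mp_ready, and mp_release_acquire_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int mp_payload;		/* plain data, initialized before publication */
static atomic_int mp_ready;	/* flag that publishes the data */

static void *mp_writer(void *arg)
{
	(void)arg;
	mp_payload = 42;	/* initialize the data first */
	/* Roughly smp_store_release(): orders the write above before the flag. */
	atomic_store_explicit(&mp_ready, 1, memory_order_release);
	return NULL;
}

static void *mp_reader(void *arg)
{
	int *seen = arg;

	/* Roughly smp_load_acquire(): spin until the flag is observed. */
	while (!atomic_load_explicit(&mp_ready, memory_order_acquire))
		;
	*seen = mp_payload;	/* ordered after the acquire load */
	return NULL;
}

/* Runs one writer and one reader; returns the payload the reader saw. */
int mp_release_acquire_demo(void)
{
	pthread_t w, r;
	int seen = -1;

	mp_payload = 0;
	atomic_store(&mp_ready, 0);
	pthread_create(&r, NULL, mp_reader, &seen);
	pthread_create(&w, NULL, mp_writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```

Once the reader has observed the flag, it is guaranteed to see the initialized payload, which is the outcome MP is meant to ensure.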
b. rcu_assign_pointer() / rcu_dereference()
expand_to_next_prime() does the rcu_assign_pointer(),
and next_prime_number() does the rcu_dereference().
This mediates access to a bit vector that is expanded
as additional primes are needed. These two functions
are in lib/prime_numbers.c.
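The publish-a-bigger-table idea can be sketched in userspace as follows. This is not the prime_numbers.c code. It assumes a C11 release store as a stand-in for rcu_assign_pointer() and an acquire load as a conservative stand-in for rcu_dereference(), which in the kernel relies on dependency ordering instead. The struct and function names are hypothetical, and the sketch frees the old table immediately, which is safe only because nothing here reads concurrently. RCU would defer that free until pre-existing readers have finished.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a table that grows on demand. */
struct table {
	size_t len;
	int vals[];
};

static _Atomic(struct table *) cur_table;

/* Build the larger table completely, then publish it. Returns 0 or -1. */
int table_expand(size_t new_len)
{
	struct table *old = atomic_load_explicit(&cur_table, memory_order_relaxed);
	struct table *nt = malloc(sizeof(*nt) + new_len * sizeof(nt->vals[0]));

	if (!nt)
		return -1;
	nt->len = new_len;
	for (size_t i = 0; i < new_len; i++)
		nt->vals[i] = (int)(i * i);	/* initialization precedes publication */
	/* Roughly rcu_assign_pointer(): release store of the new pointer. */
	atomic_store_explicit(&cur_table, nt, memory_order_release);
	free(old);	/* ASSUMPTION: no concurrent readers in this sketch */
	return 0;
}

/* Look up one entry in whichever table is currently published. */
int table_lookup(size_t idx, int *out)
{
	/* Roughly rcu_dereference(), approximated by an acquire load. */
	struct table *t = atomic_load_explicit(&cur_table, memory_order_acquire);

	if (!t || idx >= t->len)
		return -1;
	*out = t->vals[idx];
	return 0;
}
```

A reader that obtains the new pointer also sees the fully initialized contents behind it, which is the MP guarantee applied to pointer publication.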
c. smp_wmb() / smp_rmb()
xlog_state_switch_iclogs() contains the following:
	log->l_curr_block -= log->l_logBBsize;
	ASSERT(log->l_curr_block >= 0);
	smp_wmb();
	log->l_curr_cycle++;
And xlog_valid_lsn() contains the following:
	cur_cycle = ACCESS_ONCE(log->l_curr_cycle);
	smp_rmb();
	cur_block = ACCESS_ONCE(log->l_curr_block);
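The same write-block-then-cycle and read-cycle-then-block shape can be sketched in userspace as follows. This assumes C11 release and acquire fences as approximations of smp_wmb() and smp_rmb(). The C11 fences are somewhat stronger than the kernel barriers, so this is an analogy rather than an exact translation. The variable names and the values 105, 5, 1, and 2 are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int cur_block;	/* hypothetical analog of l_curr_block */
static atomic_int cur_cycle;	/* hypothetical analog of l_curr_cycle */

struct snapshot {
	int cycle;
	int block;
};

static void *wrap_writer(void *arg)
{
	(void)arg;
	/* Wrap the block number first ... */
	atomic_store_explicit(&cur_block, 5, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* roughly smp_wmb() */
	/* ... then advance the cycle. */
	atomic_store_explicit(&cur_cycle, 2, memory_order_relaxed);
	return NULL;
}

static void *wrap_reader(void *arg)
{
	struct snapshot *s = arg;

	/* Read in the opposite order: cycle first, then block. */
	s->cycle = atomic_load_explicit(&cur_cycle, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);	/* roughly smp_rmb() */
	s->block = atomic_load_explicit(&cur_block, memory_order_relaxed);
	return NULL;
}

/* Returns 1 if the snapshot is consistent: new cycle implies wrapped block. */
int wmb_rmb_demo(void)
{
	pthread_t w, r;
	struct snapshot s = { 0, 0 };

	atomic_store(&cur_block, 105);	/* pre-wrap block */
	atomic_store(&cur_cycle, 1);	/* pre-wrap cycle */
	pthread_create(&r, NULL, wrap_reader, &s);
	pthread_create(&w, NULL, wrap_writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return s.cycle != 2 || s.block == 5;
}
```

The reader may still see the old cycle paired with either block value. What the two barriers rule out is the new cycle paired with the old block.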
d. Replacing either of the above with smp_mb()
Holding off on this one for the moment...
2. Release-acquire chains, AKA ISA2, Z6.2, LB, and 3.LB
There is a lot of variety here. In some cases you can substitute:
a. READ_ONCE() for smp_load_acquire()
b. WRITE_ONCE() for smp_store_release()
c. Dependencies for both smp_load_acquire() and
smp_store_release().
d. smp_wmb() for smp_store_release() in first thread
of ISA2 and Z6.2.
e. smp_rmb() for smp_load_acquire() in last thread of ISA2.
The canonical illustration of LB involves the various memory
allocators, where you don't want a load from about-to-be-freed
memory to see a store initializing a later incarnation of that
same memory area. But the per-CPU caches make this a very
long and complicated example.
I am not aware of any three-CPU release-acquire chains in the
Linux kernel. There are three-CPU lock-based chains in RCU,
but these are not at all simple, either.
Thoughts?
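Lacking a simple in-kernel example, a three-thread release-acquire chain in the ISA2 shape can at least be sketched in userspace. This assumes C11 release stores and acquire loads as stand-ins for smp_store_release() and smp_load_acquire(). The names are hypothetical, and the spin loops are added only to make the outcome deterministic. The litmus test itself asks what is allowed when each load happens to see the earlier store.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int isa2_x;			/* plain data written by the first thread */
static atomic_int isa2_y, isa2_z;	/* the two links in the chain */

static void *isa2_t0(void *arg)
{
	(void)arg;
	isa2_x = 1;
	atomic_store_explicit(&isa2_y, 1, memory_order_release);
	return NULL;
}

static void *isa2_t1(void *arg)
{
	(void)arg;
	/* Middle link: acquire from the first thread, release to the last. */
	while (!atomic_load_explicit(&isa2_y, memory_order_acquire))
		;
	atomic_store_explicit(&isa2_z, 1, memory_order_release);
	return NULL;
}

static void *isa2_t2(void *arg)
{
	int *seen = arg;

	while (!atomic_load_explicit(&isa2_z, memory_order_acquire))
		;
	*seen = isa2_x;	/* ordered after the first thread's write to isa2_x */
	return NULL;
}

/* Runs the three threads once; returns the value of isa2_x the last one saw. */
int isa2_demo(void)
{
	pthread_t t[3];
	int seen = -1;

	isa2_x = 0;
	atomic_store(&isa2_y, 0);
	atomic_store(&isa2_z, 0);
	pthread_create(&t[2], NULL, isa2_t2, &seen);
	pthread_create(&t[1], NULL, isa2_t1, NULL);
	pthread_create(&t[0], NULL, isa2_t0, NULL);
	for (int i = 0; i < 3; i++)
		pthread_join(t[i], NULL);
	return seen;
}
```

The last thread never communicates directly with the first, yet the chain through the middle thread still orders the first thread's write before the last thread's read.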
3. SB
a. smp_mb(), as in lockless wait-wakeup coordination.
And as in sys_membarrier()-scheduler coordination,
for that matter.
Examples seem to be lacking. Most cases use locking.
Here is one rather strange one from RCU:
	void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func)
	{
		unsigned long flags;
		bool needwake;
		bool havetask = READ_ONCE(rcu_tasks_kthread_ptr);

		rhp->next = NULL;
		rhp->func = func;
		raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
		needwake = !rcu_tasks_cbs_head;
		*rcu_tasks_cbs_tail = rhp;
		rcu_tasks_cbs_tail = &rhp->next;
		raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
		/* We can't create the thread unless interrupts are enabled. */
		if ((needwake && havetask) ||
		    (!havetask && !irqs_disabled_flags(flags))) {
			rcu_spawn_tasks_kthread();
			wake_up(&rcu_tasks_cbs_wq);
		}
	}
The wait side follows. It uses synchronize_sched() to supply
the barrier for both ends, and the preemption disabling implied
by raw_spin_lock_irqsave() serves as the read-side critical
section:
	if (!list) {
		wait_event_interruptible(rcu_tasks_cbs_wq,
					 rcu_tasks_cbs_head);
		if (!rcu_tasks_cbs_head) {
			WARN_ON(signal_pending(current));
			schedule_timeout_interruptible(HZ/10);
		}
		continue;
	}
	synchronize_sched();
-----------------
Here is another one that uses atomic_cmpxchg() as a
full memory barrier:
	if (!wait_event_timeout(*wait, !atomic_read(stopping),
				msecs_to_jiffies(1000))) {
		atomic_set(stopping, 0);
		smp_mb();
		return -ETIMEDOUT;
	}

	int omap3isp_module_sync_is_stopping(wait_queue_head_t *wait,
					     atomic_t *stopping)
	{
		if (atomic_cmpxchg(stopping, 1, 0)) {
			wake_up(wait);
			return 1;
		}
		return 0;
	}
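
Stripped of the wait-queue machinery, the SB shape that the full barriers are there to handle can be sketched in userspace as follows. This assumes C11 seq_cst fences as stand-ins for smp_mb(). The names are hypothetical. One can read sb_x as "condition set" and sb_y as "waiter registered", but that mapping is only an illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int sb_x, sb_y;

/* Each thread stores to its own variable, then loads the other one. */
static void *sb_t0(void *arg)
{
	int *r = arg;

	atomic_store_explicit(&sb_x, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* roughly smp_mb() */
	*r = atomic_load_explicit(&sb_y, memory_order_relaxed);
	return NULL;
}

static void *sb_t1(void *arg)
{
	int *r = arg;

	atomic_store_explicit(&sb_y, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* roughly smp_mb() */
	*r = atomic_load_explicit(&sb_x, memory_order_relaxed);
	return NULL;
}

/* Returns r0 + r1; with both fences in place the sum is never 0. */
int sb_demo(void)
{
	pthread_t t0, t1;
	int r0 = -1, r1 = -1;

	atomic_store(&sb_x, 0);
	atomic_store(&sb_y, 0);
	pthread_create(&t0, NULL, sb_t0, &r0);
	pthread_create(&t1, NULL, sb_t1, &r1);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	return r0 + r1;
}
```

With both fences present, at least one thread must see the other's store. In wait-wakeup terms, either the waker sees the waiter or the waiter sees the condition, so the wakeup cannot be lost. Removing either fence permits both loads to return 0.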