linux-kernel - Re: Linux-kernel examples for LKMM recipes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20171017205650.GA3521@linux.vnet.ibm.com>
Date:   Tue, 17 Oct 2017 13:56:50 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Boqun Feng <boqun.feng@...il.com>
Cc:     stern@...land.harvard.edu, parri.andrea@...il.com,
        will.deacon@....com, peterz@...radead.org, npiggin@...il.com,
        dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
        linux-kernel@...r.kernel.org
Subject: Re: Linux-kernel examples for LKMM recipes

On Thu, Oct 12, 2017 at 09:23:59AM +0800, Boqun Feng wrote:
> On Wed, Oct 11, 2017 at 10:32:30PM +0000, Paul E. McKenney wrote:
> > Hello!
> > 
> > At Linux Plumbers Conference, we got requests for a recipes document,
> > and a further request to point to actual code in the Linux kernel.
> > I have pulled together some examples for various litmus-test families,
> > as shown below.  The decoder ring for the abbreviations (ISA2, LB, SB,
> > MP, ...) is here:
> > 
> > 	https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test6.pdf
> > 
> > This document is also checked into the memory-models git archive:
> > 
> > 	https://github.com/aparri/memory-model.git
> > 
> > I would be especially interested in simpler examples in general, and
> > of course any example at all for the cases where I was unable to find
> > any.  Thoughts?
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > This document lists the litmus-test patterns that we have been discussing,
> > along with examples from the Linux kernel.  This is intended to feed into
> > the recipes document.  All examples are from v4.13.
> > 
> > 0.	Single-variable SC.
> > 
> > 	a.	Within a single CPU, the use of the ->dynticks_nmi_nesting
> > 		counter by rcu_nmi_enter() and rcu_nmi_exit() qualifies
> > 		(see kernel/rcu/tree.c).  The counter is accessed by
> > 		interrupts and NMIs as well as by process-level code.
> > 		This counter can be accessed by other CPUs, but only
> > 		for debug output.
> > 
> > 	b.	Between CPUs, I would put forward the ->dflags
> > 		updates, but this is anything but simple.  But maybe
> > 		OK for an illustration?
> > 
> > 1.	MP (see test6.pdf for nickname translation)
> > 
> > 	a.	smp_store_release() / smp_load_acquire()
> > 
> > 		init_stack_slab() in lib/stackdepot.c uses release-acquire
> > 		to handle initialization of a slab of the stack.  Working
> > 		out the mutual-exclusion design is left as an exercise for
> > 		the reader.
> > 
> > 	b.	rcu_assign_pointer() / rcu_dereference()
> > 
> > 		expand_to_next_prime() does the rcu_assign_pointer(),
> > 		and next_prime_number() does the rcu_dereference().
> > 		This mediates access to a bit vector that is expanded
> > 		as additional primes are needed.  These two functions
> > 		are in lib/prime_numbers.c.
> > 
> > 	c.	smp_wmb() / smp_rmb()
> > 
> > 		xlog_state_switch_iclogs() contains the following:
> > 
> > 			log->l_curr_block -= log->l_logBBsize;
> > 			ASSERT(log->l_curr_block >= 0);
> > 			smp_wmb();
> > 			log->l_curr_cycle++;
> > 
> > 		And xlog_valid_lsn() contains the following:
> > 
> > 			cur_cycle = ACCESS_ONCE(log->l_curr_cycle);
> > 			smp_rmb();
> > 			cur_block = ACCESS_ONCE(log->l_curr_block);
> > 
> > 	d.	Replacing either of the above with smp_mb()
> > 
> > 		Holding off on this one for the moment...
> > 
> > 2.	Release-acquire chains, AKA ISA2, Z6.2, LB, and 3.LB
> > 
> > 	Lots of variety here, can in some cases substitute:
> > 	
> > 	a.	READ_ONCE() for smp_load_acquire()
> > 	b.	WRITE_ONCE() for smp_store_release()
> > 	c.	Dependencies for both smp_load_acquire() and
> > 		smp_store_release().
> > 	d.	smp_wmb() for smp_store_release() in first thread
> > 		of ISA2 and Z6.2.
> > 	e.	smp_rmb() for smp_load_acquire() in last thread of ISA2.
> > 
> > 	The canonical illustration of LB involves the various memory
> > 	allocators, where you don't want a load from about-to-be-freed
> > 	memory to see a store initializing a later incarnation of that
> > 	same memory area.  But the per-CPU caches make this a very
> > 	long and complicated example.
> > 
> > 	I am not aware of any three-CPU release-acquire chains in the
> > 	Linux kernel.  There are three-CPU lock-based chains in RCU,
> > 	but these are not at all simple, either.
> > 
> 
> The "Program-Order guarantees" case in scheduler? See the comments
> written by Peter above try_to_wake_up():
> 
>  * The basic program-order guarantee on SMP systems is that when a task [t]
>  * migrates, all its activity on its old CPU [c0] happens-before any subsequent
>  * execution on its new CPU [c1].
> ...
>  * For blocking we (obviously) need to provide the same guarantee as for
>  * migration. However the means are completely different as there is no lock
>  * chain to provide order. Instead we do:
>  *
>  *   1) smp_store_release(X->on_cpu, 0)
>  *   2) smp_cond_load_acquire(!X->on_cpu)
>  *
>  * Example:
>  *
>  *   CPU0 (schedule)  CPU1 (try_to_wake_up) CPU2 (schedule)
>  *
>  *   LOCK rq(0)->lock LOCK X->pi_lock
>  *   dequeue X
>  *   sched-out X
>  *   smp_store_release(X->on_cpu, 0);
>  *
>  *                    smp_cond_load_acquire(&X->on_cpu, !VAL);
>  *                    X->state = WAKING
>  *                    set_task_cpu(X,2)
>  *
>  *                    LOCK rq(2)->lock
>  *                    enqueue X
>  *                    X->state = RUNNING
>  *                    UNLOCK rq(2)->lock
>  *
>  *                                          LOCK rq(2)->lock // orders against CPU1
>  *                                          sched-out Z
>  *                                          sched-in X
>  *                                          UNLOCK rq(2)->lock
>  *
>  *                    UNLOCK X->pi_lock
>  *   UNLOCK rq(0)->lock
> 
> This is a chain mixed with lock and acquire-release(maybe even better?).

I added this one, though it might be outside of the scope of recipes.

						Thanx, Paul

> And another example would be osq_{lock,unlock}() on multiple(more than
> three) CPUs. 
> 
> Regards,
> Boqun
> 
> > 	Thoughts?
> > 
> > 3.	SB
> > 
> > 	a.	smp_mb(), as in lockless wait-wakeup coordination.
> > 		And as in sys_membarrier()-scheduler coordination,
> > 		for that matter.
> > 
> > 		Examples seem to be lacking.  Most cases use locking.
> > 		Here is one rather strange one from RCU:
> > 
> > 		void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func)
> > 		{
> > 			unsigned long flags;
> > 			bool needwake;
> > 			bool havetask = READ_ONCE(rcu_tasks_kthread_ptr);
> > 
> > 			rhp->next = NULL;
> > 			rhp->func = func;
> > 			raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
> > 			needwake = !rcu_tasks_cbs_head;
> > 			*rcu_tasks_cbs_tail = rhp;
> > 			rcu_tasks_cbs_tail = &rhp->next;
> > 			raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
> > 			/* We can't create the thread unless interrupts are enabled. */
> > 			if ((needwake && havetask) ||
> > 			    (!havetask && !irqs_disabled_flags(flags))) {
> > 				rcu_spawn_tasks_kthread();
> > 				wake_up(&rcu_tasks_cbs_wq);
> > 			}
> > 		}
> > 
> > 		And for the wait side, using synchronize_sched() to supply
> > 		the barrier for both ends, with the preemption disabling
> > 		due to raw_spin_lock_irqsave() serving as the read-side
> > 		critical section:
> > 
> > 		if (!list) {
> > 			wait_event_interruptible(rcu_tasks_cbs_wq,
> > 						 rcu_tasks_cbs_head);
> > 			if (!rcu_tasks_cbs_head) {
> > 				WARN_ON(signal_pending(current));
> > 				schedule_timeout_interruptible(HZ/10);
> > 			}
> > 			continue;
> > 		}
> > 		synchronize_sched();
> > 
> > 		-----------------
> > 
> > 		Here is another one that uses atomic_cmpxchg() as a
> > 		full memory barrier:
> > 
> > 		if (!wait_event_timeout(*wait, !atomic_read(stopping),
> > 					msecs_to_jiffies(1000))) {
> > 			atomic_set(stopping, 0);
> > 			smp_mb();
> > 			return -ETIMEDOUT;
> > 		}
> > 
> > 		int omap3isp_module_sync_is_stopping(wait_queue_head_t *wait,
> > 						     atomic_t *stopping)
> > 		{
> > 			if (atomic_cmpxchg(stopping, 1, 0)) {
> > 				wake_up(wait);
> > 				return 1;
> > 			}
> > 
> > 			return 0;
> > 		}
> >