Message-ID: <20170728085532.ylhuz2irwmgpmejv@hirez.programming.kicks-ass.net>
Date:   Fri, 28 Jul 2017 10:55:32 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     "Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
        linux-kernel@...r.kernel.org, Boqun Feng <boqun.feng@...il.com>,
        Andrew Hunter <ahh@...gle.com>,
        Maged Michael <maged.michael@...il.com>, gromer@...gle.com,
        Avi Kivity <avi@...lladb.com>
Subject: Re: [RFC PATCH v2] membarrier: expedited private command

On Thu, Jul 27, 2017 at 05:13:14PM -0400, Mathieu Desnoyers wrote:
> +static void membarrier_expedited_mb_after_set_current(struct mm_struct *mm,
> +		struct mm_struct *oldmm)

That is a bit of a mouthful...

> +{
> +	if (!IS_ENABLED(CONFIG_MEMBARRIER))
> +		return;
> +	/*
> +	 * __schedule()
> +	 *   finish_task_switch()
> +	 *    if (mm)
> +	 *      mmdrop(mm)
> +	 *        atomic_dec_and_test()
> +	 *
> +	 * takes care of issuing a memory barrier when oldmm is
> +	 * non-NULL. We also don't need the barrier when switching to a
> +	 * kernel thread, nor when we switch between threads belonging
> +	 * to the same process.
> +	 */
> +	if (likely(oldmm || !mm || mm == oldmm))
> +		return;
> +	/*
> +	 * When switching between processes, membarrier expedited
> +	 * private requires a memory barrier after we set the current
> +	 * task.
> +	 */
> +	smp_mb();
> +}

And because of what it complements, I would have expected the callsite:

> @@ -2737,6 +2763,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
>  
>  	mm = next->mm;
>  	oldmm = prev->active_mm;
> +	membarrier_expedited_mb_after_set_current(mm, oldmm);
>  	/*
>  	 * For paravirt, this is coupled with an exit in switch_to to
>  	 * combine the page table reload and the switch backend into

to be in finish_task_switch(), something like:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e9785f7aed75..33f34a201255 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2641,8 +2641,18 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	finish_arch_post_lock_switch();
 
 	fire_sched_in_preempt_notifiers(current);
+
+	/*
+	 * For CONFIG_MEMBARRIER we need a full memory barrier after the
+	 * rq->curr assignment. Not all architectures have one in either
+	 * switch_to() or switch_mm() so we use (and complement) the one
+	 * implied by mmdrop()'s atomic_dec_and_test().
+	 */
 	if (mm)
 		mmdrop(mm);
+	else if (IS_ENABLED(CONFIG_MEMBARRIER))
+		smp_mb();
+
 	if (unlikely(prev_state == TASK_DEAD)) {
 		if (prev->sched_class->task_dead)
 			prev->sched_class->task_dead(prev);


I realize this is sub-optimal if we're switching to a kernel thread, so
it might want some work; then again, a whole bunch of architectures
don't in fact need this extra barrier at all.
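
Purely as a sketch (and assuming it really is safe to elide the barrier
when the incoming task is a kernel thread, i.e. current->mm == NULL, as
the quoted comment suggests), the else branch above could be narrowed
to:

	if (mm) {
		mmdrop(mm);
	} else if (IS_ENABLED(CONFIG_MEMBARRIER) && current->mm) {
		/*
		 * Hypothetical refinement, not part of the patch: kernel
		 * threads never execute user-space accesses, so only a
		 * switch to a user thread needs the full barrier after
		 * the rq->curr update.
		 */
		smp_mb();
	}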

> +static void membarrier_private_expedited(void)
> +{
> +	int cpu, this_cpu;
> +	bool fallback = false;
> +	cpumask_var_t tmpmask;
> +
> +	if (num_online_cpus() == 1)
> +		return;
> +
> +	/*
> +	 * Matches memory barriers around rq->curr modification in
> +	 * scheduler.
> +	 */
> +	smp_mb();	/* system call entry is not a mb. */
> +

Weren't you going to put in a comment on that GFP_NOWAIT thing?

> +	if (!alloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {

You really want: zalloc_cpumask_var().

> +		/* Fallback for OOM. */
> +		fallback = true;
> +	}
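
On the zalloc_cpumask_var() point, a minimal sketch of what I mean
(illustrative only, not a replacement hunk): zalloc_cpumask_var() hands
the mask back already cleared, whereas alloc_cpumask_var() leaves it
uninitialized (notably with CONFIG_CPUMASK_OFFSTACK=n it is just
on-stack storage), so stray bits would become spurious IPI targets
below:

	if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
		/* Fallback for OOM; tmpmask is not used on this path. */
		fallback = true;
	}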
> +
> +	/*
> +	 * Skipping the current CPU is OK even though we can be
> +	 * migrated at any point. The current CPU, at the point where we
> +	 * read raw_smp_processor_id(), is ensured to be in program
> +	 * order with respect to the caller thread. Therefore, we can
> +	 * skip this CPU from the iteration.
> +	 */
> +	this_cpu = raw_smp_processor_id();

So if instead you do the below, that is still true, but you have the
opportunity to skip moar CPUs; then again, if you migrate the wrong way
you'll end up not skipping yourself... ah well.

> +	cpus_read_lock();
> +	for_each_online_cpu(cpu) {
> +		struct task_struct *p;
> +
		if (cpu == raw_smp_processor_id())
			continue;

> +		rcu_read_lock();
> +		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
> +		if (p && p->mm == current->mm) {
> +			if (!fallback)
> +				__cpumask_set_cpu(cpu, tmpmask);
> +			else
> +				smp_call_function_single(cpu, ipi_mb, NULL, 1);
> +		}
> +		rcu_read_unlock();
> +	}
> +	cpus_read_unlock();

This ^ wants to go after that v

> +	if (!fallback) {
> +		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
> +		free_cpumask_var(tmpmask);
> +	}

Because otherwise the bits in your tmpmask might no longer match the
online state.
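
Concretely (a sketch only; nothing but the position of
cpus_read_unlock() changes relative to the quoted code):

	cpus_read_lock();
	for_each_online_cpu(cpu) {
		/* ... same loop body as above ... */
	}
	if (!fallback) {
		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
		free_cpumask_var(tmpmask);
	}
	/*
	 * Drop the hotplug read lock only after the IPIs went out, so
	 * every bit set in tmpmask still names an online CPU.
	 */
	cpus_read_unlock();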

> +
> +	/*
> +	 * Memory barrier on the caller thread _after_ we finished
> +	 * waiting for the last IPI. Matches memory barriers around
> +	 * rq->curr modification in scheduler.
> +	 */
> +	smp_mb();	/* exit from system call is not a mb */
> +}
