linux-kernel - Re: [PATCH tip/core/rcu 4/5] sys_membarrier: Add expedited option

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170725202451.GC28975@worktop>
Date:   Tue, 25 Jul 2017 22:24:51 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     linux-kernel@...r.kernel.org, mingo@...nel.org,
        jiangshanlai@...il.com, dipankar@...ibm.com,
        akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
        josh@...htriplett.org, tglx@...utronix.de, rostedt@...dmis.org,
        dhowells@...hat.com, edumazet@...gle.com, fweisbec@...il.com,
        oleg@...hat.com
Subject: Re: [PATCH tip/core/rcu 4/5] sys_membarrier: Add expedited option

On Tue, Jul 25, 2017 at 12:36:12PM -0700, Paul E. McKenney wrote:

> There are a lot of variations, to be sure.  For whatever it is worth,
> the original patch that started this uses mprotect():
> 
> https://github.com/msullivan/userspace-rcu/commit/04656b468d418efbc5d934ab07954eb8395a7ab0

FWIW that will not work on s390 (and maybe others), they don't in fact
require IPIs for remote TLB invalidation.

> > Which are those? I thought we significantly reduced those with the nohz
> > full work. Most IPI uses now first check if a CPU actually needs the IPI
> > before sending it IIRC.
> 
> If the task being awakened is higher priority than the task currently
> running on a given CPU, that CPU still gets an IPI, right?  Or am I
> completely confused?

I was thinking of things like on_each_cpu_cond().

> Unlike userspace preempt disable, in this case we get the abuse anyway
> via existing mechanisms, as in they are already being abused.  If we
> provide a mechanism for this purpose, we at least have the potential
> for handling the abuse, for example:
> 
> o	"Defanging" sys_membarrier() on systems that are sensitive to
> 	latency.  For example, this patch can be defanged by booting
> 	with the rcupdate.rcu_normal=1 kernel boot parameter, which
> 	causes requests for expedited grace periods to instead use
> 	normal grace periods.
> 
> o	Detecting and responding to abuse.  For example, perhaps if there
> 	are more than (say) 50 expedited sys_membarrier()s within a given
> 	jiffy, the excess sys_membarrier()s are non-expedited.
> 
> o	Batching optimizations allow large number of concurrent requests
> 	to be handled with fewer grace periods -- and both normal and
> 	expedited grace periods already do exactly this.
> 
> This horse is already out, so trying to shut the gate won't be effective.

Well, I'm not sure there is an easy means of doing machine wide IPIs for
!root out there. This would be a first.

Something along the lines of:

void dummy(void *arg)
{
	/* IPIs are assumed to be serializing */
}

void ipi_mm(struct mm_struct *mm)
{
	cpumask_var_t cpus;
	int cpu;

	zalloc_cpumask_var(&cpus, GFP_KERNEL);

	for_each_cpu(cpu, mm_cpumask(mm)) {
		struct task_struct *p;

		/*
		 * If the current task of @cpu isn't of this @mm, then
		 * it needs a context switch to become one, which will
		 * provide the ordering we require.
		 */
		rcu_read_lock();
		p = task_rcu_dereference(&cpu_curr(cpu));
		if (p && p->mm == mm)
			__cpumask_set_cpu(cpu, cpus);
		rcu_read_unlock();
	}

	on_each_cpu_mask(cpus, dummy, NULL, 1);
}

Would appear to be minimally invasive and only shoot at CPUs we're
currently running our process on, which greatly reduces the impact.