Message-ID: <1265056389.29013.126.camel@gandalf.stny.rr.com>
Date:	Mon, 01 Feb 2010 15:33:09 -0500
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
	akpm@...ux-foundation.org, Ingo Molnar <mingo@...e.hu>,
	linux-kernel@...r.kernel.org,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Nicholas Miell <nmiell@...cast.net>, laijs@...fujitsu.com,
	dipankar@...ibm.com, josh@...htriplett.org, dvhltc@...ibm.com,
	niv@...ibm.com, tglx@...utronix.de, peterz@...radead.org,
	Valdis.Kletnieks@...edu, dhowells@...hat.com
Subject: Re: [patch 2/3] scheduler: add full memory barriers upon task
 switch at runqueue lock/unlock

On Mon, 2010-02-01 at 10:36 -0800, Linus Torvalds wrote:
> 
> I'm not interested in the user-space code. Don't even quote it. It's
> irrelevant apart from the actual semantics you want to guarantee for the
> new membarrier() system call. So don't quote the code, just explain what
> the actual barriers are.


OK, but first we must establish that the sys_membarrier() system call
guarantees that all running threads of this process have an mb()
performed on them before this syscall returns.

The simplest implementation would be to just send an IPI to all CPUs
and have that IPI perform the mb(). But that would interfere with other
tasks, so we want to limit it to the threads of the calling process that
are currently running. We use the mm_cpumask to find out which CPUs have
been running this process's mm, and only send the IPI to the CPUs that
are currently running one of its threads.

From the kernel's point of view, the goal is to make sure an mb()
happens on all running threads of the calling process.

The code does the following:

	/* Walk the CPUs that have (or had) this mm active ... */
	for_each_cpu(cpu, mm_cpumask(current->mm)) {
		/* ... and IPI only those still running one of our threads. */
		if (current->mm == cpu_curr(cpu)->mm)
			send_ipi();
	}
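
For concreteness, send_ipi() could be something like
smp_call_function_single(cpu, membarrier_ipi, NULL, 1), where the
handler only needs to execute a full barrier (membarrier_ipi is just a
name I am using here for illustration):

	static void membarrier_ipi(void *unused)
	{
		/*
		 * All the IPI has to do is issue a full memory barrier
		 * on the CPU that is running a thread of the calling
		 * process.
		 */
		smp_mb();
	}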

But a race exists between reading the mm_cpumask and sending the IPI.
There are in fact two different problems with this race. One is that a
thread scheduled away but never had an mb() issued on it; the other is
that a running task just came in and we never saw it.

Here:

	   CPU 0		   CPU 1
	-----------		-----------
				< same thread >
				schedule()
				clear_bit();

	current->mm == cpu_curr(1)->mm <<< failed
	return sys_membarrier();

				context_switch();


The above scenario fails because we missed our thread before it
actually switched to another task. This breaks the guarantee that the
sys_membarrier() syscall implies.


Second scenario, for non-x86 archs that do not imply an mb() on
switch_mm():

	   CPU 0		   CPU 1
	-----------		-----------
				< different thread >
				schedule();
				clear_bit();
				set_bit();
				schedule();
				< same thread >

	sys_membarrier();
	current->mm == cpu_curr(1)->mm <<<<< failed


This scenario happens if switch_mm() does not imply an mb(). That is,
sys_membarrier() was called after CPU 1 scheduled a thread of the same
process, but switch_mm() did not force the mb(), so CPU 0 still sees the
old value of the mm_cpumask.
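
As I understand it, what the implied mb() in switch_mm() buys us here is
a full barrier on CPU 1 between the mm_cpumask update and the thread's
user-space accesses, i.e. (sketch of the ordering only, not of the
actual patch):

	   CPU 1
	-----------
	set_bit();
	smp_mb();	/* what switch_mm() implies on x86 */
	< user-space code runs >

Archs whose switch_mm() does not provide that barrier leave the above
window open.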



The above does not take any user-space code into account. It only tries
to fulfill the kernel's obligation of sys_membarrier() to ensure that
all running threads of the calling process have an mb() performed on
them.
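
For reference, the skeleton I have in mind for the syscall is roughly
the following (a sketch only -- the smp_mb() placements are placeholders
for the ordering that still needs to be justified, and membarrier_ipi is
the handler sketched above):

	SYSCALL_DEFINE0(membarrier)
	{
		int cpu;

		/*
		 * Order the caller's prior memory accesses against the
		 * mm_cpumask and rq->curr reads below.
		 */
		smp_mb();
		for_each_cpu(cpu, mm_cpumask(current->mm)) {
			if (current->mm == cpu_curr(cpu)->mm)
				smp_call_function_single(cpu, membarrier_ipi,
							 NULL, 1);
		}
		/*
		 * Order the IPIs against the caller's subsequent memory
		 * accesses.
		 */
		smp_mb();
		return 0;
	}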

Mathieu, from this point of view, you can explain the necessary mb()s
that are within the kernel proper.

-- Steve

