Date:	Tue, 19 Jan 2010 12:30:38 -0500
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
	linux-kernel@...r.kernel.org,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Oleg Nesterov <oleg@...hat.com>, Ingo Molnar <mingo@...e.hu>,
	akpm@...ux-foundation.org, josh@...htriplett.org,
	tglx@...utronix.de, Valdis.Kletnieks@...edu, dhowells@...hat.com,
	laijs@...fujitsu.com, dipankar@...ibm.com
Subject: Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory
 barrier (v5)

On Tue, 2010-01-19 at 17:47 +0100, Peter Zijlstra wrote:
> On Thu, 2010-01-14 at 11:26 -0500, Mathieu Desnoyers wrote:

> > It's this scenario that is causing the problem. Let's consider this
> > execution:
> > 
> >        CPU 0 (membarrier)                  CPU 1 (another mm -> our mm)
> >        <kernel-space>                      <kernel-space>
> >                                            switch_mm()
> >                                              smp_mb()
> >                                              clear_mm_cpumask()
> >                                              set_mm_cpumask()
> >                                              smp_mb() (by load_cr3() on x86)
> >                                            switch_to()
> >        mm_cpumask includes CPU 1
> >        rcu_read_lock()
> >        if (CPU 1 mm != our mm)
> >          skip CPU 1.
> >        rcu_read_unlock()
> >                                              current = next (1)
> 
> OK, so on x86 current uses esp and will be flipped somewhere in the
> switch_to() magic; cpu_curr(cpu) as used by CPU 0 uses rq->curr, which
> will be set before context_switch(), and that always implies a mb() for
> non-matching ->mm's [*]


This explanation by Mathieu still does not show the issue. He finally
explained it correctly here:

 http://lkml.org/lkml/2010/1/14/319

The issue is before switch_to(). If the read side loads a pointer but
gets preempted into the kernel, and the kernel updates rq->curr to the
next task (before the switch_to()), then when the write side does
sys_membarrier(), the check:

if (cpu_curr(1)->mm != our mm)

evaluates true and wrongly skips CPU 1, so no memory barrier happens
there yet.

Then the userspace side of the writer frees or changes the data, but
the reader's load is still in flight on the bus (still before the
switch_to()). The reader can then end up with a bad pointer.

Here's a simple explanation:

	CPU 0				CPU 1
	----------			-----------
	<userspace>			<userspace>
	obj = get_obj();
					rcu_read_lock();
					obj = get_obj(); <load of obj>
					<obj still being retrieved from memory>
					<interrupt, schedule out>
					<kernelspace>
					rq->curr = next;
					<bus stall>
					<obj still in memory>

	sys_membarrier();
	<kernel space>
	if (cpu_curr(1)->mm != mm) <-- true, so CPU 1 is skipped
	<userspace>

	modify object

					switch_to()
					<obj finally is loaded>
					<schedule kernel thread>
					<schedule task back>
					<userspace>
					refer(obj) <-- corruption!
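
To make that window concrete, here is a minimal sketch of the kind of
check being discussed. This is my illustration, not Mathieu's actual
patch: membarrier_ipi() and the loop shape are assumptions, though
cpu_curr(), mm_cpumask() and smp_call_function_single() are the real
scheduler/SMP interfaces:

	/*
	 * Illustrative sketch only.  The race: cpu_curr(cpu) reads
	 * rq->curr, which the scheduler sets to the next task *before*
	 * switch_to(), so we can skip a CPU whose previous task still
	 * has user-space loads in flight.
	 */
	static void membarrier_ipi(void *unused)
	{
		smp_mb();	/* promote the remote barrier() to smp_mb() */
	}

	SYSCALL_DEFINE0(membarrier)
	{
		int cpu;

		for_each_cpu(cpu, mm_cpumask(current->mm)) {
			if (cpu_curr(cpu)->mm != current->mm)
				continue;	/* the racy skip shown above */
			smp_call_function_single(cpu, membarrier_ipi,
						 NULL, 1);
		}
		return 0;
	}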

> 
> >                                            <switch back to user-space>
> >                                            read-lock()
> >                                              read gp, store local gp
> >                                              barrier()
> >                                              access critical section (2)
> > 
> > So if we don't have any memory barrier between (1) and (2), the memory
> > operations can be reordered in such a way that CPU 0 will not send IPI
> > to a CPU that would need to have its barrier() promoted into a
> > smp_mb().
> 
> OK, so I'm utterly failing to make sense of the above; do you need more
> than the 2 CPUs discussed to make it go boom?

Just the 2 CPUs, as explained above.

> 
> > Replacing these kernel rcu_read_lock/unlock() by rq locks ensures that
> > when the scheduler runs concurrently on another CPU, _all_ the scheduling
> > code is executed atomically wrt the spin lock taken on cpu 0.
> 
> Sure, but taking the rq->lock is fairly heavy-handed.

Yes, but for now it seems to be the only safe way.
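
For the record, the shape of that heavier but safe variant would be
roughly the following (again my sketch, not the actual patch; it reuses
the hypothetical membarrier_ipi() above and assumes access to cpu_rq()
and rq->lock, which live in kernel/sched.c; whether that lock is a
spin_lock or a raw_spin_lock depends on the tree):

	/*
	 * Sketch: take the remote runqueue lock so the ->curr/->mm
	 * reads are serialized against the whole scheduling path
	 * (rq->curr is only updated with rq->lock held), closing the
	 * window shown in the diagram.
	 */
	for_each_cpu(cpu, mm_cpumask(current->mm)) {
		struct rq *rq = cpu_rq(cpu);
		unsigned long flags;
		int need_ipi;

		raw_spin_lock_irqsave(&rq->lock, flags);
		need_ipi = (rq->curr->mm == current->mm);
		raw_spin_unlock_irqrestore(&rq->lock, flags);

		if (need_ipi)
			smp_call_function_single(cpu, membarrier_ipi,
						 NULL, 1);
	}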

> 
> > When x86 uses iret to return to user-space, we have a serializing
> > instruction. But if it uses sysexit, or if we are on a different
> > architecture, are we sure that a memory barrier is issued before
> > returning to user-space?
> 
> [*] and possibly also for matching ->mm's, because:
> 
> OK, so I had a quick look at the switch_to() magic, and from what I can
> make of it, it implies an mb, if only because poking at the segment
> registers implies LOCK semantics.

The problem is that this race can occur before we ever get to
switch_to().
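
Concretely, the ordering inside schedule() that opens the window looks
roughly like this (paraphrased from the scheduler of that era, heavily
trimmed; not verbatim kernel code):

	/* inside schedule(), once a different next task is picked: */
	rq->curr = next;		/* cpu_curr() sees the new task here */
	context_switch(rq, prev, next);	/* switch_to() and its implied
					   mb only happen in here */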

-- Steve

