lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 1 Feb 2010 12:45:00 -0500
From:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	akpm@...ux-foundation.org, Ingo Molnar <mingo@...e.hu>,
	linux-kernel@...r.kernel.org,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Nicholas Miell <nmiell@...cast.net>, laijs@...fujitsu.com,
	dipankar@...ibm.com, josh@...htriplett.org, dvhltc@...ibm.com,
	niv@...ibm.com, tglx@...utronix.de, peterz@...radead.org,
	Valdis.Kletnieks@...edu, dhowells@...hat.com
Subject: Re: [patch 2/3] scheduler: add full memory barriers upon task
	switch at runqueue lock/unlock

* Linus Torvalds (torvalds@...ux-foundation.org) wrote:
> 
> 
> On Mon, 1 Feb 2010, Mathieu Desnoyers wrote:
> > 
> > What we have to be careful about here is that it's not enough to just
> > rely on switch_mm() containing a memory barrier. What we really need to
> > enforce is that switch_mm() issues memory barriers both _before_ and
> > _after_ mm_cpumask modification. The "after" part is usually dealt with
> > by the TLB context switch, but the "before" part usually isn't.
> 
> Ok, whatever. I vote for not doing anything at all, because this just all 
> sounds like some totally crazy crap. You haven't explained the actual 
> races, you haven't explained anything at all, you're apparently just 
> randomly sprinkling smp_mb's around until you think it's all fine.
> 
> Show the actual memory ordering constraints as it is related to the kernel 
> data structures. I'm totally uninterested in general handwaving and "we 
> must have smp_mb's here and here" without very explicit explanations of 
> exactly WHAT the memory orderings are.

It's probably worthwhile to summarize here the mb-related discussions we
had in the previous RFC rounds.

Here is the detailed execution scenario showing the race. The race with
unlock showing that we need to order user-space loads before mm_cpumask
store. This is showing an execution sequence involving the userspace RCU
library and the Linux kernel with sys_membarrier(). As we see below, the
lack of ordering between "cpumask_clear" and user-space execution
creates a race with the scheduler where sys_membarrier() incorrectly
considers CPU 1 as not needing a membarrier_ipi.


        CPU 0                   CPU 1
        --------------          --------------
        <user space>            <user space> (already in read-side C.S.)
                                obj = rcu_dereference(list->next);
                                  -> load list->next
                                copy = obj->foo;
                                rcu_read_unlock();
                                  -> store to urcu_reader.ctr
                                <urcu_reader.ctr store is globally visible>
        list_del(obj);
                                <preempt>
                                <kernel space>

                                schedule();
                                cpumask_clear(mm_cpumask, cpu);

        sys_membarrier();
          <kernel space>
          smp_mb()
          iterates on mm_cpumask
          smp_mb()
        <user space>
        set global g.p. (urcu_gp_ctr) phase to 1
        wait for all urcu_reader.ctr in phase 0
        set global g.p. (urcu_gp_ctr) phase to 0
        wait for all urcu_reader.ctr in phase 1
        sys_membarrier();
          <kernel space>
          smp_mb()
          iterates on mm_cpumask
          smp_mb()
        <user space>
        free(obj);
                                <list->next load hits memory>
                                <obj->foo load hits memory> <- corruption


There is a similar race scenario (symmetric to the one above) between
cpumask_set and a following user-space rcu_read_lock(), which I could
provide upon request.

This results in the following comments around smp_mb() in
sys_membarrier:

First sys_membarrier smp_mb():

        /*
         * Memory barrier on the caller thread before sending first
         * IPI. Orders all memory accesses prior to sys_membarrier() before
         * mm_cpumask read and membarrier_ipi executions.
         * Matches memory barriers before and after mm_cpumask updates.
         */

Second sys_membarrier smp_mb():

        /*
         * Memory barrier on the caller thread after we finished
         * waiting for the last IPI. Orders mm_cpumask read and membarrier_ipi
         * executions before memory accesses following sys_membarrier()
         * execution.
         * Matches memory barriers before and after mm_cpumask updates.
         */

Thanks,

Mathieu


> 
> 		Linus

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ