linux-kernel - Re: [PATCH] futex: replace bare barrier() with a READ

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHz2CGXV+vhLYqAScaspqQbXQ6DJsw9MCE+Vkbt0ch0J=y5M0w@mail.gmail.com>
Date:	Tue, 1 Mar 2016 08:44:48 +0800
From:	Jianyu Zhan <nasa4836@...il.com>
To:	Darren Hart <dvhart@...radead.org>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>, dave@...olabs.net,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...nel.org>, dvhart@...ux.intel.com,
	bigeasy@...utronix.de, Paul McKenney <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH] futex: replace bare barrier() with a READ_ONCE()

On Tue, Mar 1, 2016 at 6:22 AM, Darren Hart <dvhart@...radead.org> wrote:
> You've mention it causes problems a few times, but you do not specify what
> problem it causes or how it manifests.
>
> Is this a theoretical bug, or have you experienced a failure case. If so, how
> did this manifest? Do you have a reproducer we could add to the futex testsuite
> in the kernel selftests?


1.  For the first problem I memtioned, it is a bug that described in
commit e91467ecd1ef.

The scenario is like:

     lock_ptr = q->lock_ptr
     if (lock_ptr != 0)  spin_lock(lock_ptr)

and the compiler generates code that is equivalent to :

     if (q->lock_ptr != 0) spin_lock(q->lock_ptr)

The problem is q->lock_ptr can change between  the test of nullness of
q->lock_ptr and the lock on q->lock_ptr
So a barrier() is inserted into the load of q->lock_ptr and the test
of nullness,  to avoid the pointer aliasing.

Apparently,   a READ_ONCE() fits the goal here.


2.  For the second problem I memtioned,  yes, it is theoretical,  and
it is also due to  q->lock_ptr can change between
the test of nullness of q->lock_ptr and the lock on q->lock_ptr.

the code is

retry:
       lock_ptr = q->lock_ptr;
       if (lock_ptr != 0)  {
                  spin_lock(lock_ptr)
                  if (unlikely(lock_ptr != q->lock_ptr)) {
                        spin_unlock(lock_ptr);
                         goto retry;
                  }
                   ...
       }

which is effectively the same as :

 while (lock_ptr = q->lock_ptr) {
                  spin_lock(lock_ptr)
                  if (unlikely(lock_ptr != q->lock_ptr)) {
                        spin_unlock(lock_ptr);
                         goto retry;
                  }
                   ...
}

This might cause the compiler load the q->lock_ptr once and use it
many times,  quoted from
memory-barriers.txt:


 (*) The compiler is within its rights to reload a variable, for example,
    in cases where high register pressure prevents the compiler from
    keeping all data of interest in registers.  The compiler might
    therefore optimize the variable 'tmp' out of our previous example:

       while (tmp = a)
               do_something_with(tmp);

    This could result in the following code, which is perfectly safe in
    single-threaded code, but can be fatal in concurrent code:

       while (a)
               do_something_with(a);

    For example, the optimized version of this code could result in
    passing a zero to do_something_with() in the case where the variable
    a was modified by some other CPU between the "while" statement and
    the call to do_something_with().

    Again, use READ_ONCE() to prevent the compiler from doing this:

       while (tmp = READ_ONCE(a))
               do_something_with(tmp);




So,  due to these two reason,  I propose this patch.


Thanks,
Jianyu Zhan