linux-kernel - Re: [PATCH] futex: replace bare barrier() with a READ

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160302013759.GA362@dvhart-mobl5.amr.corp.intel.com>
Date:	Tue, 1 Mar 2016 17:37:59 -0800
From:	Darren Hart <dvhart@...radead.org>
To:	Jianyu Zhan <nasa4836@...il.com>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>, dave@...olabs.net,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...nel.org>, dvhart@...ux.intel.com,
	bigeasy@...utronix.de, Paul McKenney <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH] futex: replace bare barrier() with a READ_ONCE()

On Tue, Mar 01, 2016 at 08:44:48AM +0800, Jianyu Zhan wrote:
> On Tue, Mar 1, 2016 at 6:22 AM, Darren Hart <dvhart@...radead.org> wrote:
> > You've mention it causes problems a few times, but you do not specify what
> > problem it causes or how it manifests.
> >
> > Is this a theoretical bug, or have you experienced a failure case. If so, how
> > did this manifest? Do you have a reproducer we could add to the futex testsuite
> > in the kernel selftests?
> 
> 
> 1.  For the first problem I memtioned, it is a bug that described in
> commit e91467ecd1ef.
> 
> The scenario is like:
> 
>      lock_ptr = q->lock_ptr
>      if (lock_ptr != 0)  spin_lock(lock_ptr)
> 
> and the compiler generates code that is equivalent to :
> 
>      if (q->lock_ptr != 0) spin_lock(q->lock_ptr)
> 
> The problem is q->lock_ptr can change between  the test of nullness of
> q->lock_ptr and the lock on q->lock_ptr
> So a barrier() is inserted into the load of q->lock_ptr and the test
> of nullness,  to avoid the pointer aliasing.
> 
> Apparently,   a READ_ONCE() fits the goal here.


read_once will use a *volatile assignment instead of calling barrier()
directly for a word size argument.

With weak statements like "apparently" (above) and "could be" (from the original
post: This patch replaces this bare barrier() with a READ_ONCE(), a weaker form
of barrier(), which could be more informative.)" I do not see a compelling
reason to change what is notoriously fragile code with respect to barriers.

As for #2...

> 2.  For the second problem I memtioned,  yes, it is theoretical,  and
> it is also due to  q->lock_ptr can change between
> the test of nullness of q->lock_ptr and the lock on q->lock_ptr.
> 
> the code is
> 
> retry:
>        lock_ptr = q->lock_ptr;
>        if (lock_ptr != 0)  {
>                   spin_lock(lock_ptr)
>                   if (unlikely(lock_ptr != q->lock_ptr)) {
>                         spin_unlock(lock_ptr);
>                          goto retry;
>                   }
>                    ...
>        }
> 
> which is effectively the same as :
> 
>  while (lock_ptr = q->lock_ptr) {
>                   spin_lock(lock_ptr)
>                   if (unlikely(lock_ptr != q->lock_ptr)) {
>                         spin_unlock(lock_ptr);
>                          goto retry;
>                   }
>                    ...
> }
> 
> This might cause the compiler load the q->lock_ptr once and use it
> many times,  quoted from
> memory-barriers.txt:
> 
> 
>  (*) The compiler is within its rights to reload a variable, for example,
>     in cases where high register pressure prevents the compiler from
>     keeping all data of interest in registers.  The compiler might
>     therefore optimize the variable 'tmp' out of our previous example:
> 
>        while (tmp = a)
>                do_something_with(tmp);
> 
>     This could result in the following code, which is perfectly safe in
>     single-threaded code, but can be fatal in concurrent code:
> 
>        while (a)
>                do_something_with(a);
> 
>     For example, the optimized version of this code could result in
>     passing a zero to do_something_with() in the case where the variable
>     a was modified by some other CPU between the "while" statement and
>     the call to do_something_with().
> 
>     Again, use READ_ONCE() to prevent the compiler from doing this:
> 
>        while (tmp = READ_ONCE(a))
>                do_something_with(tmp);
> 

OK, so this is really the meat of the argument for the patch. You are looking to
add a compiler barrier instead of a CPU memory barrier. This should be what your
commit message is focused on and it should provide compelling evidence to
justify risking the change.

Compelling evidence for a compiler barrier would be a disassembly of the code
block before and after, demonstrating the compiler generating broken code and the
compiler generating correct code.

In addition to this, however, I would want to see a strong convincing argument
that the READ_ONCE volatile cast is sufficient to cover the original case which
motivated the addition of the barrier() (not "apparently" and "could be").

Thanks,

-- 
Darren Hart
Intel Open Source Technology Center