netdev - Re: rcu_read_lock lost its compiler barrier

Message-ID: <20190603195304.GK28207@linux.ibm.com>
Date:   Mon, 3 Jun 2019 12:53:04 -0700
From:   "Paul E. McKenney" <paulmck@...ux.ibm.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Herbert Xu <herbert@...dor.apana.org.au>,
        Frederic Weisbecker <fweisbec@...il.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Fengguang Wu <fengguang.wu@...el.com>, LKP <lkp@...org>,
        LKML <linux-kernel@...r.kernel.org>,
        Netdev <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>
Subject: Re: rcu_read_lock lost its compiler barrier

On Mon, Jun 03, 2019 at 09:07:29AM -0700, Linus Torvalds wrote:
> On Mon, Jun 3, 2019 at 8:55 AM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > I don't believe that it would necessarily help to turn a
> > rcu_read_lock() into a compiler barrier, because for the non-preempt
> > case rcu_read_lock() doesn't need to actually _do_ anything, and
> > anything that matters for the RCU read lock will already be a compiler
> > barrier for other reasons (ie a function call that can schedule).
> 
> Actually, thinking a bit more about this, and trying to come up with
> special cases, I'm not at all convinced.
> 
> Even if we don't have preemption enabled, it turns out that we *do*
> have things that can cause scheduling without being compiler barriers.
> 
> In particular, user accesses are not necessarily full compiler
> barriers. One common pattern (x86) is
> 
>         asm volatile("call __get_user_%P4"
> 
> which explicitly has an "asm volatile" so that it doesn't re-order wrt
> other asms (and thus other user accesses), but it does *not* have a
> "memory" clobber, because the user access doesn't actually change
> kernel memory. Not even if it's a "put_user()".
> 
> So we've made those fairly relaxed on purpose. And they might be
> relaxed enough that they'd allow re-ordering wrt something that does a
> rcu read lock, unless the rcu read lock has some compiler barrier in
> it.
> 
> IOW, imagine completely made up code like
> 
>      get_user(val, ptr)
>      rcu_read_lock();
>      WRITE_ONCE(state, 1);
> 
> and unless the rcu lock has a barrier in it, I actually think that
> write to 'state' could migrate to *before* the get_user().
> 
> I'm not convinced we have anything that remotely looks like the above,
> but I'm actually starting to think that yes, all RCU barriers had
> better be compiler barriers.
> 
> Because this is very much an example of something where you don't
> necessarily need a memory barrier, but there's a code generation
> barrier needed because of local ordering requirements. The possible
> faulting behavior of "get_user()" must not migrate into the RCU
> critical region.
> 
> Paul?

I agree that !PREEMPT rcu_read_lock() would not affect compiler code
generation, but given that get_user() is a volatile asm, isn't the
compiler already forbidden from reordering it with the volatile-casted
WRITE_ONCE() access, even if there was nothing at all between them?
Or are asms an exception to the rule that volatile executions cannot
be reordered?
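
To make the question concrete, here is a minimal userspace sketch of the
made-up sequence quoted above. Everything in it is an assumption for
illustration: fake_get_user() is a hypothetical stand-in for get_user()
(a volatile asm with no "memory" clobber), the two rcu_read_lock_*()
variants are invented names for a no-op lock and a barrier-only lock,
and the macros are simplified GCC/Clang approximations of the kernel's
barrier() and WRITE_ONCE(). It does not settle whether the compiler may
move the store across the asm; it only gives both variants a form whose
generated code can be compared.

```c
#include <assert.h>

/* Simplified userspace approximations of kernel macros (GCC/Clang only). */
#define barrier()        __asm__ __volatile__("" ::: "memory")
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))
#define READ_ONCE(x)     (*(volatile __typeof__(x) *)&(x))

static int state;

/*
 * Hypothetical stand-in for get_user(): the asm is volatile but has
 * no "memory" clobber, mirroring the relaxed x86 pattern quoted above.
 */
static inline int fake_get_user(const int *ptr)
{
    int val = *ptr;

    __asm__ __volatile__("" : "+r"(val));
    return val;
}

/* Assumed model of a reader lock that emits no code at all. */
static inline void rcu_read_lock_nobarrier(void) { }

/* Assumed model of the same lock when it is at least a compiler barrier. */
static inline void rcu_read_lock_barrier(void) { barrier(); }

int reader_without_barrier(const int *uptr)
{
    int val = fake_get_user(uptr);

    rcu_read_lock_nobarrier();
    WRITE_ONCE(state, 1);   /* the store whose placement is in question */
    return val;
}

int reader_with_barrier(const int *uptr)
{
    int val = fake_get_user(uptr);

    rcu_read_lock_barrier();
    WRITE_ONCE(state, 1);   /* cannot be hoisted above the barrier() */
    return val;
}
```

Building this with something like "gcc -O2 -S" and comparing the two
functions shows what a given compiler version does in practice, which is
weaker evidence than a documented guarantee.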

> So I think the rule really should be: every single form of locking
> that has any semantic meaning at all, absolutely needs to be at least
> a compiler barrier.
> 
> (That "any semantic meaning" weaselwording is because I suspect that
> we have locking that truly and intentionally becomes no-ops because
> it's based on things that aren't relevant in some configurations. But
> generally compiler barriers are really pretty damn cheap, even from a
> code generation standpoint, and can help make the resulting code more
> legible, so I think we should not try to aggressively remove them
> without _very_ good reasons)

We can of course put them back in, but this won't help in the typical
rcu_assign_pointer(), rcu_dereference(), and synchronize_rcu() situation
(nor do I see how it helps in Herbert's example).  And in other RCU
use cases, the accesses analogous to the rcu_assign_pointer() and
rcu_dereference() (in Herbert's example, the accesses to variable "a")
really need to be READ_ONCE()/WRITE_ONCE() or stronger, correct?
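
For reference, the typical publish/subscribe situation mentioned above
can be sketched in userspace with C11 atomics. This is an approximation
under stated assumptions, not the kernel implementation: a release store
stands in for rcu_assign_pointer(), an acquire load stands in for
rcu_dereference() (stronger than the dependency ordering the kernel
relies on), and synchronize_rcu() is not modeled at all. The names
struct foo, gp, publish() and read_a() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct foo {
    int a;
};

/* Hypothetical RCU-protected global pointer; starts out NULL. */
static _Atomic(struct foo *) gp;

/* Updater: initialize the structure, then publish it. */
void publish(struct foo *p, int a)
{
    p->a = a;
    /* Release store: stand-in for rcu_assign_pointer(gp, p). */
    atomic_store_explicit(&gp, p, memory_order_release);
}

/* Reader: fetch the pointer once, then dereference that snapshot. */
int read_a(int fallback)
{
    /* Acquire load: stand-in for rcu_dereference(gp). */
    struct foo *p = atomic_load_explicit(&gp, memory_order_acquire);

    return p ? p->a : fallback;
}
```

The ordering here comes from the marked accesses to gp themselves, not
from anything the reader-side lock does.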

							Thanx, Paul