
Message-Id: <8EA729DD-F1CE-4C6F-A074-147A6A1BBCE0@gmail.com>
Date:   Thu, 27 Jul 2023 12:05:53 -0700
From:   Nadav Amit <nadav.amit@...il.com>
To:     Will Deacon <will@...nel.org>
Cc:     Jann Horn <jannh@...gle.com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...uxfoundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        Alan Stern <stern@...land.harvard.edu>,
        Andrea Parri <parri.andrea@...il.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Nicholas Piggin <npiggin@...il.com>,
        David Howells <dhowells@...hat.com>,
        Jade Alglave <j.alglave@....ac.uk>,
        Luc Maranget <luc.maranget@...ia.fr>,
        Akira Yokosawa <akiyks@...il.com>,
        Daniel Lustig <dlustig@...dia.com>,
        Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [PATCH 0/2] fix vma->anon_vma check for per-VMA locking; fix
 anon_vma memory ordering



> On Jul 27, 2023, at 7:57 AM, Will Deacon <will@...nel.org> wrote:
> 
> On Thu, Jul 27, 2023 at 04:39:34PM +0200, Jann Horn wrote:
>> On Thu, Jul 27, 2023 at 1:19 AM Paul E. McKenney <paulmck@...nel.org> wrote:
>>> 
>>> On Wed, Jul 26, 2023 at 11:41:01PM +0200, Jann Horn wrote:
>>>> Hi!
>>>> 
>>>> Patch 1 here is a straightforward fix for a race in per-VMA locking code
>>>> that can lead to use-after-free; I hope we can get this one into
>>>> mainline and stable quickly.
>>>> 
>>>> Patch 2 is a fix for what I believe is a longstanding memory ordering
>>>> issue in how vma->anon_vma is used across the MM subsystem; I expect
>>>> that this one will have to go through a few iterations of review and
>>>> potentially rewrites, because memory ordering is tricky.
>>>> (If someone else wants to take over patch 2, I would be very happy.)
>>>> 
>>>> These patches don't really belong together all that much, I'm just
>>>> sending them as a series because they'd otherwise conflict.
>>>> 
>>>> I am CCing:
>>>> 
>>>> - Suren because patch 1 touches his code
>>>> - Matthew Wilcox because he is also currently working on per-VMA
>>>>   locking stuff
>>>> - all the maintainers/reviewers for the Kernel Memory Consistency Model
>>>>   so they can help figure out the READ_ONCE() vs smp_load_acquire()
>>>>   thing
>>> 
>>> READ_ONCE() has weaker ordering properties than smp_load_acquire().
>>> 
>>> For example, given a pointer gp:
>>> 
>>>        p = whichever(gp);
>>>        a = 1;
>>>        r1 = p->b;
>>>        if ((uintptr_t)p & 0x1)
>>>                WRITE_ONCE(b, 1);
>>>        WRITE_ONCE(c, 1);
>>> 
>>> Leaving aside the "&" needed by smp_load_acquire(), if "whichever" is
>>> "READ_ONCE", then the load from p->b and the WRITE_ONCE() to "b" are
>>> ordered after the load from gp (the former due to an address dependency
>>> and the latter due to a (fragile) control dependency).  The compiler
>>> is within its rights to reorder the store to "a" to precede the load
>>> from gp.  The compiler is forbidden from reordering the store to "c"
>>> with the load from gp (because both are volatile accesses), but the CPU
>>> is completely within its rights to do this reordering.
>>> 
>>> But if "whichever" is "smp_load_acquire()", all four of the subsequent
>>> memory accesses are ordered after the load from gp.
>>> 
>>> Similarly, for WRITE_ONCE() and smp_store_release():
>>> 
>>>        p = READ_ONCE(gp);
>>>        r1 = READ_ONCE(gi);
>>>        r2 = READ_ONCE(gj);
>>>        a = 1;
>>>        WRITE_ONCE(b, 1);
>>>        if (r1 & 0x1)
>>>                whichever(p->q, r2);
>>> 
>>> Again leaving aside the "&" needed by smp_store_release(), if "whichever"
>>> is WRITE_ONCE(), then the load from gp, the load from gi, and the load
>>> from gj are all ordered before the store to p->q (by address dependency,
>>> control dependency, and data dependency, respectively).  The store to "a"
>>> can be reordered with the store to p->q by the compiler.  The store to
>>> "b" cannot be reordered with the store to p->q by the compiler (again,
>>> both are volatile), but the CPU is free to reorder them, especially when
>>> whichever() is implemented as a conditional store.
>>> 
>>> But if "whichever" is "smp_store_release()", all five of the earlier
>>> memory accesses are ordered before the store to p->q.
>>> 
>>> Does that help, or am I missing the point of your question?
>> 
>> My main question is how permissible/ugly you think the following use
>> of READ_ONCE() would be, and whether you think it ought to be an
>> smp_load_acquire() instead.
>> 
>> Assume that we are holding some kind of lock that ensures that the
>> only possible concurrent update to "vma->anon_vma" is that it changes
>> from a NULL pointer to a non-NULL pointer (using smp_store_release()).
>> 
>> 
>> if (READ_ONCE(vma->anon_vma) != NULL) {
>>  // we now know that vma->anon_vma cannot change anymore
>> 
>>  // access the same memory location again with a plain load
>>  struct anon_vma *a = vma->anon_vma;
>> 
>>  // this needs to be address-dependency-ordered against one of
>>  // the loads from vma->anon_vma
>>  struct anon_vma *root = a->root;
>> }
>> 
>> 
>> Is this fine? If it is not fine just because the compiler might
>> reorder the plain load of vma->anon_vma before the READ_ONCE() load,
>> would it be fine after adding a barrier() directly after the
>> READ_ONCE()?
> 
> I'm _very_ wary of mixing READ_ONCE() and plain loads to the same variable,
> as I've run into cases where you have sequences such as:
> 
> // Assume *ptr is initially 0 and somebody else writes it to 1
> // concurrently
> 
> foo = *ptr;
> bar = READ_ONCE(*ptr);
> baz = *ptr;
> 
> and you can get foo == baz == 0 but bar == 1 because the compiler only
> ends up reading from memory twice.
> 
> That was the root cause behind f069faba6887 ("arm64: mm: Use READ_ONCE
> when dereferencing pointer to pte table"), which was very unpleasant to
> debug.

Interesting. I wonder if you considered adding to READ_ONCE() something
like:

	asm volatile("" : "+g" (x) );

That way, later loads (such as baz = *ptr) would reload the updated value.
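
A minimal sketch of that idea, assuming GCC or Clang (statement expressions, __typeof__ and extended asm are compiler extensions). READ_ONCE_SKETCH, READ_ONCE_RELOAD, struct snapshot and read_three_times are hypothetical names used only for illustration; they are not kernel APIs, and the real READ_ONCE() is more involved:

```c
/*
 * Hypothetical stand-in for the kernel's READ_ONCE(): a single load
 * through a volatile-qualified lvalue.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/*
 * Hypothetical variant: after the volatile load, an empty asm statement
 * names x as a read-write ("+g") operand.  The asm emits no instructions
 * of its own, but the compiler has to assume that x may have been
 * modified by it, so a value loaded from x earlier should not be reused
 * for loads that come later.
 */
#define READ_ONCE_RELOAD(x)                             \
({                                                      \
        __typeof__(x) val_ = READ_ONCE_SKETCH(x);       \
        __asm__ __volatile__("" : "+g" (x));            \
        val_;                                           \
})

/* The three values from the foo/bar/baz sequence quoted above. */
struct snapshot {
        int foo;
        int bar;
        int baz;
};

/*
 * Same shape as the quoted sequence: a plain load, a READ_ONCE()-style
 * load, then another plain load of the same location.
 */
struct snapshot read_three_times(int *ptr)
{
        struct snapshot s;

        s.foo = *ptr;
        s.bar = READ_ONCE_RELOAD(*ptr);
        s.baz = *ptr;   /* meant to be a fresh load, not a copy of foo */
        return s;
}
```

In a single-threaded caller all three fields are simply equal; the sketch only shows where the asm statement would sit. Whether "+g" is the right constraint is untested here: if the compiler chose a register for the operand, it would presumably have to write the value back to x afterwards, so "+m", which restricts the operand to memory, may be the safer variant.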