Message-Id: <A923D77C-8C45-41B0-A1B2-55F68168D058@gmail.com>
Date:   Mon, 27 Sep 2021 03:11:20 -0700
From:   Nadav Amit <nadav.amit@...il.com>
To:     "Kirill A. Shutemov" <kirill@...temov.name>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Linux-MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Peter Xu <peterx@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Minchan Kim <minchan@...nel.org>,
        Colin Cross <ccross@...gle.com>,
        Suren Baghdasarya <surenb@...gle.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>
Subject: Re: [RFC PATCH 1/8] mm/madvise: propagate vma->vm_end changes


> On Sep 27, 2021, at 2:08 AM, Kirill A. Shutemov <kirill@...temov.name> wrote:
> 
> On Sun, Sep 26, 2021 at 09:12:52AM -0700, Nadav Amit wrote:
>> From: Nadav Amit <namit@...are.com>
>> 
>> The comment in madvise_dontneed_free() says that vma splits that occur
>> while the mmap-lock is dropped, during userfaultfd_remove(), should be
>> handled correctly, but nothing in the code indicates that it is so: prev
>> is invalidated, and do_madvise() will therefore continue to update VMAs
>> from the "obsolete" end (i.e., the one before the split).
>> 
>> Propagate the changes to end from madvise_dontneed_free() back to
>> do_madvise() and continue the updates from the new end accordingly.
> 
> Could you describe in details a race that would lead to wrong behaviour?

Thanks for the quick response.

For instance, madvise(MADV_DONTNEED) can race with mprotect() and cause
the VMA to split.

Something like:

  CPU0				CPU1
  ----				----
  madvise(0x10000, 0x2000, MADV_DONTNEED)
  -> userfaultfd_remove()
   [ mmap-lock dropped ]
				mprotect(0x11000, 0x1000, PROT_READ)
				[splitting the VMA]

				read(uffd)
				[unblocking userfaultfd_remove()]

   [ resuming ]
   end = vma->vm_end
   [end == 0x11000]

   madvise_dontneed_single_vma(vma, 0x10000, 0x11000)

  Following this operation, 0x11000-0x12000 would not be zapped.
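
  To make the interleaving above easier to follow, here is a small
  user-space toy model in C. It is only a sketch and not kernel code:
  every toy_* name, the `race` flag (which stands in for the concurrent
  mprotect() while the mmap-lock is dropped) and the `propagate` flag
  (which stands in for passing the updated end back to the caller) are
  made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE   0x1000UL
#define TOY_BASE   0x10000UL
#define TOY_NPAGES 2
#define TOY_MAX_VMAS 4

struct toy_vma { unsigned long start, end; };

struct toy_mm {
	struct toy_vma vmas[TOY_MAX_VMAS];	/* sorted, non-overlapping */
	size_t nr_vmas;
	bool zapped[TOY_NPAGES];		/* one flag per page */
};

/* First VMA whose end lies above addr, or NULL. */
static struct toy_vma *toy_find_vma(struct toy_mm *mm, unsigned long addr)
{
	for (size_t i = 0; i < mm->nr_vmas; i++)
		if (addr < mm->vmas[i].end)
			return &mm->vmas[i];
	return NULL;
}

/* Split the VMA that strictly contains addr into [start, addr) and [addr, end). */
static void toy_split_vma(struct toy_mm *mm, unsigned long addr)
{
	for (size_t i = 0; i < mm->nr_vmas; i++) {
		struct toy_vma *v = &mm->vmas[i];

		if (addr <= v->start || addr >= v->end)
			continue;
		assert(mm->nr_vmas < TOY_MAX_VMAS);
		for (size_t j = mm->nr_vmas; j > i + 1; j--)
			mm->vmas[j] = mm->vmas[j - 1];
		mm->vmas[i + 1].start = addr;
		mm->vmas[i + 1].end = v->end;
		v->end = addr;
		mm->nr_vmas++;
		return;
	}
}

/*
 * Models the DONTNEED step: the "lock" is dropped, the VMA may be split,
 * then the VMA is looked up again and *end is clamped to its new end.
 */
static void toy_dontneed(struct toy_mm *mm, unsigned long start,
			 unsigned long *end, bool race)
{
	if (race)	/* concurrent mprotect(0x11000, 0x1000, ...) */
		toy_split_vma(mm, TOY_BASE + TOY_PAGE);

	struct toy_vma *vma = toy_find_vma(mm, start);

	if (!vma || start < vma->start)
		return;
	if (*end > vma->end)
		*end = vma->end;
	for (unsigned long addr = start; addr < *end; addr += TOY_PAGE)
		mm->zapped[(addr - TOY_BASE) / TOY_PAGE] = true;
}

/* Models the caller's VMA walk over [start, end). */
static void toy_madvise(struct toy_mm *mm, unsigned long start,
			unsigned long end, bool propagate, bool race)
{
	while (start < end) {
		struct toy_vma *vma = toy_find_vma(mm, start);

		if (!vma || vma->start >= end)
			return;
		if (start < vma->start)
			start = vma->start;

		/* end of this step, computed before the lock is dropped */
		unsigned long tmp = vma->end < end ? vma->end : end;
		unsigned long new_end = tmp;

		toy_dontneed(mm, start, &new_end, race);
		race = false;	/* the split happens only once */

		/* without propagation the walk resumes from the obsolete end */
		start = propagate ? new_end : tmp;
	}
}

/* Run madvise(0x10000, 0x2000) on a single two-page VMA; count zapped pages. */
int toy_count_zapped(bool propagate, bool race)
{
	struct toy_mm mm = { .vmas = { { TOY_BASE, TOY_BASE + 2 * TOY_PAGE } },
			     .nr_vmas = 1 };
	int n = 0;

	toy_madvise(&mm, TOY_BASE, TOY_BASE + 2 * TOY_PAGE, propagate, race);
	for (size_t i = 0; i < TOY_NPAGES; i++)
		n += mm.zapped[i];
	return n;
}
```

  In this model, toy_count_zapped(false, true) leaves the second page
  (0x11000-0x12000) untouched, while toy_count_zapped(true, true) zaps
  both pages, because the walk resumes from the clamped end and then
  picks up the VMA created by the split.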


> If mmap lock was dropped any change to VMA layout can appear. We can have
> totally unrelated VMA there.

Yes, but we are not talking about completely unrelated VMAs. If
userspace registered a region to be monitored using userfaultfd,
it expects this region to be handled like any other region. The
behavior I describe is a deviation that affects only regions
registered with uffd.

The comment in the code explicitly says that this scenario should be
handled:

                        /*
                         * Don't fail if end > vma->vm_end. If the old
                         * vma was split while the mmap_lock was
                         * released the effect of the concurrent
                         * operation may not cause madvise() to
                         * have an undefined result. There may be an
                         * adjacent next vma that we'll walk
                         * next. userfaultfd_remove() will generate an
                         * UFFD_EVENT_REMOVE repetition on the
                         * end-vma->vm_end range, but the manager can
                         * handle a repetition fine.
                         */

Unless I am missing something, this does not happen in the current
code.

> 
> Either way, if userspace change VMA layout for a region that is under
> madvise(MADV_DONTNEED) it is totally broken. I don't see a valid reason to
> do this.
> 
> The current behaviour looks reasonable to me. Yes, we can miss VMAs, but
> these VMAs can also be created just after madvise() is finished.

Again, we are not talking about newly created VMAs.

Alternatively, this comment should be removed and perhaps the
documentation should be updated.
