Message-ID: <CAGsJ_4xr-HvqKdh=Q=sVKM+hM+VS1Cf4gqPvq9vDtnQSBO9X=A@mail.gmail.com>
Date: Tue, 3 Sep 2024 19:45:12 +0800
From: Barry Song <21cnbao@...il.com>
To: Hillf Danton <hdanton@...a.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, Barry Song <v-songbaohua@...o.com>, 
	Carlos Llamas <cmllamas@...gle.com>, Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>, 
	Tangquan Zheng <zhengtangquan@...o.com>
Subject: Re: [PATCH] binder_alloc: Move alloc_page() out of mmap_rwsem to
 reduce the lock duration

On Tue, Sep 3, 2024 at 7:01 PM Hillf Danton <hdanton@...a.com> wrote:
>
> On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote:
> > From: Barry Song <v-songbaohua@...o.com>
> >
> > The mmap_write_lock() can block all access to the VMAs, for example page
> > faults. Performing memory allocation while holding this lock may trigger
> > direct reclamation, leading to others being queued in the rwsem for an
> > extended period.
> > We've observed that the allocation can sometimes take more than 300ms,
> > significantly blocking other threads. The user interface sometimes
> > becomes less responsive as a result. To prevent this, let's move the
> > allocation outside of the write lock.
>
> I suspect concurrent allocators make things better wrt response, cutting
> alloc latency down to 10ms for instance in your scenario. Feel free to
> show figures given Tangquan's 48-hour profiling.

Likely.

Concurrent allocators are quite common in page faults which occur on
the same PTE: whoever takes the PTL sets the PTE, and the others free
the pages they allocated.
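
As a rough illustration, here is a minimal userspace analogue of that
pattern (plain pthreads, not kernel code; all names here are made up):
each thread allocates its page before taking the lock, the first one
in installs it, and the losers free theirs.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static void *installed_page;	/* plays the role of the PTE */
static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER;	/* the "PTL" */

static void *fault_handler(void *arg)
{
	void *page = calloc(1, 4096);	/* allocate before taking the "PTL" */

	pthread_mutex_lock(&ptl);
	if (!installed_page) {
		installed_page = page;	/* winner sets the "PTE" */
		page = NULL;
	}
	pthread_mutex_unlock(&ptl);

	free(page);	/* losers free their spare page */
	return NULL;
}

int main(void)
{
	pthread_t t[4];

	for (int i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, fault_handler, NULL);
	for (int i = 0; i < 4; i++)
		pthread_join(t[i], NULL);

	printf("installed page at %p\n", installed_page);
	free(installed_page);
	return 0;
}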

>
> > A potential side effect could be an extra alloc_page() for the second
> > thread executing binder_install_single_page() while the first thread
> > has done it earlier. However, according to Tangquan's 48-hour profiling
> > using monkey, the likelihood of this occurring is minimal, with a ratio
> > of only 1 in 2400. Compared to the significantly costly rwsem, this is
> > negligible.
> > On the other hand, holding a write lock without making any VMA
> > modifications appears questionable and likely incorrect. While this
> > patch focuses on reducing the lock duration, future updates may aim
> > to eliminate the write lock entirely.
>
> If spin, better not before taking a look at vm_insert_page().

I have patch 2/3 transitioning to mmap_read_lock, and per_vma_lock is
currently in the testing queue. At the moment a spinlock, alloc->lock,
is in place, but I'm not entirely convinced it's the best replacement
for the write lock. Let's wait for Tangquan's test results.

Patch 2 is detailed below, but it has only passed the build test so
far, so its correctness is uncertain. I'm sharing it early in case you
find it interesting. I am also not convinced that commit d1d8875c8c13
("binder: fix UAF of alloc->vma in race with munmap()") is a correct
fix that really avoids all UAFs of alloc->vma.

[PATCH]  binder_alloc: Don't use mmap_write_lock for installing page

Commit d1d8875c8c13 ("binder: fix UAF of alloc->vma in race with
munmap()") takes the mmap_rwsem write lock to protect against a race
with munmap, where the vma is detached under the write lock while its
pages are zapped under the read lock. This is extremely expensive for
the system as a whole, though perhaps less so for binder itself, since
the write lock blocks all other accesses to the mm.

As an alternative, we could hold only the read lock and re-check
that the vma hasn't been detached. To guard against simultaneous
page installation, we could use alloc->lock instead.

Signed-off-by: Barry Song <v-songbaohua@...o.com>
---
 drivers/android/binder_alloc.c | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index f20074e23a7c..a2281dfacbbc 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -228,24 +228,17 @@ static int binder_install_single_page(struct binder_alloc *alloc,
                return -ESRCH;

        /*
-        * Don't allocate page in mmap_write_lock, this can block
-        * mmap_rwsem for a long time; Meanwhile, allocation failure
-        * doesn't necessarily need to return -ENOMEM, if lru_page
-        * has been installed, we can still return 0(success).
+        * Allocation failure doesn't necessarily mean returning -ENOMEM:
+        * if lru_page has already been installed, we can still return
+        * 0 (success). So, defer the !page check until after
+        * binder_get_installed_page() has completed.
         */
        page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);

-       /*
-        * Protected with mmap_sem in write mode as multiple tasks
-        * might race to install the same page.
-        */
-       mmap_write_lock(alloc->mm);
-       if (binder_get_installed_page(lru_page)) {
-               ret = 1;
-               goto out;
-       }
+       mmap_read_lock(alloc->mm);

-       if (!alloc->vma) {
+       /* vma might have been dropped or detached */
+       if (!alloc->vma || !find_vma(alloc->mm, addr)) {
                pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
                ret = -ESRCH;
                goto out;
@@ -257,18 +250,27 @@ static int binder_install_single_page(struct binder_alloc *alloc,
                goto out;
        }

+       spin_lock(&alloc->lock);
+       if (binder_get_installed_page(lru_page)) {
+               spin_unlock(&alloc->lock);
+               ret = 1;
+               goto out;
+       }
+
        ret = vm_insert_page(alloc->vma, addr, page);
        if (ret) {
                pr_err("%d: %s failed to insert page at offset %lx with %d\n",
                       alloc->pid, __func__, addr - alloc->buffer, ret);
+               spin_unlock(&alloc->lock);
                ret = -ENOMEM;
                goto out;
        }

        /* Mark page installation complete and safe to use */
        binder_set_installed_page(lru_page, page);
+       spin_unlock(&alloc->lock);
 out:
-       mmap_write_unlock(alloc->mm);
+       mmap_read_unlock(alloc->mm);
        mmput_async(alloc->mm);
        if (ret && page)
                __free_page(page);
--
2.39.3 (Apple Git-146)
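
For what it's worth, here is a rough userspace analogue of the locking
protocol the patch proposes (pthreads only; the names are illustrative,
not binder's real API): allocate outside any lock, take the
address-space lock in read mode, re-check that the mapping still
exists, then serialize the actual install with the smaller per-alloc
lock.

#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct alloc_state {
	pthread_rwlock_t mmap_lock;	/* stands in for mmap_rwsem */
	pthread_mutex_t lock;		/* stands in for alloc->lock */
	bool vma_present;		/* stands in for the vma re-check */
	void *installed;		/* stands in for the lru_page slot */
};

static int install_single_page(struct alloc_state *a)
{
	void *page = calloc(1, 4096);	/* allocate outside any lock */
	int ret = 0;

	pthread_rwlock_rdlock(&a->mmap_lock);
	if (!a->vma_present) {		/* mapping gone, as after munmap */
		ret = -1;
		goto out;
	}

	pthread_mutex_lock(&a->lock);
	if (a->installed) {		/* another thread won the race */
		pthread_mutex_unlock(&a->lock);
		ret = 1;
		goto out;
	}
	a->installed = page;		/* winner installs its page */
	page = NULL;
	pthread_mutex_unlock(&a->lock);
out:
	pthread_rwlock_unlock(&a->mmap_lock);
	free(page);			/* loser/error paths drop the spare */
	return ret;
}

int main(void)
{
	struct alloc_state a = {
		.mmap_lock = PTHREAD_RWLOCK_INITIALIZER,
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.vma_present = true,
	};
	int ret = install_single_page(&a);

	free(a.installed);
	return ret < 0;
}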
