Message-ID: <ZteMZEsMCMig8zOQ@google.com>
Date: Tue, 3 Sep 2024 22:23:32 +0000
From: Carlos Llamas <cmllamas@...gle.com>
To: Barry Song <21cnbao@...il.com>
Cc: Hillf Danton <hdanton@...a.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Barry Song <v-songbaohua@...o.com>,
	Suren Baghdasaryan <surenb@...gle.com>,
	Michal Hocko <mhocko@...e.com>,
	Tangquan Zheng <zhengtangquan@...o.com>
Subject: Re: [PATCH] binder_alloc: Move alloc_page() out of mmap_rwsem to
 reduce the lock duration

On Wed, Sep 04, 2024 at 09:10:20AM +1200, Barry Song wrote:
> However, I'm not entirely convinced that this is a problem :-) Concurrent
> allocations like this can occur in many places, especially in PFs. Reclamation
> is not useless because it helps free up memory for others; it's not
> without value.

Yeah, for binder though there is a high chance of multiple callers
trying to allocate the _same_ page concurrently. What I observed is that
pages reclaimed "in vain" are often in the same vma, which means
subsequent callers will need to allocate those pages again.

But even if this weren't an issue, the correct solution would still be to
support concurrent faults. So in reality it doesn't matter: we still
need to refactor the call to be non-exclusive.
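
To sketch what I mean by non-exclusive (rough idea only, not the patch
I'm preparing; binder_get_installed_page() is assumed here as the
read-side counterpart of the binder_set_installed_page() helper in your
patch below):

        /* Bail out early if a concurrent caller already installed the page. */
        if (binder_get_installed_page(lru_page))
                return 0;

        page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
        if (!page)
                return -ENOMEM;

That way the alloc_page()/reclaim work isn't done "in vain" when another
caller has already populated the slot.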

> I also don't believe binder is one of the largest users executing concurrent
> allocations.

Oh, I have no statistics on this.

> Awesome! I’m eager to see your patch, and we’re ready to help with testing.
> I strongly recommend dropping the write lock entirely. Using
> `mmap_write_lock()` isn’t just a binder-specific concern; it has the
> potential to affect the entire Android system.
> 
> In patch 3, I experimented with using `per_vma_lock` as well. I’m _not_
> proposing it for merging since you’re already working on it, but I wanted
> to share the idea. (Just like patch 2, it has only passed a build test.)

Yep, I'm using the per-vma lock too, per Suren's suggestion.

> 
> [PATCH] binder_alloc: Further move to per_vma_lock from mmap_read_lock
> 
> To further reduce the read lock duration, let's try using per_vma_lock
> first. If that fails, we can take the read lock, similar to how page
> fault handlers operate.
> 
> Signed-off-by: Barry Song <v-songbaohua@...o.com>
> ---
>  drivers/android/binder_alloc.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index a2281dfacbbc..b40a5dd650c8 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -221,6 +221,8 @@ static int binder_install_single_page(struct binder_alloc *alloc,
>                                       struct binder_lru_page *lru_page,
>                                       unsigned long addr)
>  {
> +       struct vm_area_struct *vma;
> +       bool per_vma_lock = true;
>         struct page *page;
>         int ret = 0;
> 
> @@ -235,10 +237,15 @@ static int binder_install_single_page(struct binder_alloc *alloc,
>          */
>         page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
> 
> -       mmap_read_lock(alloc->mm);
> +       vma = lock_vma_under_rcu(alloc->mm, addr);
> +       if (!vma) {
> +               per_vma_lock = false;
> +               mmap_read_lock(alloc->mm);
> +               vma = find_vma(alloc->mm, addr);
> +       }
> 
> -       /* vma might have been dropped or deattached */
> -       if (!alloc->vma || !find_vma(alloc->mm, addr)) {
> +       /* vma might have been dropped, deattached or changed to new one */
> +       if (!alloc->vma || !vma || vma != alloc->vma) {
>                 pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
>                 ret = -ESRCH;
>                 goto out;
> @@ -270,7 +277,10 @@ static int binder_install_single_page(struct binder_alloc *alloc,
>         binder_set_installed_page(lru_page, page);
>         spin_unlock(&alloc->lock);
>  out:
> -       mmap_read_unlock(alloc->mm);
> +       if (per_vma_lock)
> +               vma_end_read(vma);
> +       else
> +               mmap_read_unlock(alloc->mm);
>         mmput_async(alloc->mm);
>         if (ret && page)
>                 __free_page(page);
> -- 
> 2.39.3 (Apple Git-146)

This looks fairly similar to my patch. However, you would need to handle
the case where vm_insert_page() returns -EBUSY (a success that raced) and
also synchronize with the shrinker callbacks in binder_alloc_free_page(),
which is the biggest concern.
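
Roughly something along these lines for the vm_insert_page() part (just
a sketch against your patch, not the final fix):

        ret = vm_insert_page(vma, addr, page);
        if (ret == -EBUSY) {
                /* A racing caller already mapped a page at this address. */
                ret = 0;
                __free_page(page);      /* our copy lost the race */
                page = NULL;            /* avoid a double free in the out path */
        }

Treating the lost race as success only covers the insert side; the
shrinker race is the part that still needs real work.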

Let's not duplicate efforts though. Can we please wait for my patch?
I'll add you as Co-developed-by, since you've posted this already.

Regards,
Carlos Llamas
