linux-kernel - Re: [PATCH] binder_alloc: Move alloc_page() out of mmap

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAGsJ_4wuKpk+U2K+o_zwYXtyR=qsnKxadfkm39rsaKkJxY-j5A@mail.gmail.com>
Date: Wed, 4 Sep 2024 10:43:29 +1200
From: Barry Song <21cnbao@...il.com>
To: Carlos Llamas <cmllamas@...gle.com>
Cc: Hillf Danton <hdanton@...a.com>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, Barry Song <v-songbaohua@...o.com>, 
	Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>, 
	Tangquan Zheng <zhengtangquan@...o.com>
Subject: Re: [PATCH] binder_alloc: Move alloc_page() out of mmap_rwsem to
 reduce the lock duration

On Wed, Sep 4, 2024 at 10:23 AM Carlos Llamas <cmllamas@...gle.com> wrote:
>
> On Wed, Sep 04, 2024 at 09:10:20AM +1200, Barry Song wrote:
> > However, I'm not entirely convinced that this is a problem :-) Concurrent
> > allocations like this can occur in many places, especially in PFs. Reclamation
> > is not useless because it helps free up memory for others; it's not
> > without value.
>
> Yeah, for binder though there is a high chance of multiple callers
> trying to allocate the _same_ page concurrently. What I observed is that
> pages being reclaimed "in vain" are often in the same vma and this means
> subsequent callers will need to allocate these pages.
>
> But even if this wasn't an issue, the correct solution would still be to
> support concurrent faults. So, in reality it doesn't matter and we still
> need to refactor the call to be non-exclusive.
>
> > I also don't believe binder is one of the largest users executing concurrent
> > allocations.
>
> Oh, I have no statistics on this.
>
> > Awesome! I’m eager to see your patch, and we’re ready to help with testing.
> > I strongly recommend dropping the write lock entirely. Using
> > `mmap_write_lock()` isn’t just a binder-specific concern; it has the
> > potential to affect the entire Android system.
> >
> > In patch 3, I experimented with using `per_vma_lock` as well. I’m _not_
> > proposing it for merging since you’re already working on it, but I wanted
> > to share the idea. (just like patch2, it has only passed build-test)
>
> Yeap, I'm using the per-vma-lock too per Suren's suggestion.
>
> >
> > [PATCH] binder_alloc: Further move to per_vma_lock from mmap_read_lock
> >
> > To further reduce the read lock duration, let's try using per_vma_lock
> > first. If that fails, we can take the read lock, similar to how page
> > fault handlers operate.
> >
> > Signed-off-by: Barry Song <v-songbaohua@...o.com>
> > ---
> >  drivers/android/binder_alloc.c | 18 ++++++++++++++----
> >  1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> > index a2281dfacbbc..b40a5dd650c8 100644
> > --- a/drivers/android/binder_alloc.c
> > +++ b/drivers/android/binder_alloc.c
> > @@ -221,6 +221,8 @@ static int binder_install_single_page(struct
> > binder_alloc *alloc,
> >                                       struct binder_lru_page *lru_page,
> >                                       unsigned long addr)
> >  {
> > +       struct vm_area_struct *vma;
> > +       bool per_vma_lock = true;
> >         struct page *page;
> >         int ret = 0;
> >
> > @@ -235,10 +237,15 @@ static int binder_install_single_page(struct
> > binder_alloc *alloc,
> >          */
> >         page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
> >
> > -       mmap_read_lock(alloc->mm);
> > +       vma = lock_vma_under_rcu(alloc->mm, addr);
> > +       if (!vma) {
> > +               per_vma_lock = false;
> > +               mmap_read_lock(alloc->mm);
> > +               vma = find_vma(alloc->mm, addr);
> > +       }
> >
> > -       /* vma might have been dropped or deattached */
> > -       if (!alloc->vma || !find_vma(alloc->mm, addr)) {
> > +       /* vma might have been dropped, deattached or changed to new one */
> > +       if (!alloc->vma || !vma || vma != alloc->vma) {
> >                 pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> >                 ret = -ESRCH;
> >                 goto out;
> > @@ -270,7 +277,10 @@ static int binder_install_single_page(struct
> > binder_alloc *alloc,
> >         binder_set_installed_page(lru_page, page);
> >         spin_unlock(&alloc->lock);
> >  out:
> > -       mmap_read_unlock(alloc->mm);
> > +       if (per_vma_lock)
> > +               vma_end_read(vma);
> > +       else
> > +               mmap_read_unlock(alloc->mm);
> >         mmput_async(alloc->mm);
> >         if (ret && page)
> >                 __free_page(page);
> > --
> > 2.39.3 (Apple Git-146)
>
> This looks fairly similar to my patch. However, you would need to handle
> the case were vm_insert_page() returns -EBUSY (success that raced) and
> also sync with the shrinker callbacks in binder_alloc_free_page() which
> is the biggest concern.
>
> Let's not duplicate efforts though. Can we please wait for my patch?
> I'll add you as Co-Developed-by, since you've posted this already?

Yes, I’d be more than happy to wait for your patch, as I believe you have
much much more experience with binder.

>
> Regards,
> Carlos Llamas

Thanks
Barry