[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGsJ_4ybZRudJ+p7pxgb1xH7HP0rKcWW1Dtr_kvb7EUwnqxsQQ@mail.gmail.com>
Date: Tue, 3 Sep 2024 17:07:23 +0800
From: Barry Song <21cnbao@...il.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Barry Song <v-songbaohua@...o.com>,
Arve Hjønnevåg <arve@...roid.com>,
Todd Kjos <tkjos@...roid.com>, Martijn Coenen <maco@...roid.com>,
Joel Fernandes <joel@...lfernandes.org>, Christian Brauner <brauner@...nel.org>,
Carlos Llamas <cmllamas@...gle.com>, Suren Baghdasaryan <surenb@...gle.com>,
Tangquan Zheng <zhengtangquan@...o.com>
Subject: Re: [PATCH] binder_alloc: Move alloc_page() out of mmap_rwsem to
reduce the lock duration
On Tue, Sep 3, 2024 at 4:57 PM Greg Kroah-Hartman
<gregkh@...uxfoundation.org> wrote:
>
> On Tue, Sep 03, 2024 at 10:50:09AM +1200, Barry Song wrote:
> > From: Barry Song <v-songbaohua@...o.com>
> >
> > The mmap_write_lock() can block all access to the VMAs, for example page
> > faults. Performing memory allocation while holding this lock may trigger
> > direct reclamation, leading to others being queued in the rwsem for an
> > extended period.
> > We've observed that the allocation can sometimes take more than 300ms,
> > significantly blocking other threads. The user interface sometimes
> > becomes less responsive as a result. To prevent this, let's move the
> > allocation outside of the write lock.
> > A potential side effect could be an extra alloc_page() for the second
> > thread executing binder_install_single_page() while the first thread
> > has done it earlier. However, according to Tangquan's 48-hour profiling
> > using monkey, the likelihood of this occurring is minimal, with a ratio
> > of only 1 in 2400. Compared to the significantly costly rwsem, this is
> > negligible.
> > On the other hand, holding a write lock without making any VMA
> > modifications appears questionable and likely incorrect. While this
> > patch focuses on reducing the lock duration, future updates may aim
> > to eliminate the write lock entirely.
> >
> > Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> > Cc: "Arve Hjønnevåg" <arve@...roid.com>
> > Cc: Todd Kjos <tkjos@...roid.com>
> > Cc: Martijn Coenen <maco@...roid.com>
> > Cc: Joel Fernandes <joel@...lfernandes.org>
> > Cc: Christian Brauner <brauner@...nel.org>
> > Cc: Carlos Llamas <cmllamas@...gle.com>
> > Cc: Suren Baghdasaryan <surenb@...gle.com>
> > Tested-by: Tangquan Zheng <zhengtangquan@...o.com>
> > Signed-off-by: Barry Song <v-songbaohua@...o.com>
> > ---
> > drivers/android/binder_alloc.c | 18 ++++++++++++++----
> > 1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> > index b3acbc4174fb..f20074e23a7c 100644
> > --- a/drivers/android/binder_alloc.c
> > +++ b/drivers/android/binder_alloc.c
> > @@ -227,13 +227,23 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> > if (!mmget_not_zero(alloc->mm))
> > return -ESRCH;
> >
> > + /*
> > + * Don't allocate page in mmap_write_lock, this can block
> > + * mmap_rwsem for a long time; Meanwhile, allocation failure
> > + * doesn't necessarily need to return -ENOMEM, if lru_page
> > + * has been installed, we can still return 0(success).
> > + */
> > + page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
>
> But now you are allocating new pages even if binder_get_installed_page()
> is an error, right? Doesn't that slow things down?
very very unlikely, as the ratio is only 1/2400 while write lock 100% blocks
everyone.
>
> How was this benchmarked?
>
i actually put Tangquan's profiling result:
"
However, according to Tangquan's 48-hour profiling
using monkey, the likelihood of this occurring is minimal, with a ratio
of only 1 in 2400. Compared to the significantly costly rwsem, this is
negligible."
> > +
> > /*
> > * Protected with mmap_sem in write mode as multiple tasks
> > * might race to install the same page.
> > */
> > mmap_write_lock(alloc->mm);
> > - if (binder_get_installed_page(lru_page))
> > + if (binder_get_installed_page(lru_page)) {
> > + ret = 1;
>
> That is not a valid error value :(
>
> > goto out;
> > + }
> >
> > if (!alloc->vma) {
> > pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> > @@ -241,7 +251,6 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> > goto out;
> > }
> >
> > - page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
> > if (!page) {
> > pr_err("%d: failed to allocate page\n", alloc->pid);
> > ret = -ENOMEM;
> > @@ -252,7 +261,6 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> > if (ret) {
> > pr_err("%d: %s failed to insert page at offset %lx with %d\n",
> > alloc->pid, __func__, addr - alloc->buffer, ret);
> > - __free_page(page);
> > ret = -ENOMEM;
> > goto out;
> > }
> > @@ -262,7 +270,9 @@ static int binder_install_single_page(struct binder_alloc *alloc,
> > out:
> > mmap_write_unlock(alloc->mm);
> > mmput_async(alloc->mm);
> > - return ret;
> > + if (ret && page)
> > + __free_page(page);
> > + return ret < 0 ? ret : 0;
>
> Please only use ? : for when you have to, otherwise please spell it out
> with a normal if statement:
> if (ret < 0)
> return ret;
> return 0;
>
> But, this is abusing the fact that you set "ret = 1" above, which is
> going to trip someone up in the future as that is NOT a normal coding
> pattern we have in the kernel, sorry.
>
> If you insist on this change, please rework it to not have that type of
> "positive means one thing, 0 means another, and negative means yet
> something else" please.
I was trying to consolidate all free_page() calls into one place. Otherwise,
we would need multiple free_page() calls. I'm perfectly fine with having more
free_page() calls in both the ret = 1 and ret < 0 paths. In that case,
the ret = 1
path can be removed if you prefer.
>
> thanks,
>
> greg k-h
Thanks
Barry
Powered by blists - more mailing lists