lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <xpzec7tqe43sykvqtgrlh3furu7vn2nrnkjmv7odzy7ywd4lf6@hlawbgapxcfk>
Date: Fri, 8 Nov 2024 15:41:41 -0500
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Carlos Llamas <cmllamas@...gle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Arve Hjønnevåg <arve@...roid.com>,
        Todd Kjos <tkjos@...roid.com>, Martijn Coenen <maco@...roid.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Christian Brauner <brauner@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, linux-kernel@...r.kernel.org,
        kernel-team@...roid.com, David Hildenbrand <david@...hat.com>,
        Barry Song <v-songbaohua@...o.com>
Subject: Re: [PATCH v3 2/8] binder: concurrent page installation

* Carlos Llamas <cmllamas@...gle.com> [241108 14:11]:
> Allow multiple callers to install pages simultaneously by downgrading
> the mmap_sem to non-exclusive mode.

Since we actually allow downgrading of a lock, it would be more clear to
say that you are changing from a read lock to a write lock.  I was
searching for the lock downgrade in this patch :)

>Races to the same PTE are handled
> using get_user_pages_remote() to retrieve the already installed page.
> This method significantly reduces contention in the mmap semaphore.
> 
> To ensure safety, vma_lookup() is used (instead of alloc->vma) to avoid
> operating on an isolated VMA. In addition, zap_page_range_single() is
> called under the alloc->mutex to avoid racing with the shrinker.
> 
> Many thanks to Barry Song who posted a similar approach [1].
> 
> Link: https://lore.kernel.org/all/20240902225009.34576-1-21cnbao@gmail.com/ [1]
> Cc: David Hildenbrand <david@...hat.com>
> Cc: Barry Song <v-songbaohua@...o.com>
> Cc: Suren Baghdasaryan <surenb@...gle.com>
> Cc: Liam R. Howlett <Liam.Howlett@...cle.com>
> Reviewed-by: Suren Baghdasaryan <surenb@...gle.com>
> Signed-off-by: Carlos Llamas <cmllamas@...gle.com>
> ---
>  drivers/android/binder_alloc.c | 64 +++++++++++++++++++++-------------
>  1 file changed, 40 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index 7241bf4a3ff2..2ab520c285b3 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -221,26 +221,14 @@ static int binder_install_single_page(struct binder_alloc *alloc,
>  				      struct binder_lru_page *lru_page,
>  				      unsigned long addr)
>  {
> +	struct vm_area_struct *vma;
>  	struct page *page;
> -	int ret = 0;
> +	long npages;
> +	int ret;
>  
>  	if (!mmget_not_zero(alloc->mm))
>  		return -ESRCH;
>  
> -	/*
> -	 * Protected with mmap_sem in write mode as multiple tasks
> -	 * might race to install the same page.
> -	 */
> -	mmap_write_lock(alloc->mm);
> -	if (binder_get_installed_page(lru_page))
> -		goto out;
> -
> -	if (!alloc->vma) {
> -		pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> -		ret = -ESRCH;
> -		goto out;
> -	}
> -
>  	page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
>  	if (!page) {
>  		pr_err("%d: failed to allocate page\n", alloc->pid);
> @@ -248,19 +236,47 @@ static int binder_install_single_page(struct binder_alloc *alloc,
>  		goto out;
>  	}
>  
> -	ret = vm_insert_page(alloc->vma, addr, page);
> -	if (ret) {
> +	mmap_read_lock(alloc->mm);

Ah, I debate bring this up, but you can do this without the mmap read
lock.  You can use rcu and the per-vma locking like in the page fault
path.  If you look at how that is done using the lock_vma_under_rcu()
(mm/memory.c) then you can see that pages are installed today without
the mmap locking (on some architectures - which includes x86_64 and
arm64).

userfaultfd had to do something like this as well, and created some
abstraction to facilitate either mmap or rcu, based on the arch support.
Going to rcu really helped performance there [1].  There was also a
chance of the vma being missed, so it is checked again under the mmap
read lock, but realistically that never happens and exists to ensure
correctness.

You also mention the shrinker and using the alloc->mutex, well zapping
pages can also be done under the vma specific lock - if that helps?

Freeing page tables is different though, that needs more locking, but I
don't think this is an issue for you.

[1]. https://lore.kernel.org/all/20240215182756.3448972-1-lokeshgidra@google.com/

> +	vma = vma_lookup(alloc->mm, addr);

Thank you for not saving a pointer anymore.

> +	if (!vma || vma != alloc->vma) {
> +		__free_page(page);
> +		pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> +		ret = -ESRCH;
> +		goto unlock;
> +	}
> +
> +	ret = vm_insert_page(vma, addr, page);
> +	switch (ret) {
> +	case -EBUSY:
> +		/*
> +		 * EBUSY is ok. Someone installed the pte first but the
> +		 * lru_page->page_ptr has not been updated yet. Discard
> +		 * our page and look up the one already installed.
> +		 */
> +		ret = 0;
> +		__free_page(page);
> +		npages = get_user_pages_remote(alloc->mm, addr, 1, 0, &page, NULL);
> +		if (npages <= 0) {
> +			pr_err("%d: failed to find page at offset %lx\n",
> +			       alloc->pid, addr - alloc->buffer);
> +			ret = -ESRCH;
> +			break;
> +		}
> +		fallthrough;
> +	case 0:
> +		/* Mark page installation complete and safe to use */
> +		binder_set_installed_page(lru_page, page);
> +		break;
> +	default:
> +		__free_page(page);
>  		pr_err("%d: %s failed to insert page at offset %lx with %d\n",
>  		       alloc->pid, __func__, addr - alloc->buffer, ret);
> -		__free_page(page);
>  		ret = -ENOMEM;
> -		goto out;
> +		break;
>  	}
> -
> -	/* Mark page installation complete and safe to use */
> -	binder_set_installed_page(lru_page, page);
> +unlock:
> +	mmap_read_unlock(alloc->mm);
>  out:
> -	mmap_write_unlock(alloc->mm);
>  	mmput_async(alloc->mm);
>  	return ret;
>  }
> @@ -1091,7 +1107,6 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
>  	trace_binder_unmap_kernel_end(alloc, index);
>  
>  	list_lru_isolate(lru, item);
> -	mutex_unlock(&alloc->mutex);
>  	spin_unlock(lock);
>  
>  	if (vma) {
> @@ -1102,6 +1117,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
>  		trace_binder_unmap_user_end(alloc, index);
>  	}
>  
> +	mutex_unlock(&alloc->mutex);
>  	mmap_read_unlock(mm);
>  	mmput_async(mm);
>  	__free_page(page_to_free);
> -- 
> 2.47.0.277.g8800431eea-goog
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ