linux-kernel - Re: [PATCH v9 07/17] mm: allow vma_start_read_locked/vma_start_read_locked

Message-ID: <038aaebd-264a-4e64-8777-4c4015401097@lucifer.local>
Date: Mon, 13 Jan 2025 15:25:24 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Suren Baghdasaryan <surenb@...gle.com>
Cc: akpm@...ux-foundation.org, peterz@...radead.org, willy@...radead.org,
        liam.howlett@...cle.com, david.laight.linux@...il.com, mhocko@...e.com,
        vbabka@...e.cz, hannes@...xchg.org, mjguzik@...il.com,
        oliver.sang@...el.com, mgorman@...hsingularity.net, david@...hat.com,
        peterx@...hat.com, oleg@...hat.com, dave@...olabs.net,
        paulmck@...nel.org, brauner@...nel.org, dhowells@...hat.com,
        hdanton@...a.com, hughd@...gle.com, lokeshgidra@...gle.com,
        minchan@...gle.com, jannh@...gle.com, shakeel.butt@...ux.dev,
        souravpanda@...gle.com, pasha.tatashin@...een.com,
        klarasmodin@...il.com, richard.weiyang@...il.com, corbet@....net,
        linux-doc@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, kernel-team@...roid.com
Subject: Re: [PATCH v9 07/17] mm: allow
 vma_start_read_locked/vma_start_read_locked_nested to fail

On Fri, Jan 10, 2025 at 08:25:54PM -0800, Suren Baghdasaryan wrote:
> With upcoming replacement of vm_lock with vm_refcnt, we need to handle a
> possibility of vma_start_read_locked/vma_start_read_locked_nested failing
> due to refcount overflow. Prepare for such possibility by changing these
> APIs and adjusting their users.
>
> Signed-off-by: Suren Baghdasaryan <surenb@...gle.com>
> Acked-by: Vlastimil Babka <vbabka@...e.cz>
> Cc: Lokesh Gidra <lokeshgidra@...gle.com>
> ---
>  include/linux/mm.h |  6 ++++--
>  mm/userfaultfd.c   | 18 +++++++++++++-----
>  2 files changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2f805f1a0176..cbb4e3dbbaed 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -747,10 +747,11 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
>   * not be used in such cases because it might fail due to mm_lock_seq overflow.
>   * This functionality is used to obtain vma read lock and drop the mmap read lock.
>   */
> -static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
> +static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int subclass)
>  {
>  	mmap_assert_locked(vma->vm_mm);
>  	down_read_nested(&vma->vm_lock.lock, subclass);
> +	return true;
>  }
>
>  /*
> @@ -759,10 +760,11 @@ static inline void vma_start_read_locked_nested(struct vm_area_struct *vma, int
>   * not be used in such cases because it might fail due to mm_lock_seq overflow.
>   * This functionality is used to obtain vma read lock and drop the mmap read lock.
>   */
> -static inline void vma_start_read_locked(struct vm_area_struct *vma)
> +static inline bool vma_start_read_locked(struct vm_area_struct *vma)
>  {
>  	mmap_assert_locked(vma->vm_mm);
>  	down_read(&vma->vm_lock.lock);
> +	return true;
>  }
>
>  static inline void vma_end_read(struct vm_area_struct *vma)
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 4527c385935b..411a663932c4 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -85,7 +85,8 @@ static struct vm_area_struct *uffd_lock_vma(struct mm_struct *mm,
>  	mmap_read_lock(mm);
>  	vma = find_vma_and_prepare_anon(mm, address);
>  	if (!IS_ERR(vma))
> -		vma_start_read_locked(vma);
> +		if (!vma_start_read_locked(vma))
> +			vma = ERR_PTR(-EAGAIN);

Nit but this kind of reads a bit weirdly now:

	if (!IS_ERR(vma))
		if (!vma_start_read_locked(vma))
			vma = ERR_PTR(-EAGAIN);

Wouldn't this be nicer as:

	if (!IS_ERR(vma) && !vma_start_read_locked(vma))
		vma = ERR_PTR(-EAGAIN);

On the other hand, that embeds an action in an expression, and the alternative
below sort of still looks weird:

	if (!IS_ERR(vma)) {
		bool ok = vma_start_read_locked(vma);

		if (!ok)
			vma = ERR_PTR(-EAGAIN);
	}

This makes me wonder (yes, we are truly bikeshedding now, sorry): maybe we could
just have vma_start_read_locked() return a VMA pointer that could be an error
pointer?

Then this becomes:

	if (!IS_ERR(vma))
		vma = vma_start_read_locked(vma);
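
To make the idea concrete, here is a minimal userspace sketch of a lock helper
that returns either the VMA or an error pointer. Everything named mock_* is
hypothetical, the ERR_PTR()/IS_ERR()/PTR_ERR() helpers are simplified stand-ins
for the kernel ones, and the reader limit only models the refcount overflow
case described in the commit message. It is not the proposed kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical mock VMA: readers/max_readers model a bounded refcount. */
struct mock_vma {
	bool mmap_locked;	/* stands in for the mmap read lock being held */
	int readers;
	int max_readers;
};

/* Returns the VMA on success, ERR_PTR(-EAGAIN) if the count would overflow. */
static struct mock_vma *mock_vma_start_read_locked(struct mock_vma *vma)
{
	assert(vma->mmap_locked);	/* models mmap_assert_locked() */

	if (vma->readers >= vma->max_readers)
		return ERR_PTR(-EAGAIN);

	vma->readers++;
	return vma;
}

/* The caller-side shape suggested above: one check, one assignment. */
static struct mock_vma *mock_lock_vma(struct mock_vma *vma)
{
	if (!IS_ERR(vma))
		vma = mock_vma_start_read_locked(vma);

	return vma;
}
```

With this shape the caller forwards whatever error the helper reports rather
than choosing -EAGAIN itself, which may or may not be what is wanted here.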

>
>  	mmap_read_unlock(mm);
>  	return vma;
> @@ -1483,10 +1484,17 @@ static int uffd_move_lock(struct mm_struct *mm,
>  	mmap_read_lock(mm);
>  	err = find_vmas_mm_locked(mm, dst_start, src_start, dst_vmap, src_vmap);
>  	if (!err) {
> -		vma_start_read_locked(*dst_vmap);
> -		if (*dst_vmap != *src_vmap)
> -			vma_start_read_locked_nested(*src_vmap,
> -						SINGLE_DEPTH_NESTING);
> +		if (vma_start_read_locked(*dst_vmap)) {
> +			if (*dst_vmap != *src_vmap) {
> +				if (!vma_start_read_locked_nested(*src_vmap,
> +							SINGLE_DEPTH_NESTING)) {
> +					vma_end_read(*dst_vmap);

Hmm, why do we end read if the lock failed here but not above?

> +					err = -EAGAIN;
> +				}
> +			}
> +		} else {
> +			err = -EAGAIN;
> +		}
>  	}

This whole block is really ugly now; it really needs refactoring.

How about (on the assumption that the vma_end_read() is correct):


	err = find_vmas_mm_locked(mm, dst_start, src_start, dst_vmap, src_vmap);
	if (err)
		goto out;

	if (!vma_start_read_locked(*dst_vmap)) {
		err = -EAGAIN;
		goto out;
	}

	/* Nothing further to do. */
	if (*dst_vmap == *src_vmap)
		goto out;

	if (!vma_start_read_locked_nested(*src_vmap,
				SINGLE_DEPTH_NESTING)) {
		vma_end_read(*dst_vmap);
		err = -EAGAIN;
	}

out:
	mmap_read_unlock(mm);
	return err;
}
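
For anyone who wants to poke at the control flow outside the kernel, the same
early-exit shape can be sketched in userspace. All mock_* names are
hypothetical, the reader limit only models a lock attempt that can fail, and
the find_vmas_mm_locked() step is left out. In this mock, the
mock_vma_end_read() call undoes the first, successful lock when the second
attempt fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for mm_struct and vm_area_struct. */
struct mock_mm {
	bool read_locked;
};

struct mock_vma {
	struct mock_mm *mm;
	int readers;
	int max_readers;	/* lock attempts fail once this is reached */
};

static bool mock_vma_start_read_locked(struct mock_vma *vma)
{
	assert(vma->mm->read_locked);	/* models mmap_assert_locked() */

	if (vma->readers >= vma->max_readers)
		return false;

	vma->readers++;
	return true;
}

static void mock_vma_end_read(struct mock_vma *vma)
{
	vma->readers--;
}

/* Lock dst, then src if it is a different VMA; undo dst if src fails. */
static int mock_move_lock(struct mock_mm *mm, struct mock_vma *dst,
			  struct mock_vma *src)
{
	int err = 0;

	mm->read_locked = true;		/* models mmap_read_lock() */

	if (!mock_vma_start_read_locked(dst)) {
		err = -EAGAIN;
		goto out;
	}

	/* Nothing further to do. */
	if (dst == src)
		goto out;

	if (!mock_vma_start_read_locked(src)) {
		mock_vma_end_read(dst);
		err = -EAGAIN;
	}

out:
	mm->read_locked = false;	/* models mmap_read_unlock() */
	return err;
}
```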

>  	mmap_read_unlock(mm);
>  	return err;
> --
> 2.47.1.613.gc27f4b7a9f-goog
>