lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7fd9d523-d331-498d-8b67-2b525c0de37d@lucifer.local>
Date: Mon, 23 Jun 2025 17:56:19 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        David Hildenbrand <david@...hat.com>, Jann Horn <jannh@...gle.com>,
        Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Colin Cross <ccross@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 1/2] mm, madvise: simplify anon_name handling

On Mon, Jun 23, 2025 at 04:59:50PM +0200, Vlastimil Babka wrote:
> Since the introduction in 9a10064f5625 ("mm: add a field to store names
> for private anonymous memory") the code to set anon_name on a vma has
> been using madvise_update_vma() to call replace_vma_anon_name(). Since
> the former is called also by a number of other madvise behaviours that
> do not set a new anon_name, they have been passing the existing
> anon_name of the vma to make replace_vma_anon_name() a no-op.
>
> This is rather wasteful as it needs anon_vma_name_eq() to determine the
> no-op situations, and checks for when replace_vma_anon_name() is allowed
> (the vma is anon/shmem) duplicate the checks already done earlier in
> madvise_vma_behavior(). It has also lead to commit 942341dcc574 ("mm:
> fix use-after-free when anon vma name is used after vma is freed")
> adding anon_name refcount get/put operations exactly to the cases that
> actually do not change anon_name - just so the replace_vma_anon_name()
> can keep safely determining it has nothing to do.
>
> The recent madvise cleanups made this suboptimal handling very obvious,
> but happily also allow for an easy fix. madvise_update_vma() now has the
> complete information whether it's been called to set a new anon_name, so
> stop passing it the existing vma's name and doing the refcount get/put
> in its only caller madvise_vma_behavior().
>
> In madvise_update_vma() itself, limit calling of replace_anon_vma_name()
> only to cases where we are setting a new name, otherwise we know it's a
> no-op. We can rely solely on the __MADV_SET_ANON_VMA_NAME behaviour and
> can remove the duplicate checks for vma being anon/shmem that were done
> already in madvise_vma_behavior().
>
> The remaining reason to obtain the vma's existing anon_name is to pass
> it to vma_modify_flags_name() for the splitting and merging to work
> properly. In case of merging, the vma might be freed along with the
> anon_name, but madvise_update_vma() will not access it afterwards so the
> UAF previously fixed by commit 942341dcc574 is not reintroduced.
>
> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>

I very much love what you're doing here, but wonder if we could further simplify
even, see below...

> ---
>  mm/madvise.c | 37 +++++++++++++------------------------
>  1 file changed, 13 insertions(+), 24 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 4491bf080f55d6d1aeffb2ff0b8fdd28904af950..ae29395b4fc7f65a449c5772b1901a90f4195885 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -176,21 +176,25 @@ static int replace_anon_vma_name(struct vm_area_struct *vma,
>  }
>  #endif /* CONFIG_ANON_VMA_NAME */
>  /*
> - * Update the vm_flags on region of a vma, splitting it or merging it as
> - * necessary.  Must be called with mmap_lock held for writing;
> - * Caller should ensure anon_name stability by raising its refcount even when
> - * anon_name belongs to a valid vma because this function might free that vma.
> + * Update the vm_flags and/or anon_name on region of a vma, splitting it or
> + * merging it as necessary. Must be called with mmap_lock held for writing.
>   */
>  static int madvise_update_vma(vm_flags_t new_flags,
>  		struct madvise_behavior *madv_behavior)
>  {
> -	int error;
>  	struct vm_area_struct *vma = madv_behavior->vma;
>  	struct madvise_behavior_range *range = &madv_behavior->range;
> -	struct anon_vma_name *anon_name = madv_behavior->anon_name;
> +	bool set_new_anon_name = madv_behavior->behavior == __MADV_SET_ANON_VMA_NAME;
> +	struct anon_vma_name *anon_name;
>  	VMA_ITERATOR(vmi, madv_behavior->mm, range->start);
>
> -	if (new_flags == vma->vm_flags && anon_vma_name_eq(anon_vma_name(vma), anon_name))
> +	if (set_new_anon_name)
> +		anon_name = madv_behavior->anon_name;
> +	else
> +		anon_name = anon_vma_name(vma);

I think we can actually avoid this altogether... So we could separate this into two functions:

tatic int madvise_update_vma_anon_name(struct madvise_behavior *madv_behavior)
{
	struct vm_area_struct *vma = madv_behavior->vma;
	VMA_ITERATOR(vmi, madv_behavior->mm, range->start);
	struct madvise_behavior_range *range = &madv_behavior->range;
	struct anon_vma_name *anon_name = madv_behavior->anon_name;

	if (anon_vma_name_eq(anon_vma_name(vma), anon_name))
		rturn 0;

	vma = vma_modify_flags_name(&vmi, madv_behavior->prev, vma,
			range->start, range->end, vma->vm_flags, anon_name);
	if (IS_ERR(vma))
		return PTR_ERR(vma);

	madv_behavior->vma = vma;

	/* vm_flags is protected by the mmap_lock held in write mode. */
	vma_start_write(vma);
	return replace_anon_vma_name(vma, anon_name);
}

/*
 * Update the vm_flags and/or anon_name on region of a vma, splitting it or
 * merging it as necessary. Must be called with mmap_lock held for writing.
 */
static int madvise_update_vma(vm_flags_t new_flags,
		struct madvise_behavior *madv_behavior)
{
	struct vm_area_struct *vma = madv_behavior->vma;
	struct madvise_behavior_range *range = &madv_behavior->range;
	VMA_ITERATOR(vmi, madv_behavior->mm, range->start);

	if (new_flags == vma->vm_flags)
		return 0;

	vma = vma_modify_flags(&vmi, madv_behavior->prev, vma,
			range->start, range->end, new_flags);
	if (IS_ERR(vma))
		return PTR_ERR(vma);

	madv_behavior->vma = vma;

	/* vm_flags is protected by the mmap_lock held in write mode. */
	vma_start_write(vma);
	vm_flags_reset(vma, new_flags);
	return 0;
}

And avoid all the anon_vma_name stuff for the flags change altogether. This
should reduce confusion at the cost of some small duplication.

Note that we can then put the anon vma name version in an #ifdef
CONFIG_ANON_VMA_NAME block also.


> +
> +	if (new_flags == vma->vm_flags && (!set_new_anon_name
> +			|| anon_vma_name_eq(anon_vma_name(vma), anon_name)))
>  		return 0;
>
>  	vma = vma_modify_flags_name(&vmi, madv_behavior->prev, vma,

So maybe just replace this with:

if (set_new_anon_name

> @@ -203,11 +207,8 @@ static int madvise_update_vma(vm_flags_t new_flags,
>  	/* vm_flags is protected by the mmap_lock held in write mode. */
>  	vma_start_write(vma);
>  	vm_flags_reset(vma, new_flags);
> -	if (!vma->vm_file || vma_is_anon_shmem(vma)) {
> -		error = replace_anon_vma_name(vma, anon_name);
> -		if (error)
> -			return error;
> -	}
> +	if (set_new_anon_name)
> +		return replace_anon_vma_name(vma, anon_name);
>
>  	return 0;
>  }
> @@ -1313,7 +1314,6 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)
>  	int behavior = madv_behavior->behavior;
>  	struct vm_area_struct *vma = madv_behavior->vma;
>  	vm_flags_t new_flags = vma->vm_flags;
> -	bool set_new_anon_name = behavior == __MADV_SET_ANON_VMA_NAME;
>  	struct madvise_behavior_range *range = &madv_behavior->range;
>  	int error;
>
> @@ -1403,18 +1403,7 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)
>  	/* This is a write operation.*/
>  	VM_WARN_ON_ONCE(madv_behavior->lock_mode != MADVISE_MMAP_WRITE_LOCK);
>
> -	/*
> -	 * madvise_update_vma() might cause a VMA merge which could put an
> -	 * anon_vma_name, so we must hold an additional reference on the
> -	 * anon_vma_name so it doesn't disappear from under us.
> -	 */
> -	if (!set_new_anon_name) {
> -		madv_behavior->anon_name = anon_vma_name(vma);
> -		anon_vma_name_get(madv_behavior->anon_name);
> -	}
>  	error = madvise_update_vma(new_flags, madv_behavior);
> -	if (!set_new_anon_name)
> -		anon_vma_name_put(madv_behavior->anon_name);

Obviously I'll leave it to you as to how you'd update this to reflect that? :P

>  out:
>  	/*
>  	 * madvise() returns EAGAIN if kernel resources, such as
>
> --
> 2.50.0
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ