lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7c0101ac-afac-4601-8be9-24587877a5e3@lucifer.local>
Date: Fri, 5 Sep 2025 15:58:42 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Yajun Deng <yajun.deng@...ux.dev>
Cc: akpm@...ux-foundation.org, david@...hat.com, riel@...riel.com,
        Liam.Howlett@...cle.com, vbabka@...e.cz, harry.yoo@...cle.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/rmap: make num_children and num_active_vmas update in
 internally

On Fri, Sep 05, 2025 at 01:20:19PM +0000, Yajun Deng wrote:
> If the anon_vma_alloc() is called, the num_children of the parent of
> the anon_vma will be updated. But this operation occurs outside of
> anon_vma_alloc().

I don't like that what is supposed to be an allocation function is now also
doing additional work.

And there's literally only one place where this matters, since
__anon_vma_prepare() doesn't have a parent, it's ltierally only anon_vma_fork().

And there are only 2 callers, so I don't see the purpose in doing this?

>
> The num_active_vmas are also updated outside of anon_vma.

I don't know what 'outside of anon_vma' means?


>
> Pass the parent of anon_vma to the anon_vma_alloc() and update the
> num_children inside it.
>
> Introduce anon_vma_attach() and anon_vma_detach() to update
> num_active_vmas with the anon_vma.

I really dislike this naming - VMA detachment means something entirely
different, you're not really 'attaching' or 'detaching' anything you're just
changing a single stat of the ones you'd need to change were you truly doing so
etc. etc.

It's misleading.

>
> Signed-off-by: Yajun Deng <yajun.deng@...ux.dev>

Did you test this at all? It causes an immediate kernel panic for me when I run
it in a VM:

In exit_mmap() -> free->pgtables() -> unlink_anon_vmas()

I haven't really dug into why but yeah, this is broken.


> ---
>  mm/rmap.c | 63 ++++++++++++++++++++++++++++---------------------------
>  1 file changed, 32 insertions(+), 31 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 34333ae3bd80..2a28edfa5734 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -86,15 +86,21 @@
>  static struct kmem_cache *anon_vma_cachep;
>  static struct kmem_cache *anon_vma_chain_cachep;
>
> -static inline struct anon_vma *anon_vma_alloc(void)
> +static inline struct anon_vma *anon_vma_alloc(struct anon_vma *parent)

I really dislike this, we only allocate with a parent in the fork case, this is
just not a win imo + adds confusion.

>  {
>  	struct anon_vma *anon_vma;
>
>  	anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
> -	if (anon_vma) {
> -		atomic_set(&anon_vma->refcount, 1);
> -		anon_vma->num_children = 0;
> -		anon_vma->num_active_vmas = 0;
> +	if (!anon_vma)
> +		return NULL;
> +
> +	atomic_set(&anon_vma->refcount, 1);
> +	anon_vma->num_children = 0;
> +	anon_vma->num_active_vmas = 0;
> +	if (parent) {
> +		anon_vma->parent = parent;
> +		anon_vma->root = parent->root;

You are accessing parent fields without a lock. This is not good.

> +	} else {
>  		anon_vma->parent = anon_vma;
>  		/*
>  		 * Initialise the anon_vma root to point to itself. If called
> @@ -102,6 +108,7 @@ static inline struct anon_vma *anon_vma_alloc(void)
>  		 */
>  		anon_vma->root = anon_vma;
>  	}
> +	anon_vma->parent->num_children++;

This is even even even more not good, because you're accessing the parent
without a lock, which is just completely broken.

I note below where you're doing this.

>
>  	return anon_vma;
>  }
> @@ -146,6 +153,19 @@ static void anon_vma_chain_free(struct anon_vma_chain *anon_vma_chain)
>  	kmem_cache_free(anon_vma_chain_cachep, anon_vma_chain);
>  }
>
> +static inline void anon_vma_attach(struct vm_area_struct *vma,
> +				   struct anon_vma *anon_vma)
> +{
> +	vma->anon_vma = anon_vma;
> +	vma->anon_vma->num_active_vmas++;
> +}

Yeah I hate the naming, you should have lock asserts here, I don't love that
we're pairing the vma and anon_vma like this, and again I just really question
the value of this.

> +
> +static inline void anon_vma_detach(struct vm_area_struct *vma)
> +{
> +	vma->anon_vma->num_active_vmas--;
> +	vma->anon_vma = NULL;
> +}
> +
>  static void anon_vma_chain_link(struct vm_area_struct *vma,
>  				struct anon_vma_chain *avc,
>  				struct anon_vma *anon_vma)
> @@ -198,10 +218,9 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
>  	anon_vma = find_mergeable_anon_vma(vma);
>  	allocated = NULL;
>  	if (!anon_vma) {
> -		anon_vma = anon_vma_alloc();
> +		anon_vma = anon_vma_alloc(NULL);

This 'arbitrary parameter which is NULL' is also pretty horrible.

>  		if (unlikely(!anon_vma))
>  			goto out_enomem_free_avc;
> -		anon_vma->num_children++; /* self-parent link for new root */
>  		allocated = anon_vma;
>  	}
>
> @@ -209,9 +228,8 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
>  	/* page_table_lock to protect against threads */
>  	spin_lock(&mm->page_table_lock);
>  	if (likely(!vma->anon_vma)) {
> -		vma->anon_vma = anon_vma;
> +		anon_vma_attach(vma, anon_vma);
>  		anon_vma_chain_link(vma, avc, anon_vma);
> -		anon_vma->num_active_vmas++;
>  		allocated = NULL;
>  		avc = NULL;
>  	}
> @@ -306,10 +324,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>  		if (!dst->anon_vma && src->anon_vma &&
>  		    anon_vma->num_children < 2 &&
>  		    anon_vma->num_active_vmas == 0)
> -			dst->anon_vma = anon_vma;
> +			anon_vma_attach(dst, anon_vma);
>  	}
> -	if (dst->anon_vma)
> -		dst->anon_vma->num_active_vmas++;

You're now losing the cases where we didn't reuse an anon_vma but dst->anon_vma
!= NULL? This is just broken isn't it?

>  	unlock_anon_vma_root(root);
>  	return 0;
>
> @@ -356,31 +372,22 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>  		return 0;
>
>  	/* Then add our own anon_vma. */
> -	anon_vma = anon_vma_alloc();
> +	anon_vma = anon_vma_alloc(pvma->anon_vma);
>  	if (!anon_vma)
>  		goto out_error;
> -	anon_vma->num_active_vmas++;
>  	avc = anon_vma_chain_alloc(GFP_KERNEL);
>  	if (!avc)
>  		goto out_error_free_anon_vma;
>
> -	/*
> -	 * The root anon_vma's rwsem is the lock actually used when we
> -	 * lock any of the anon_vmas in this anon_vma tree.
> -	 */

Please please PLEASE do not delete comments like this.

> -	anon_vma->root = pvma->anon_vma->root;
> -	anon_vma->parent = pvma->anon_vma;

Yeah this is just not worth it in my opinion. You're putting code specific to
forking in anon_vma_alloc(), which means you've made the code harder to
understand.

>  	/*
>  	 * With refcounts, an anon_vma can stay around longer than the
>  	 * process it belongs to. The root anon_vma needs to be pinned until
>  	 * this anon_vma is freed, because the lock lives in the root.
>  	 */
>  	get_anon_vma(anon_vma->root);
> -	/* Mark this anon_vma as the one where our new (COWed) pages go. */

Again, please do not remove comments like this.

> -	vma->anon_vma = anon_vma;
> +	anon_vma_attach(vma, anon_vma);
>  	anon_vma_lock_write(anon_vma);
>  	anon_vma_chain_link(vma, avc, anon_vma);
> -	anon_vma->parent->num_children++;

So now we're updating things not under the lock?... this is extremely broken.

>  	anon_vma_unlock_write(anon_vma);
>
>  	return 0;
> @@ -419,15 +426,9 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
>  		list_del(&avc->same_vma);
>  		anon_vma_chain_free(avc);
>  	}
> -	if (vma->anon_vma) {
> -		vma->anon_vma->num_active_vmas--;
> +	if (vma->anon_vma)
> +		anon_vma_detach(vma);
>
> -		/*
> -		 * vma would still be needed after unlink, and anon_vma will be prepared
> -		 * when handle fault.
> -		 */

You are removing key documentation of behaviour, please do not do that.

> -		vma->anon_vma = NULL;
> -	}
>  	unlock_anon_vma_root(root);
>
>  	/*
> --
> 2.25.1
>

Overall this patch is really quite broken, but I don't think the general concept
_as implemented here_ really gives much value.

I _like_ the idea of pairing adjustment of these kinds of anon_vma fields like
num_children, num_active_vmas etc. with operations, but I think there's probably
too many fiddly cases + a need for hellish lock stuff for there to be really
anything too viable here.

Cheers, Lorenzo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ