[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7c0101ac-afac-4601-8be9-24587877a5e3@lucifer.local>
Date: Fri, 5 Sep 2025 15:58:42 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Yajun Deng <yajun.deng@...ux.dev>
Cc: akpm@...ux-foundation.org, david@...hat.com, riel@...riel.com,
Liam.Howlett@...cle.com, vbabka@...e.cz, harry.yoo@...cle.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/rmap: make num_children and num_active_vmas update in
internally
On Fri, Sep 05, 2025 at 01:20:19PM +0000, Yajun Deng wrote:
> If the anon_vma_alloc() is called, the num_children of the parent of
> the anon_vma will be updated. But this operation occurs outside of
> anon_vma_alloc().
I don't like that what is supposed to be an allocation function is now also
doing additional work.
And there's literally only one place where this matters, since
__anon_vma_prepare() doesn't have a parent, it's ltierally only anon_vma_fork().
And there are only 2 callers, so I don't see the purpose in doing this?
>
> The num_active_vmas are also updated outside of anon_vma.
I don't know what 'outside of anon_vma' means?
>
> Pass the parent of anon_vma to the anon_vma_alloc() and update the
> num_children inside it.
>
> Introduce anon_vma_attach() and anon_vma_detach() to update
> num_active_vmas with the anon_vma.
I really dislike this naming - VMA detachment means something entirely
different, you're not really 'attaching' or 'detaching' anything you're just
changing a single stat of the ones you'd need to change were you truly doing so
etc. etc.
It's misleading.
>
> Signed-off-by: Yajun Deng <yajun.deng@...ux.dev>
Did you test this at all? It causes an immediate kernel panic for me when I run
it in a VM:
In exit_mmap() -> free->pgtables() -> unlink_anon_vmas()
I haven't really dug into why but yeah, this is broken.
> ---
> mm/rmap.c | 63 ++++++++++++++++++++++++++++---------------------------
> 1 file changed, 32 insertions(+), 31 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 34333ae3bd80..2a28edfa5734 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -86,15 +86,21 @@
> static struct kmem_cache *anon_vma_cachep;
> static struct kmem_cache *anon_vma_chain_cachep;
>
> -static inline struct anon_vma *anon_vma_alloc(void)
> +static inline struct anon_vma *anon_vma_alloc(struct anon_vma *parent)
I really dislike this, we only allocate with a parent in the fork case, this is
just not a win imo + adds confusion.
> {
> struct anon_vma *anon_vma;
>
> anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
> - if (anon_vma) {
> - atomic_set(&anon_vma->refcount, 1);
> - anon_vma->num_children = 0;
> - anon_vma->num_active_vmas = 0;
> + if (!anon_vma)
> + return NULL;
> +
> + atomic_set(&anon_vma->refcount, 1);
> + anon_vma->num_children = 0;
> + anon_vma->num_active_vmas = 0;
> + if (parent) {
> + anon_vma->parent = parent;
> + anon_vma->root = parent->root;
You are accessing parent fields without a lock. This is not good.
> + } else {
> anon_vma->parent = anon_vma;
> /*
> * Initialise the anon_vma root to point to itself. If called
> @@ -102,6 +108,7 @@ static inline struct anon_vma *anon_vma_alloc(void)
> */
> anon_vma->root = anon_vma;
> }
> + anon_vma->parent->num_children++;
This is even even even more not good, because you're accessing the parent
without a lock, which is just completely broken.
I note below where you're doing this.
>
> return anon_vma;
> }
> @@ -146,6 +153,19 @@ static void anon_vma_chain_free(struct anon_vma_chain *anon_vma_chain)
> kmem_cache_free(anon_vma_chain_cachep, anon_vma_chain);
> }
>
> +static inline void anon_vma_attach(struct vm_area_struct *vma,
> + struct anon_vma *anon_vma)
> +{
> + vma->anon_vma = anon_vma;
> + vma->anon_vma->num_active_vmas++;
> +}
Yeah I hate the naming, you should have lock asserts here, I don't love that
we're pairing the vma and anon_vma like this, and again I just really question
the value of this.
> +
> +static inline void anon_vma_detach(struct vm_area_struct *vma)
> +{
> + vma->anon_vma->num_active_vmas--;
> + vma->anon_vma = NULL;
> +}
> +
> static void anon_vma_chain_link(struct vm_area_struct *vma,
> struct anon_vma_chain *avc,
> struct anon_vma *anon_vma)
> @@ -198,10 +218,9 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
> anon_vma = find_mergeable_anon_vma(vma);
> allocated = NULL;
> if (!anon_vma) {
> - anon_vma = anon_vma_alloc();
> + anon_vma = anon_vma_alloc(NULL);
This 'arbitrary parameter which is NULL' is also pretty horrible.
> if (unlikely(!anon_vma))
> goto out_enomem_free_avc;
> - anon_vma->num_children++; /* self-parent link for new root */
> allocated = anon_vma;
> }
>
> @@ -209,9 +228,8 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
> /* page_table_lock to protect against threads */
> spin_lock(&mm->page_table_lock);
> if (likely(!vma->anon_vma)) {
> - vma->anon_vma = anon_vma;
> + anon_vma_attach(vma, anon_vma);
> anon_vma_chain_link(vma, avc, anon_vma);
> - anon_vma->num_active_vmas++;
> allocated = NULL;
> avc = NULL;
> }
> @@ -306,10 +324,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> if (!dst->anon_vma && src->anon_vma &&
> anon_vma->num_children < 2 &&
> anon_vma->num_active_vmas == 0)
> - dst->anon_vma = anon_vma;
> + anon_vma_attach(dst, anon_vma);
> }
> - if (dst->anon_vma)
> - dst->anon_vma->num_active_vmas++;
You're now losing the cases where we didn't reuse an anon_vma but dst->anon_vma
!= NULL? This is just broken isn't it?
> unlock_anon_vma_root(root);
> return 0;
>
> @@ -356,31 +372,22 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> return 0;
>
> /* Then add our own anon_vma. */
> - anon_vma = anon_vma_alloc();
> + anon_vma = anon_vma_alloc(pvma->anon_vma);
> if (!anon_vma)
> goto out_error;
> - anon_vma->num_active_vmas++;
> avc = anon_vma_chain_alloc(GFP_KERNEL);
> if (!avc)
> goto out_error_free_anon_vma;
>
> - /*
> - * The root anon_vma's rwsem is the lock actually used when we
> - * lock any of the anon_vmas in this anon_vma tree.
> - */
Please please PLEASE do not delete comments like this.
> - anon_vma->root = pvma->anon_vma->root;
> - anon_vma->parent = pvma->anon_vma;
Yeah this is just not worth it in my opinion. You're putting code specific to
forking in anon_vma_alloc(), which means you've made the code harder to
understand.
> /*
> * With refcounts, an anon_vma can stay around longer than the
> * process it belongs to. The root anon_vma needs to be pinned until
> * this anon_vma is freed, because the lock lives in the root.
> */
> get_anon_vma(anon_vma->root);
> - /* Mark this anon_vma as the one where our new (COWed) pages go. */
Again, please do not remove comments like this.
> - vma->anon_vma = anon_vma;
> + anon_vma_attach(vma, anon_vma);
> anon_vma_lock_write(anon_vma);
> anon_vma_chain_link(vma, avc, anon_vma);
> - anon_vma->parent->num_children++;
So now we're updating things not under the lock?... this is extremely broken.
> anon_vma_unlock_write(anon_vma);
>
> return 0;
> @@ -419,15 +426,9 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
> list_del(&avc->same_vma);
> anon_vma_chain_free(avc);
> }
> - if (vma->anon_vma) {
> - vma->anon_vma->num_active_vmas--;
> + if (vma->anon_vma)
> + anon_vma_detach(vma);
>
> - /*
> - * vma would still be needed after unlink, and anon_vma will be prepared
> - * when handle fault.
> - */
You are removing key documentation of behaviour, please do not do that.
> - vma->anon_vma = NULL;
> - }
> unlock_anon_vma_root(root);
>
> /*
> --
> 2.25.1
>
Overall this patch is really quite broken, but I don't think the general concept
_as implemented here_ really gives much value.
I _like_ the idea of pairing adjustment of these kinds of anon_vma fields like
num_children, num_active_vmas etc. with operations, but I think there's probably
too many fiddly cases + a need for hellish lock stuff for there to be really
anything too viable here.
Cheers, Lorenzo
Powered by blists - more mailing lists