Message-ID: <28976a8e-678e-4cfa-8748-e566c9c29053@oracle.com>
Date: Mon, 15 Apr 2024 16:32:28 -0700
From: Jane Chu <jane.chu@...cle.com>
To: Miaohe Lin <linmiaohe@...wei.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Cc: akpm@...ux-foundation.org, brauner@...nel.org, oleg@...hat.com,
tandersen@...flix.com, mjguzik@...il.com, willy@...radead.org,
kent.overstreet@...ux.dev, zhangpeng.00@...edance.com,
hca@...ux.ibm.com, mike.kravetz@...cle.com, muchun.song@...ux.dev,
thorvald@...gle.com, Liam.Howlett@...cle.com
Subject: Re: [PATCH] fork: defer linking file vma until vma is fully
initialized
On 4/10/2024 2:14 AM, Miaohe Lin wrote:
> Thorvald reported a WARNING [1]. The root cause is the race below:
>
> CPU 1                                 CPU 2
> fork                                  hugetlbfs_fallocate
>  dup_mmap                              hugetlbfs_punch_hole
>   i_mmap_lock_write(mapping);
>   vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
>   i_mmap_unlock_write(mapping);
>   hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
>                                        i_mmap_lock_write(mapping);
>                                        hugetlb_vmdelete_list
>                                         vma_interval_tree_foreach
>                                          hugetlb_vma_trylock_write -- Vma_lock is cleared.
>    tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
>                                          hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
>                                        i_mmap_unlock_write(mapping);
>
> hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside
> the i_mmap_rwsem lock, while the vma lock can be used at the same time.
> Fix this by deferring the linking of the file vma until the vma is fully
> initialized. Those vmas should be initialized first before they can be
> used.
>
> Reported-by: Thorvald Natvig <thorvald@...gle.com>
> Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
> Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing")
> Signed-off-by: Miaohe Lin <linmiaohe@...wei.com>
> ---
> kernel/fork.c | 33 +++++++++++++++++----------------
> 1 file changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 84de5faa8c9a..99076dbe27d8 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -714,6 +714,23 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  		} else if (anon_vma_fork(tmp, mpnt))
>  			goto fail_nomem_anon_vma_fork;
>  		vm_flags_clear(tmp, VM_LOCKED_MASK);
> +		/*
> +		 * Copy/update hugetlb private vma information.
> +		 */
> +		if (is_vm_hugetlb_page(tmp))
> +			hugetlb_dup_vma_private(tmp);
> +
> +		/*
> +		 * Link the vma into the MT. After using __mt_dup(), memory
> +		 * allocation is not necessary here, so it cannot fail.
> +		 */
> +		vma_iter_bulk_store(&vmi, tmp);
> +
> +		mm->map_count++;
> +
> +		if (tmp->vm_ops && tmp->vm_ops->open)
> +			tmp->vm_ops->open(tmp);
> +
>  		file = tmp->vm_file;
>  		if (file) {
>  			struct address_space *mapping = file->f_mapping;
> @@ -730,25 +747,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  			i_mmap_unlock_write(mapping);
>  		}
>  
> -		/*
> -		 * Copy/update hugetlb private vma information.
> -		 */
> -		if (is_vm_hugetlb_page(tmp))
> -			hugetlb_dup_vma_private(tmp);
> -
> -		/*
> -		 * Link the vma into the MT. After using __mt_dup(), memory
> -		 * allocation is not necessary here, so it cannot fail.
> -		 */
> -		vma_iter_bulk_store(&vmi, tmp);
> -
> -		mm->map_count++;
>  		if (!(tmp->vm_flags & VM_WIPEONFORK))
>  			retval = copy_page_range(tmp, mpnt);
>  
> -		if (tmp->vm_ops && tmp->vm_ops->open)
> -			tmp->vm_ops->open(tmp);
> -
>  		if (retval) {
>  			mpnt = vma_next(&vmi);
>  			goto loop_out;
Looks good.
Reviewed-by: Jane Chu <jane.chu@...cle.com>
-jane