Message-ID: <28976a8e-678e-4cfa-8748-e566c9c29053@oracle.com>
Date: Mon, 15 Apr 2024 16:32:28 -0700
From: Jane Chu <jane.chu@...cle.com>
To: Miaohe Lin <linmiaohe@...wei.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Cc: akpm@...ux-foundation.org, brauner@...nel.org, oleg@...hat.com,
        tandersen@...flix.com, mjguzik@...il.com, willy@...radead.org,
        kent.overstreet@...ux.dev, zhangpeng.00@...edance.com,
        hca@...ux.ibm.com, mike.kravetz@...cle.com, muchun.song@...ux.dev,
        thorvald@...gle.com, Liam.Howlett@...cle.com
Subject: Re: [PATCH] fork: defer linking file vma until vma is fully
 initialized
On 4/10/2024 2:14 AM, Miaohe Lin wrote:
> Thorvald reported a WARNING [1]. The root cause is the following race:
>
>   CPU 1					CPU 2
>   fork					hugetlbfs_fallocate
>    dup_mmap				 hugetlbfs_punch_hole
>     i_mmap_lock_write(mapping);
>     vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
>     i_mmap_unlock_write(mapping);
>     hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
> 					 i_mmap_lock_write(mapping);
>     					 hugetlb_vmdelete_list
> 					  vma_interval_tree_foreach
> 					   hugetlb_vma_trylock_write -- Vma_lock is cleared.
>     tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
> 					   hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
> 					 i_mmap_unlock_write(mapping);
>
> hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside
> the i_mmap_rwsem lock while the vma lock can be used at the same time.
> Fix this by deferring linking the file vma until the vma is fully
> initialized. Those vmas should be initialized before they can be used.
>
> Reported-by: Thorvald Natvig <thorvald@...gle.com>
> Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
> Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing")
> Signed-off-by: Miaohe Lin <linmiaohe@...wei.com>
> ---
>   kernel/fork.c | 33 +++++++++++++++++----------------
>   1 file changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 84de5faa8c9a..99076dbe27d8 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -714,6 +714,23 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>   		} else if (anon_vma_fork(tmp, mpnt))
>   			goto fail_nomem_anon_vma_fork;
>   		vm_flags_clear(tmp, VM_LOCKED_MASK);
> +		/*
> +		 * Copy/update hugetlb private vma information.
> +		 */
> +		if (is_vm_hugetlb_page(tmp))
> +			hugetlb_dup_vma_private(tmp);
> +
> +		/*
> +		 * Link the vma into the MT. After using __mt_dup(), memory
> +		 * allocation is not necessary here, so it cannot fail.
> +		 */
> +		vma_iter_bulk_store(&vmi, tmp);
> +
> +		mm->map_count++;
> +
> +		if (tmp->vm_ops && tmp->vm_ops->open)
> +			tmp->vm_ops->open(tmp);
> +
>   		file = tmp->vm_file;
>   		if (file) {
>   			struct address_space *mapping = file->f_mapping;
> @@ -730,25 +747,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>   			i_mmap_unlock_write(mapping);
>   		}
>   
> -		/*
> -		 * Copy/update hugetlb private vma information.
> -		 */
> -		if (is_vm_hugetlb_page(tmp))
> -			hugetlb_dup_vma_private(tmp);
> -
> -		/*
> -		 * Link the vma into the MT. After using __mt_dup(), memory
> -		 * allocation is not necessary here, so it cannot fail.
> -		 */
> -		vma_iter_bulk_store(&vmi, tmp);
> -
> -		mm->map_count++;
>   		if (!(tmp->vm_flags & VM_WIPEONFORK))
>   			retval = copy_page_range(tmp, mpnt);
>   
> -		if (tmp->vm_ops && tmp->vm_ops->open)
> -			tmp->vm_ops->open(tmp);
> -
>   		if (retval) {
>   			mpnt = vma_next(&vmi);
>   			goto loop_out;
Looks good.
Reviewed-by: Jane Chu <jane.chu@...cle.com>
-jane