Message-ID: <Y3UY+XXXwnjDLMPl@x1n>
Date: Wed, 16 Nov 2022 12:08:09 -0500
From: Peter Xu <peterx@...hat.com>
To: James Houghton <jthoughton@...gle.com>
Cc: Mike Kravetz <mike.kravetz@...cle.com>,
Muchun Song <songmuchun@...edance.com>,
David Hildenbrand <david@...hat.com>,
David Rientjes <rientjes@...gle.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Mina Almasry <almasrymina@...gle.com>,
Zach O'Keefe <zokeefe@...gle.com>,
Manish Mishra <manish.mishra@...anix.com>,
Naoya Horiguchi <naoya.horiguchi@....com>,
"Dr . David Alan Gilbert" <dgilbert@...hat.com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Vlastimil Babka <vbabka@...e.cz>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
Miaohe Lin <linmiaohe@...wei.com>,
Yang Shi <shy828301@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2 05/47] hugetlb: make hugetlb_vma_lock_alloc return
its failure reason
On Fri, Oct 21, 2022 at 04:36:21PM +0000, James Houghton wrote:
> Currently hugetlb_vma_lock_alloc doesn't return anything, as there is no
> need: if it fails, PMD sharing won't be enabled. However, HGM requires
> that the VMA lock exists, so we need to verify that
> hugetlb_vma_lock_alloc actually succeeded. If hugetlb_vma_lock_alloc
> fails, then we can pass that up to the caller that is attempting to
> enable HGM.
>
> Signed-off-by: James Houghton <jthoughton@...gle.com>
> ---
> mm/hugetlb.c | 16 +++++++++-------
> 1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 52cec5b0789e..dc82256b89dd 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -92,7 +92,7 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
> /* Forward declaration */
> static int hugetlb_acct_memory(struct hstate *h, long delta);
> static void hugetlb_vma_lock_free(struct vm_area_struct *vma);
> -static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
> +static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
> static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
>
> static inline bool subpool_is_free(struct hugepage_subpool *spool)
> @@ -7001,17 +7001,17 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
> }
> }
>
> -static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
> +static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
> {
> struct hugetlb_vma_lock *vma_lock;
>
> /* Only establish in (flags) sharable vmas */
> if (!vma || !(vma->vm_flags & VM_MAYSHARE))
> - return;
> + return -EINVAL;
>
> - /* Should never get here with non-NULL vm_private_data */
> + /* We've already allocated the lock. */
> if (vma->vm_private_data)
> - return;
> + return 0;
No objection to the patch itself, but I am wondering what guarantees
thread safety for this function, i.e. what keeps vm_private_data from
being leaked when two threads try to allocate at the same time.
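To spell out the interleaving I have in mind (illustrative only,
assuming no outer serialization of the two calls):

    CPU0                                 CPU1
    ----                                 ----
    hugetlb_vma_lock_alloc(vma)
      sees vm_private_data == NULL
                                         hugetlb_vma_lock_alloc(vma)
                                           sees vm_private_data == NULL
      vma_lock = kmalloc(...)
                                           vma_lock = kmalloc(...)
      vma->vm_private_data = vma_lock
                                           vma->vm_private_data = vma_lock

CPU0's vma_lock is overwritten and leaked.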
I think it should be the write mmap lock. I saw that in your latest code
base there's:
/*
* We must hold the mmap lock for writing so that callers can rely on
* hugetlb_hgm_enabled returning a consistent result while holding
* the mmap lock for reading.
*/
mmap_assert_write_locked(vma->vm_mm);
/* HugeTLB HGM requires the VMA lock to synchronize collapsing. */
ret = hugetlb_vma_data_alloc(vma);
if (ret)
return ret;
So that's covered there. The remaining call sites are
hugetlb_vm_op_open() and hugetlb_reserve_pages(), and they both seem
fine too: hugetlb_vm_op_open() runs during mmap(), and the latter is
called with vma==NULL, so the allocation will be skipped.
I'm wondering whether it would make sense to move this assert inside
hugetlb_vma_data_alloc(), after the !vma check, or just add the same
assert there too, but for a different reason.
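Roughly something like below, sketched against the hunk above (just a
sketch; I'm assuming hugetlb_vma_data_alloc() in your tree still opens
with the same two checks):

    static int hugetlb_vma_data_alloc(struct vm_area_struct *vma)
    {
            struct hugetlb_vma_lock *vma_lock;

            /* Only establish in (flags) sharable vmas */
            if (!vma || !(vma->vm_flags & VM_MAYSHARE))
                    return -EINVAL;

            /*
             * Holding the write mmap lock is what makes the
             * unserialized check-then-store on vm_private_data
             * below safe.
             */
            mmap_assert_write_locked(vma->vm_mm);

            /* We've already allocated the lock. */
            if (vma->vm_private_data)
                    return 0;
            ...
    }

Either way, the assert would document at the allocation site why the
unlocked check of vm_private_data is safe.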
>
> vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL);
> if (!vma_lock) {
> @@ -7026,13 +7026,14 @@ static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
> * allocation failure.
> */
> pr_warn_once("HugeTLB: unable to allocate vma specific lock\n");
> - return;
> + return -ENOMEM;
> }
>
> kref_init(&vma_lock->refs);
> init_rwsem(&vma_lock->rw_sema);
> vma_lock->vma = vma;
> vma->vm_private_data = vma_lock;
> + return 0;
> }
>
> /*
> @@ -7160,8 +7161,9 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
> {
> }
>
> -static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
> +static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
> {
> + return 0;
> }
>
> pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
> --
> 2.38.0.135.g90850a2211-goog
>
>
--
Peter Xu