linux-kernel - Re: [PATCH v5 1/2] mm, hugepages: add mremap() support for hugepage backed vma

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b889652d-7050-a721-367c-dcf3ff084daa@oracle.com>
Date:   Mon, 11 Oct 2021 17:18:30 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Mina Almasry <almasrymina@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Ken Chen <kenchen@...gle.com>,
        Chris Kennelly <ckennelly@...gle.com>,
        Michal Hocko <mhocko@...e.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Kirill Shutemov <kirill@...temov.name>
Subject: Re: [PATCH v5 1/2] mm, hugepages: add mremap() support for hugepage
 backed vma

On 10/8/21 11:32 AM, Mina Almasry wrote:
> Support mremap() for hugepage backed vma segment by simply repositioning
> page table entries. The page table entries are repositioned to the new
> virtual address on mremap().
> 
> Hugetlb mremap() support is of course generic; my motivating use case
> is a library (hugepage_text), which reloads the ELF text of executables
> in hugepages. This significantly increases the execution performance of
> said executables.
> 
> Restricts the mremap operation on hugepages to up to the size of the
> original mapping as the underlying hugetlb reservation is not yet
> capable of handling remapping to a larger size.
> 
> During the mremap() operation we detect pmd_share'd mappings and we
> unshare those during the mremap(). On access and fault the sharing is
> established again.
> 
> Signed-off-by: Mina Almasry <almasrymina@...gle.com>
> 
...
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6d2f4c25dd9fb..8200b4c8d09d8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1015,6 +1015,35 @@ void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
>  		vma->vm_private_data = (void *)0;
>  }
>  
> +/*
> + * Reset and decrement one ref on hugepage private reservation.
> + * Called with mm->mmap_sem writer semaphore held.
> + * This function should be only used by move_vma() and operate on
> + * same sized vma. It should never come here with last ref on the
> + * reservation.
> + */
> +void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
> +{
> +	/*
> +	 * Clear the old hugetlb private page reservation.
> +	 * It has already been transferred to new_vma.
> +	 *
> +	 * During a mremap() operation of a hugetlb vma we call move_vma()
> +	 * which copies *vma* into *new_vma* and unmaps *vma*. After the copy
> +	 * operation both *new_vma* and *vma* share a reference to the resv_map
> +	 * struct, and at that point *vma* is about to be unmapped. We don't
> +	 * want to return the reservation to the pool at unmap of *vma* because
> +	 * the reservation still lives on in new_vma, so simply decrement the
> +	 * ref here and remove the resv_map reference from this vma.
> +	 */

Are the *...* for special formatting of the words somewhere?  Or, just
for simple added emphasis?  This convention is not used anywhere else in
the file.  Unless there is a good reason for doing so, I would prefer to
to drop the *...* convention here.

> +	struct resv_map *reservations = vma_resv_map(vma);
> +
> +	if (reservations && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
> +		kref_put(&reservations->refs, resv_map_release);
> +
> +	reset_vma_resv_huge_pages(vma);
> +}
...
> diff --git a/mm/mremap.c b/mm/mremap.c
> index c0b6c41b7b78f..6a3f7d38b7539 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -489,6 +489,10 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>  	old_end = old_addr + len;
>  	flush_cache_range(vma, old_addr, old_end);
>  
> +	if (is_vm_hugetlb_page(vma))
> +		return move_hugetlb_page_tables(vma, new_vma, old_addr,
> +						new_addr, len);
> +
>  	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
>  				old_addr, old_end);
>  	mmu_notifier_invalidate_range_start(&range);
> @@ -646,6 +650,10 @@ static unsigned long move_vma(struct vm_area_struct *vma,
>  		mremap_userfaultfd_prep(new_vma, uf);
>  	}
>  
> +	if (is_vm_hugetlb_page(vma)) {
> +		clear_vma_resv_huge_pages(vma);
> +	}
> +
>  	/* Conceal VM_ACCOUNT so old reservation is not undone */
>  	if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) {
>  		vma->vm_flags &= ~VM_ACCOUNT;
> @@ -739,9 +747,6 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
>  			(vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
>  		return ERR_PTR(-EINVAL);
>  
> -	if (is_vm_hugetlb_page(vma))
> -		return ERR_PTR(-EINVAL);
> -
>  	/* We can't remap across vm area boundaries */
>  	if (old_len > vma->vm_end - addr)
>  		return ERR_PTR(-EFAULT);
> @@ -937,6 +942,27 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
>  
>  	if (mmap_write_lock_killable(current->mm))
>  		return -EINTR;
> +	vma = find_vma(mm, addr);
> +	if (!vma || vma->vm_start > addr) {
> +		ret = EFAULT;
> +		goto out;
> +	}
> +
> +	if (is_vm_hugetlb_page(vma)) {
> +		struct hstate *h __maybe_unused = hstate_vma(vma);
> +
> +		old_len = ALIGN(old_len, huge_page_size(h));
> +		new_len = ALIGN(new_len, huge_page_size(h));
> +		addr = ALIGN(addr, huge_page_size(h));
> +		new_addr = ALIGN(new_addr, huge_page_size(h));

Instead of aligning addr and new_addr, we should be checking for huge
page alignment and returning error.  This makes it consistent with the
requirement that they be PAGE aligned in the non-hugetlb case.  Sorry if
that was unclear in previous comments.

		/* addrs must be huge page aligned */
		if (addr & ~huge_page_mask(h))
			goto out;
		if (new_addr & ~huge_page_mask(h))
			goto out;
			
> +
> +		/*
> +		 * Don't allow remap expansion, because the underlying hugetlb
> +		 * reservation is not yet capable to handle split reservation.
> +		 */
> +		if (new_len > old_len)
> +			goto out;
> +	}
>  
>  	if (flags & (MREMAP_FIXED | MREMAP_DONTUNMAP)) {
>  		ret = mremap_to(addr, old_len, new_addr, new_len,
> 

-- 
Mike Kravetz