lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 4 Feb 2021 15:25:37 -0800
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Peter Xu <peterx@...hat.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Cc:     Kirill Shutemov <kirill@...temov.name>,
        Wei Zhang <wzam@...zon.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        Matthew Wilcox <willy@...radead.org>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Gal Pressman <galpress@...zon.com>, Jan Kara <jack@...e.cz>,
        Jann Horn <jannh@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jason Gunthorpe <jgg@...pe.ca>,
        David Gibson <david@...son.dropbear.id.au>,
        Christoph Hellwig <hch@....de>
Subject: Re: [PATCH v2 4/4] hugetlb: Do early cow when page pinned on src mm

On 2/4/21 6:50 AM, Peter Xu wrote:
> This is the last missing piece of the COW-during-fork effort when there're
> pinned pages found.  One can reference 70e806e4e645 ("mm: Do early cow for
> pinned pages during fork() for ptes", 2020-09-27) for more information, since
> we do similar things here rather than pte this time, but just for hugetlb.
> 
> Signed-off-by: Peter Xu <peterx@...hat.com>
> ---
>  mm/hugetlb.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 56 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 9e6ea96bf33b..5793936e00ef 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3734,11 +3734,27 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
>  		return false;
>  }
>  
> +static void
> +hugetlb_copy_page(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
> +		  struct page *old_page, struct page *new_page)
> +{
> +	struct hstate *h = hstate_vma(vma);
> +	unsigned int psize = pages_per_huge_page(h);
> +
> +	copy_user_huge_page(new_page, old_page, addr, vma, psize);

copy_user_huge_page calls cond_resched() and has might_sleep().  Imagine
the time it takes to copy 1G.  Usually called without holding locks, but
this new code is calling it with ptl locks held.  The copy should be done
outside the ptl, but you will need the ptl to update the pte/rmap.  So,
doing all this within one neat helper like this may not be possible.

> +	__SetPageUptodate(new_page);
> +	ClearPagePrivate(new_page);
> +	set_page_huge_active(new_page);

Code to replace the above ClearPagePrivate and set_page_huge_active is
in Andrew's tree.  With changes in Andrew's tree, this would be:

	ClearHPageRestoreReserve(new_page);
	SetHPageMigratable(new_page);

Ideally, the SetHPageMigratable would be done after the set_pte and add_rmap
so the page does not get migrated before these operations.  However, this
can not happen since we are holding the ptl.  So, no problem here.  If code
is restructured to call copy_user_huge_page outside ptl, keep this in mind.

Also, technically ClearHPageRestoreReserve is not needed as it would not be
set by alloc_huge_page because we did not consume a reserve.  However, better
to leave in place in case someone wants to use helper for something else.

> +	set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, new_page, 1));
> +	hugepage_add_new_anon_rmap(new_page, vma, addr);
> +	hugetlb_count_add(psize, vma->vm_mm);
> +}
> +
>  int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>  			    struct vm_area_struct *vma)
>  {
>  	pte_t *src_pte, *dst_pte, entry, dst_entry;
> -	struct page *ptepage;
> +	struct page *ptepage, *prealloc = NULL;
>  	unsigned long addr;
>  	int cow;
>  	struct hstate *h = hstate_vma(vma);
> @@ -3787,7 +3803,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>  		dst_entry = huge_ptep_get(dst_pte);
>  		if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
>  			continue;
> -
> +again:
>  		dst_ptl = huge_pte_lock(h, dst, dst_pte);
>  		src_ptl = huge_pte_lockptr(h, src, src_pte);
>  		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
> @@ -3816,6 +3832,39 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>  			}
>  			set_huge_swap_pte_at(dst, addr, dst_pte, entry, sz);
>  		} else {
> +			entry = huge_ptep_get(src_pte);
> +			ptepage = pte_page(entry);
> +			get_page(ptepage);
> +
> +			/*
> +			 * This is a rare case where we see pinned hugetlb
> +			 * pages while they're prone to COW.  We need to do the
> +			 * COW earlier during fork.
> +			 *
> +			 * When pre-allocating the page we need to be without
> +			 * all the locks since we could sleep when allocate.
> +			 */
> +			if (unlikely(page_needs_cow_for_dma(vma, ptepage))) {
> +				if (!prealloc) {
> +					put_page(ptepage);
> +					spin_unlock(src_ptl);
> +					spin_unlock(dst_ptl);
> +					prealloc = alloc_huge_page(vma, addr, 1);
> +					if (!prealloc) {

alloc_huge_page will return error codes, so you need to check IS_ERR(prealloc)
not just NULL.
-- 
Mike Kravetz

> +						ret = -ENOMEM;
> +						break;
> +					}
> +					goto again;
> +				}
> +				hugetlb_copy_page(vma, dst_pte, addr, ptepage,
> +						  prealloc);
> +				put_page(ptepage);
> +				spin_unlock(src_ptl);
> +				spin_unlock(dst_ptl);
> +				prealloc = NULL;
> +				continue;
> +			}
> +
>  			if (cow) {
>  				/*
>  				 * No need to notify as we are downgrading page
> @@ -3826,9 +3875,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>  				 */
>  				huge_ptep_set_wrprotect(src, addr, src_pte);
>  			}
> -			entry = huge_ptep_get(src_pte);
> -			ptepage = pte_page(entry);
> -			get_page(ptepage);
> +
>  			page_dup_rmap(ptepage, true);
>  			set_huge_pte_at(dst, addr, dst_pte, entry);
>  			hugetlb_count_add(pages_per_huge_page(h), dst);
> @@ -3842,6 +3889,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>  	else
>  		i_mmap_unlock_read(mapping);
>  
> +	/* Free the preallocated page if not used at last */
> +	if (prealloc)
> +		put_page(prealloc);
> +
>  	return ret;
>  }
>  
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ