linux-kernel - Re: [PATCH 4/8] mm: FOLL_DUMP replace FOLL

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 9 Sep 2009 12:14:38 +0100
From:	Mel Gorman <mel@....ul.ie>
To:	Hugh Dickins <hugh.dickins@...cali.co.uk>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Jeff Chua <jeff.chua.linux@...il.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Nick Piggin <npiggin@...e.de>, Rik van Riel <riel@...hat.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 4/8] mm: FOLL_DUMP replace FOLL_ANON

On Mon, Sep 07, 2009 at 10:35:32PM +0100, Hugh Dickins wrote:
> The "FOLL_ANON optimization" and its use_zero_page() test have caused
> confusion and bugs: why does it test VM_SHARED? for the very good but
> unsatisfying reason that VMware crashed without.  As we look to maybe
> reinstating anonymous use of the ZERO_PAGE, we need to sort this out.
> 
> Easily done: it's silly for __get_user_pages() and follow_page() to
> be guessing whether it's safe to assume that they're being used for
> a coredump (which can take a shortcut snapshot where other uses must
> handle a fault) - just tell them with GUP_FLAGS_DUMP and FOLL_DUMP.
> 
> get_dump_page() doesn't even want a ZERO_PAGE: an error suits fine.
> 
> Signed-off-by: Hugh Dickins <hugh.dickins@...cali.co.uk>

Acked-by: Mel Gorman <mel@....ul.ie>

> ---
> 
>  include/linux/mm.h |    2 +-
>  mm/internal.h      |    1 +
>  mm/memory.c        |   43 ++++++++++++-------------------------------
>  3 files changed, 14 insertions(+), 32 deletions(-)
> 
> --- mm3/include/linux/mm.h	2009-09-07 13:16:32.000000000 +0100
> +++ mm4/include/linux/mm.h	2009-09-07 13:16:39.000000000 +0100
> @@ -1247,7 +1247,7 @@ struct page *follow_page(struct vm_area_
>  #define FOLL_WRITE	0x01	/* check pte is writable */
>  #define FOLL_TOUCH	0x02	/* mark page accessed */
>  #define FOLL_GET	0x04	/* do get_page on page */
> -#define FOLL_ANON	0x08	/* give ZERO_PAGE if no pgtable */
> +#define FOLL_DUMP	0x08	/* give error on hole if it would be zero */
>  
>  typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
>  			void *data);
> --- mm3/mm/internal.h	2009-09-07 13:16:22.000000000 +0100
> +++ mm4/mm/internal.h	2009-09-07 13:16:39.000000000 +0100
> @@ -252,6 +252,7 @@ static inline void mminit_validate_memmo
>  
>  #define GUP_FLAGS_WRITE		0x01
>  #define GUP_FLAGS_FORCE		0x02
> +#define GUP_FLAGS_DUMP		0x04
>  
>  int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>  		     unsigned long start, int len, int flags,
> --- mm3/mm/memory.c	2009-09-07 13:16:32.000000000 +0100
> +++ mm4/mm/memory.c	2009-09-07 13:16:39.000000000 +0100
> @@ -1174,41 +1174,22 @@ no_page:
>  	pte_unmap_unlock(ptep, ptl);
>  	if (!pte_none(pte))
>  		return page;
> -	/* Fall through to ZERO_PAGE handling */
> +
>  no_page_table:
>  	/*
>  	 * When core dumping an enormous anonymous area that nobody
> -	 * has touched so far, we don't want to allocate page tables.
> +	 * has touched so far, we don't want to allocate unnecessary pages or
> +	 * page tables.  Return error instead of NULL to skip handle_mm_fault,
> +	 * then get_dump_page() will return NULL to leave a hole in the dump.
> +	 * But we can only make this optimization where a hole would surely
> +	 * be zero-filled if handle_mm_fault() actually did handle it.
>  	 */
> -	if (flags & FOLL_ANON) {
> -		page = ZERO_PAGE(0);
> -		if (flags & FOLL_GET)
> -			get_page(page);
> -		BUG_ON(flags & FOLL_WRITE);
> -	}
> +	if ((flags & FOLL_DUMP) &&
> +	    (!vma->vm_ops || !vma->vm_ops->fault))
> +		return ERR_PTR(-EFAULT);
>  	return page;
>  }
>  
> -/* Can we do the FOLL_ANON optimization? */
> -static inline int use_zero_page(struct vm_area_struct *vma)
> -{
> -	/*
> -	 * We don't want to optimize FOLL_ANON for make_pages_present()
> -	 * when it tries to page in a VM_LOCKED region. As to VM_SHARED,
> -	 * we want to get the page from the page tables to make sure
> -	 * that we serialize and update with any other user of that
> -	 * mapping.
> -	 */
> -	if (vma->vm_flags & (VM_LOCKED | VM_SHARED))
> -		return 0;
> -	/*
> -	 * And if we have a fault routine, it's not an anonymous region.
> -	 */
> -	return !vma->vm_ops || !vma->vm_ops->fault;
> -}
> -
> -
> -
>  int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>  		     unsigned long start, int nr_pages, int flags,
>  		     struct page **pages, struct vm_area_struct **vmas)
> @@ -1288,8 +1269,8 @@ int __get_user_pages(struct task_struct
>  		foll_flags = FOLL_TOUCH;
>  		if (pages)
>  			foll_flags |= FOLL_GET;
> -		if (!write && use_zero_page(vma))
> -			foll_flags |= FOLL_ANON;
> +		if (flags & GUP_FLAGS_DUMP)
> +			foll_flags |= FOLL_DUMP;
>  
>  		do {
>  			struct page *page;
> @@ -1447,7 +1428,7 @@ struct page *get_dump_page(unsigned long
>  	struct page *page;
>  
>  	if (__get_user_pages(current, current->mm, addr, 1,
> -				GUP_FLAGS_FORCE, &page, &vma) < 1)
> +			GUP_FLAGS_FORCE | GUP_FLAGS_DUMP, &page, &vma) < 1)
>  		return NULL;
>  	if (page == ZERO_PAGE(0)) {
>  		page_cache_release(page);
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@...ck.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@...ck.org"> email@...ck.org </a>
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/