lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160424225057.GA6670@node.shutemov.name>
Date:	Mon, 25 Apr 2016 01:50:57 +0300
From:	"Kirill A. Shutemov" <kirill@...temov.name>
To:	Toshi Kani <toshi.kani@....com>, Hugh Dickins <hughd@...gle.com>
Cc:	akpm@...ux-foundation.org, dan.j.williams@...el.com,
	viro@...iv.linux.org.uk, willy@...ux.intel.com,
	ross.zwisler@...ux.intel.com, kirill.shutemov@...ux.intel.com,
	david@...morbit.com, jack@...e.cz, tytso@....edu,
	adilger.kernel@...ger.ca, mike.kravetz@...cle.com,
	linux-nvdimm@...ts.01.org, linux-fsdevel@...r.kernel.org,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 1/2] thp, dax: add thp_get_unmapped_area for pmd
 mappings

On Fri, Apr 22, 2016 at 06:21:22PM -0600, Toshi Kani wrote:
> When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using pmd page
> size.  This feature relies on both mmap virtual address and FS
> block (i.e. physical address) to be aligned by the pmd page size.
> Users can use mkfs options to specify FS to align block allocations.
> However, aligning mmap address requires code changes to existing
> applications for providing a pmd-aligned address to mmap().
> 
> For instance, fio with "ioengine=mmap" performs I/Os with mmap() [1].
> It calls mmap() with a NULL address, which needs to be changed to
> provide a pmd-aligned address for testing with DAX pmd mappings.
> Changing all applications that call mmap() with NULL is undesirable.
> 
> Add thp_get_unmapped_area(), which can be called by filesystem's
> get_unmapped_area to align an mmap address by the pmd size for
> a DAX file.  It calls the default handler, mm->get_unmapped_area(),
> to find a range and then aligns it for a DAX file.
> 
> thp_get_unmapped_area() can be extended for huge page cache support.
> 
> The patch is based on Matthew Wilcox's change that allows adding
> support of the pud page size easily.

See Hugh's implementation:

http://lkml.kernel.org/r/alpine.LSU.2.11.1604051420110.5965@eggly.anvils

> [1]: https://github.com/axboe/fio/blob/master/engines/mmap.c
> Signed-off-by: Toshi Kani <toshi.kani@....com>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Alexander Viro <viro@...iv.linux.org.uk>
> Cc: Dan Williams <dan.j.williams@...el.com>
> Cc: Matthew Wilcox <willy@...ux.intel.com>
> Cc: Ross Zwisler <ross.zwisler@...ux.intel.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> Cc: Dave Chinner <david@...morbit.com>
> Cc: Jan Kara <jack@...e.cz>
> Cc: Theodore Ts'o <tytso@....edu>
> Cc: Andreas Dilger <adilger.kernel@...ger.ca>
> Cc: Mike Kravetz <mike.kravetz@...cle.com>
> ---
>  include/linux/huge_mm.h |    7 +++++++
>  mm/huge_memory.c        |   43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 7008623..3769674 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -85,6 +85,10 @@ extern bool is_vma_temporary_stack(struct vm_area_struct *vma);
>  
>  extern unsigned long transparent_hugepage_flags;
>  
> +extern unsigned long thp_get_unmapped_area(struct file *filp,
> +		unsigned long addr, unsigned long len, unsigned long pgoff,
> +		unsigned long flags);
> +
>  extern void prep_transhuge_page(struct page *page);
>  extern void free_transhuge_page(struct page *page);
>  
> @@ -163,6 +167,9 @@ struct page *get_huge_zero_page(void);
>  #define transparent_hugepage_enabled(__vma) 0
>  
>  #define transparent_hugepage_flags 0UL
> +
> +#define thp_get_unmapped_area	NULL
> +
>  static inline int
>  split_huge_page_to_list(struct page *page, struct list_head *list)
>  {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 86f9f8b..2181c7f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -790,6 +790,49 @@ void prep_transhuge_page(struct page *page)
>  	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
>  }
>  
> +unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
> +		loff_t off, unsigned long flags, unsigned long size)
> +{
> +	unsigned long addr;
> +	loff_t off_end = off + len;
> +	loff_t off_align = round_up(off, size);
> +	unsigned long len_pad;
> +
> +	if (off_end <= off_align || (off_end - off_align) < size)
> +		return 0;
> +
> +	len_pad = len + size;
> +	if (len_pad < len || (off + len_pad) < off)
> +		return 0;
> +
> +	addr = current->mm->get_unmapped_area(filp, 0, len_pad,
> +					      off >> PAGE_SHIFT, flags);
> +	if (IS_ERR_VALUE(addr))
> +		return 0;
> +
> +	addr += (off - addr) & (size - 1);
> +	return addr;

Hugh has more sanity checks before and after call to get_unmapped_area().
Please, consider borrowing them.

> +}
> +
> +unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
> +		unsigned long len, unsigned long pgoff, unsigned long flags)
> +{
> +	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
> +
> +	if (addr)
> +		goto out;

I think it's too strong reaction to hint, isn't it?
We definately need this for MAP_FIXED. But in general? Maybe.

> +	if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD))
> +		goto out;
> +
> +	addr = __thp_get_unmapped_area(filp, len, off, flags, PMD_SIZE);
> +	if (addr)
> +		return addr;
> +
> + out:
> +	return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
> +}
> +EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
> +
>  static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
>  					struct vm_area_struct *vma,
>  					unsigned long address, pmd_t *pmd,
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@...ck.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@...ck.org"> email@...ck.org </a>

-- 
 Kirill A. Shutemov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ