linux-kernel - Re: [PATCH v5] hugetlbfs: Get unmapped area below TASK_UNMAPPED

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <e458014c-57a7-46a5-3528-2b33f23a6756@oracle.com>
Date:   Fri, 15 May 2020 11:44:04 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Shijie Hu <hushijie3@...wei.com>
Cc:     will@...nel.org, akpm@...ux-foundation.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, nixiaoming@...wei.com,
        wangxu72@...wei.com, wangkefeng.wang@...wei.com,
        yangerkun@...wei.com, wangle6@...wei.com, cg.chen@...wei.com,
        chenjie6@...wei.com, alex.huangjianhui@...wei.com
Subject: Re: [PATCH v5] hugetlbfs: Get unmapped area below TASK_UNMAPPED_BASE
 for hugetlbfs

On 5/14/20 7:31 AM, Shijie Hu wrote:
> Here is a final patch to solve that hugetlb_get_unmapped_area() can't
> get unmapped area below mmap base for huge pages based on a few previous
> discussions and patches from me.
> 
> I'm so sorry. When sending v2 and v3 patches, I forget to cc:
> linux-mm@...ck.org and linux-kernel@...r.kernel.org. No records of these
> two patches found on the https://lkml.org/lkml/.
> 
> Patch V1: https://lkml.org/lkml/2020/4/27/440
> Patch V4: https://lkml.org/lkml/2020/5/13/980
> 
> Changes in V2:
> * Follow Mike's suggestions, move hugetlb_get_unmapped_area() routines
> from "fs/hugetlbfs/inode.c" to "arch/arm64/mm/hugetlbpage.c", without
> changing core code.
> * Add mmap_is_legacy() function, and only fall back to the bottom-up
> function on no-legacy mode.
> 
> Changes in V3:
> * Add *bottomup() and *topdown() two function, and check if
> mm->get_unmapped_area is equal to arch_get_unmapped_area() instead of
> checking mmap_is_legacy() to determine which function should be used.
> 
> Changes in V4:
> * Follow the suggestions of Will and Mike, move back this patch to core
> code.
> 
> Changes in V5:
> * Fix kbuild test error.
> 
> ------
> 
> In a 32-bit program, running on arm64 architecture. When the address
> space below mmap base is completely exhausted, shmat() for huge pages
> will return ENOMEM, but shmat() for normal pages can still success on
> no-legacy mode. This seems not fair.
> 
> For normal pages, get_unmapped_area() calling flows are:
>     => mm->get_unmapped_area()
> 	if on legacy mode,
> 	    => arch_get_unmapped_area()
> 			=> vm_unmapped_area()
> 	if on no-legacy mode,
> 	    => arch_get_unmapped_area_topdown()
> 			=> vm_unmapped_area()
> 
> For huge pages, get_unmapped_area() calling flows are:
>     => file->f_op->get_unmapped_area()
> 		=> hugetlb_get_unmapped_area()
> 			=> vm_unmapped_area()
> 
> To solve this issue, we only need to make hugetlb_get_unmapped_area() take
> the same way as mm->get_unmapped_area(). Add *bottomup() and *topdown()
> two functions, and check current mm->get_unmapped_area() to decide which
> one to use. If mm->get_unmapped_area is equal to arch_get_unmapped_area(),
> hugetlb_get_unmapped_area() calls bottomup routine, otherwise calls topdown
> routine.
> 
> Signed-off-by: Shijie Hu <hushijie3@...wei.com>
> Reported-by: kbuild test robot <lkp@...el.com>
> ---
>  fs/hugetlbfs/inode.c | 62 +++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 54 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 991c60c7ffe0..61418380f492 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -38,6 +38,7 @@
>  #include <linux/uio.h>
>  
>  #include <linux/uaccess.h>
> +#include <linux/sched/mm.h>
>  
>  static const struct super_operations hugetlbfs_ops;
>  static const struct address_space_operations hugetlbfs_aops;
> @@ -191,13 +192,60 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  
>  #ifndef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
>  static unsigned long
> +hugetlb_get_unmapped_area_bottomup(struct file *file, unsigned long addr,
> +		unsigned long len, unsigned long pgoff, unsigned long flags)
> +{
> +	struct hstate *h = hstate_file(file);
> +	struct vm_unmapped_area_info info;
> +
> +	info.flags = 0;
> +	info.length = len;
> +	info.low_limit = current->mm->mmap_base;
> +	info.high_limit = TASK_SIZE;
> +	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
> +	info.align_offset = 0;
> +	return vm_unmapped_area(&info);
> +}
> +
> +static unsigned long
> +hugetlb_get_unmapped_area_topdown(struct file *file, unsigned long addr,
> +		unsigned long len, unsigned long pgoff, unsigned long flags)
> +{
> +	struct hstate *h = hstate_file(file);
> +	struct vm_unmapped_area_info info;
> +
> +	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
> +	info.length = len;
> +	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
> +	info.high_limit = current->mm->mmap_base;
> +	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
> +	info.align_offset = 0;
> +	addr = vm_unmapped_area(&info);
> +
> +	/*
> +	 * A failed mmap() very likely causes application failure,
> +	 * so fall back to the bottom-up function here. This scenario
> +	 * can happen with large stack limits and large mmap()
> +	 * allocations.
> +	 */
> +	if (unlikely(offset_in_page(addr))) {
> +		VM_BUG_ON(addr != -ENOMEM);
> +		info.flags = 0;
> +		info.low_limit = current->mm->mmap_base;
> +		info.high_limit = TASK_SIZE;
> +		addr = vm_unmapped_area(&info);
> +	}
> +
> +	return addr;
> +}
> +
> +static unsigned long
>  hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>  		unsigned long len, unsigned long pgoff, unsigned long flags)
>  {
>  	struct mm_struct *mm = current->mm;
>  	struct vm_area_struct *vma;
>  	struct hstate *h = hstate_file(file);
> -	struct vm_unmapped_area_info info;
>  
>  	if (len & ~huge_page_mask(h))
>  		return -EINVAL;
> @@ -218,13 +266,11 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>  			return addr;
>  	}
>  
> -	info.flags = 0;
> -	info.length = len;
> -	info.low_limit = TASK_UNMAPPED_BASE;
> -	info.high_limit = TASK_SIZE;
> -	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
> -	info.align_offset = 0;
> -	return vm_unmapped_area(&info);
> +	if (mm->get_unmapped_area == arch_get_unmapped_area)
> +		return hugetlb_get_unmapped_area_bottomup(file, addr, len,
> +				pgoff, flags);
> +	return hugetlb_get_unmapped_area_topdown(file, addr, len,
> +			pgoff, flags);

I like this code using the value of mm->get_unmapped_area to determine
which routine to call.  It is used by a few architectures.   However, I
noticed that on at least one architecture (powerpc) mm->get_unmapped_area
may be assigned to routines other than arch_get_unmapped_area or
arch_get_unmapped_area_topdown.  In such a case, we would call the 'new'
topdown routine.  I would prefer that we call the bottomup routine in this
default case.

In reality, this does not impact powerpc as that architecture has it's
own hugetlb_get_unmapped_area routine.

Because of this, I suggest we add a comment above this code and switch
the if/else order.  For example,

+       /*
+        * Use mm->get_unmapped_area value as a hint to use topdown routine.
+        * If architectures have special needs, they should define their own
+        * version of hugetlb_get_unmapped_area.
+        */
+       if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
+               return hugetlb_get_unmapped_area_topdown(file, addr, len,
+                               pgoff, flags);
+       return hugetlb_get_unmapped_area_bottomup(file, addr, len,
+                       pgoff, flags);


Thoughts?
-- 
Mike Kravetz


>  }
>  #endif
>  
>