linux-kernel - Re: [PATCH 2/3] hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race

Message-Id: <27f8893b-57b3-088d-2d48-9e8acc5987bd@linux.ibm.com>
Date:   Mon, 17 Dec 2018 15:55:28 +0530
From:   "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Cc:     Michal Hocko <mhocko@...nel.org>, Hugh Dickins <hughd@...gle.com>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Prakash Sangappa <prakash.sangappa@...cle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        stable@...r.kernel.org
Subject: Re: [PATCH 2/3] hugetlbfs: Use i_mmap_rwsem to fix page
 fault/truncate race

On 12/4/18 1:38 AM, Mike Kravetz wrote:
> hugetlbfs page faults can race with truncate and hole punch operations.
> Current code in the page fault path attempts to handle this by 'backing
> out' operations if we encounter the race.  One obvious omission in the
> current code is removing a page newly added to the page cache.  This is
> pretty straightforward to address, but there is a more subtle and
> difficult issue of backing out hugetlb reservations.  To handle this
> correctly, the 'reservation state' before page allocation needs to be
> noted so that it can be properly backed out.  There are four distinct
> possibilities for reservation state: shared/reserved, shared/no-resv,
> private/reserved and private/no-resv.  Backing out a reservation may
> require memory allocation which could fail so that needs to be taken
> into account as well.
> 
> Instead of writing the required complicated code for this rare
> occurrence, just eliminate the race.  i_mmap_rwsem is now held in read
> mode for the duration of page fault processing.  Hold i_mmap_rwsem
> longer in truncation and hole punch code to cover the call to
> remove_inode_hugepages.
> 
> Cc: <stable@...r.kernel.org>
> Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
> Signed-off-by: Mike Kravetz <mike.kravetz@...cle.com>
> ---
>   fs/hugetlbfs/inode.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 32920a10100e..3244147fc42b 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -505,8 +505,8 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
>   	i_mmap_lock_write(mapping);
>   	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
>   		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
> -	i_mmap_unlock_write(mapping);
>   	remove_inode_hugepages(inode, offset, LLONG_MAX);
> +	i_mmap_unlock_write(mapping);
>   	return 0;
>   }


In remove_inode_hugepages() we used to do:

	mutex_lock(&hugetlb_fault_mutex_table[hash]);
	i_mmap_lock_write(mapping);
	hugetlb_vmdelete_list(&mapping->i_mmap,
	i_mmap_unlock_write(mapping);

With this patch, i_mmap_rwsem is held across the call to
remove_inode_hugepages(). Did we change the lock ordering?
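
To make the ordering question concrete, here is a small userspace model
(not kernel code). It records the order in which each path takes the two
locks and reports whether the two orders are inverted. The names lock_id,
lock_trace, trace_acquire, first_index, orders_conflict, old_remove_path
and patched_truncate_path are all hypothetical, and the patched path
assumes the fault mutex is still taken inside remove_inode_hugepages().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers for this userspace model; not kernel API. */
enum lock_id { FAULT_MUTEX, I_MMAP_RWSEM };

#define MAX_TRACE 4

/* Records the order in which one code path acquires its locks. */
struct lock_trace {
	enum lock_id order[MAX_TRACE];
	size_t count;
};

static void trace_acquire(struct lock_trace *t, enum lock_id id)
{
	assert(t->count < MAX_TRACE);
	t->order[t->count++] = id;
}

/* Position of the first acquisition of id, or -1 if the path never takes it. */
static int first_index(const struct lock_trace *t, enum lock_id id)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->order[i] == id)
			return (int)i;
	return -1;
}

/* True if paths x and y both take locks a and b, but in opposite orders. */
static bool orders_conflict(const struct lock_trace *x,
			    const struct lock_trace *y,
			    enum lock_id a, enum lock_id b)
{
	int xa = first_index(x, a), xb = first_index(x, b);
	int ya = first_index(y, a), yb = first_index(y, b);

	if (xa < 0 || xb < 0 || ya < 0 || yb < 0)
		return false;
	return (xa < xb) != (ya < yb);
}

/* Ordering quoted above: fault mutex first, then i_mmap_rwsem. */
static struct lock_trace old_remove_path(void)
{
	struct lock_trace t = { .count = 0 };

	trace_acquire(&t, FAULT_MUTEX);
	trace_acquire(&t, I_MMAP_RWSEM);
	return t;
}

/*
 * Patched truncate path: the caller holds i_mmap_rwsem across
 * remove_inode_hugepages(). ASSUMPTION: the fault mutex is still
 * taken inside that function.
 */
static struct lock_trace patched_truncate_path(void)
{
	struct lock_trace t = { .count = 0 };

	trace_acquire(&t, I_MMAP_RWSEM);
	trace_acquire(&t, FAULT_MUTEX);
	return t;
}
```

Under that assumption, calling orders_conflict() on the two traces for
FAULT_MUTEX and I_MMAP_RWSEM reports an inversion. Whether this matters in
the kernel depends on whether any remaining path still takes the locks in
the old order, which this model does not establish.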


> 
> @@ -540,8 +540,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>   			hugetlb_vmdelete_list(&mapping->i_mmap,
>   						hole_start >> PAGE_SHIFT,
>   						hole_end  >> PAGE_SHIFT);
> -		i_mmap_unlock_write(mapping);
>   		remove_inode_hugepages(inode, hole_start, hole_end);
> +		i_mmap_unlock_write(mapping);
>   		inode_unlock(inode);
>   	}
> 

-aneesh