linux-kernel - Re: [syzbot] kernel BUG in remove_inode

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1ced323a-6d83-3d2e-c974-71ee3b425519@oracle.com>
Date:   Tue, 17 Aug 2021 19:35:52 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     syzbot <syzbot+67654e51e54455f1c585@...kaller.appspotmail.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] kernel BUG in remove_inode_hugepages

On 8/16/21 7:07 PM, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    a2824f19e606 Merge tag 'mtd/fixes-for-5.14-rc7' of git://g..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=110d54ee300000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=96f0602203250753
> dashboard link: https://syzkaller.appspot.com/bug?extid=67654e51e54455f1c585
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12eeb6a6300000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=13515779300000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+67654e51e54455f1c585@...kaller.appspotmail.com
> 
> Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007ffedef26928 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 000000000043f589
> RDX: 000000000000effd RSI: 0000000020000680 RDI: 0000000000000003
> RBP: 00007ffedef26930 R08: 0000000000000001 R09: 0000000000400034
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
> R13: 0000000000000000 R14: 00000000004ad018 R15: 0000000000400488
> ------------[ cut here ]------------
> kernel BUG at fs/hugetlbfs/inode.c:532!

Let's me see if I can get syzbot to test it:

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git a2824f19e6065a0d3735acd9fe7155b104e7edf5


>From 39f8d6a86215f980ba27a9618b8ec2eed4d9a963 Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@...cle.com>
Date: Tue, 17 Aug 2021 18:59:58 -0700
Subject: [PATCH] hugetlb: don't pass page cache pages to
 restore_reserve_on_error

syzbot hit kernel BUG at fs/hugetlbfs/inode.c:532.  This BUG triggers if
the HPageRestoreReserve flag is set on a page in the page cache.  It
should never be set, and the routine huge_add_to_page_cache explicitly
clears the flag after adding to the page cache.

The only code other than huge page allocation which sets the flag is
restore_reserve_on_error.  It will potentially set the flag in rare out
of memory conditions.  syzbot was injecting errors to cause memory
allocation errors in this specific path.

The code in restore_reserve_on_error is doing the right thing.  However,
there are instances where pages in the page cache were being passed to
restore_reserve_on_error.  This is incorrect, as once a page goes into
the cache reservation information is correct.  Even in the case of
error, the page will remain in the cache and no reservation adjustment
is needed.

Modify routines that potentially call restore_reserve_on_error with a
page cache page to no longer do so.

Note on fixes tag:
Prior to commit 846be08578ed ("mm/hugetlb: expand restore_reserve_on_error
functionality") the routine would not process page cache pages because
the HPageRestoreReserve flag is not set on such pages.  Therefore, this
issue could not be trigggered.  The code added by commit 846be08578ed
("mm/hugetlb: expand restore_reserve_on_error functionality") is needed
and correct.  It exposed incorrect calls to restore_reserve_on_error which
is the root cause addressed by this commit.

Fixes: 846be08578ed ("mm/hugetlb: expand restore_reserve_on_error functionality")
Reported-by: syzbot+67654e51e54455f1c585@...kaller.appspotmail.com
Signed-off-by: Mike Kravetz <mike.kravetz@...cle.com>
---
 mm/hugetlb.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dfc940d5221d..8ea35ba6699f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2476,7 +2476,7 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
 		if (!rc) {
 			/*
 			 * This indicates there is an entry in the reserve map
-			 * added by alloc_huge_page.  We know it was added
+			 * not added by alloc_huge_page.  We know it was added
 			 * before the alloc_huge_page call, otherwise
 			 * HPageRestoreReserve would be set on the page.
 			 * Remove the entry so that a subsequent allocation
@@ -4660,7 +4660,9 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 	spin_unlock(ptl);
 	mmu_notifier_invalidate_range_end(&range);
 out_release_all:
-	restore_reserve_on_error(h, vma, haddr, new_page);
+	/* No restore in case of successful pagetable update (Break COW) */
+	if (new_page != old_page)
+		restore_reserve_on_error(h, vma, haddr, new_page);
 	put_page(new_page);
 out_release_old:
 	put_page(old_page);
@@ -4776,7 +4778,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	pte_t new_pte;
 	spinlock_t *ptl;
 	unsigned long haddr = address & huge_page_mask(h);
-	bool new_page = false;
+	bool new_page, new_pagecache_page = false;
 
 	/*
 	 * Currently, we are forced to kill the process in the event the
@@ -4799,6 +4801,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		goto out;
 
 retry:
+	new_page = false;
 	page = find_lock_page(mapping, idx);
 	if (!page) {
 		/* Check for page in userfault range */
@@ -4842,6 +4845,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 					goto retry;
 				goto out;
 			}
+			new_pagecache_page = true;
 		} else {
 			lock_page(page);
 			if (unlikely(anon_vma_prepare(vma))) {
@@ -4926,7 +4930,9 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	spin_unlock(ptl);
 backout_unlocked:
 	unlock_page(page);
-	restore_reserve_on_error(h, vma, haddr, page);
+	/* restore reserve for newly allocated pages not in page cache */
+	if (new_page && !new_pagecache_page)
+		restore_reserve_on_error(h, vma, haddr, page);
 	put_page(page);
 	goto out;
 }
@@ -5135,6 +5141,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	int ret = -ENOMEM;
 	struct page *page;
 	int writable;
+	bool new_pagecache_page = false;
 
 	if (is_continue) {
 		ret = -EFAULT;
@@ -5228,6 +5235,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		ret = huge_add_to_page_cache(page, mapping, idx);
 		if (ret)
 			goto out_release_nounlock;
+		new_pagecache_page = true;
 	}
 
 	ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
@@ -5291,7 +5299,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	if (vm_shared || is_continue)
 		unlock_page(page);
 out_release_nounlock:
-	restore_reserve_on_error(h, dst_vma, dst_addr, page);
+	if (!new_pagecache_page)
+		restore_reserve_on_error(h, dst_vma, dst_addr, page);
 	put_page(page);
 	goto out;
 }
-- 
2.31.1