linux-ext4 - Re: [RFC] ext4: don't remove already removed extent


Message-ID: <ZSMar4JjHUI+/zt6@debian-BULLSEYE-live-builder-AMD64>
Date:   Sun, 8 Oct 2023 17:10:07 -0400
From:   Eric Whitney <enwlinux@...il.com>
To:     Muhammad Usama Anjum <usama.anjum@...labora.com>
Cc:     Eric Whitney <enwlinux@...il.com>, linux-ext4@...r.kernel.org
Subject: Re: [RFC] ext4: don't remove already removed extent

* Muhammad Usama Anjum <usama.anjum@...labora.com>:
> On 9/20/23 5:41 AM, Eric Whitney wrote:
> > * Muhammad Usama Anjum <usama.anjum@...labora.com>:
> >> Syzbot has hit the following bug on current and all older kernels:
> >> BUG: KASAN: out-of-bounds in ext4_ext_rm_leaf fs/ext4/extents.c:2736 [inline]
> >> BUG: KASAN: out-of-bounds in ext4_ext_remove_space+0x2482/0x4d90 fs/ext4/extents.c:2958
> >> Read of size 18446744073709551508 at addr ffff888073aea078 by task syz-executor420/6443
> >>
> >> On investigation, I've found that eh->eh_entries is zero, ex is
> >> referring to last entry and EXT_LAST_EXTENT(eh) is referring to first.
> >> Hence EXT_LAST_EXTENT(eh) - ex becomes negative and causes the wrong
> >> buffer read.
> >>
> >> element: FFFF8882F8F0D06C       <----- ex
> >> element: FFFF8882F8F0D060
> >> element: FFFF8882F8F0D054
> >> element: FFFF8882F8F0D048
> >> element: FFFF8882F8F0D03C
> >> element: FFFF8882F8F0D030
> >> element: FFFF8882F8F0D024
> >> element: FFFF8882F8F0D018
> >> element: FFFF8882F8F0D00C	<------  EXT_FIRST_EXTENT(eh)
> >> header:  FFFF8882F8F0D000	<------  EXT_LAST_EXTENT(eh) and eh
> >>
> >> Cc: stable@...r.kernel.org
> >> Reported-by: syzbot+6e5f2db05775244c73b7@...kaller.appspotmail.com
> >> Closes: https://groups.google.com/g/syzkaller-bugs/c/G6zS-LKgDW0/m/63MgF6V7BAAJ
> >> Fixes: d583fb87a3ff ("ext4: punch out extents")
> >> Signed-off-by: Muhammad Usama Anjum <usama.anjum@...labora.com>
> >> ---
> >> This patch only fixes the local issue. There may be a bigger bug: why
> >> is ex set to the last entry if eh->eh_entries is 0? If any ext4
> >> developer wants to look at the bug, please don't hesitate.
> >> ---
> >>  fs/ext4/extents.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> >> index e4115d338f101..7b7779b4cb87f 100644
> >> --- a/fs/ext4/extents.c
> >> +++ b/fs/ext4/extents.c
> >> @@ -2726,7 +2726,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> >>  		 * If the extent was completely released,
> >>  		 * we need to remove it from the leaf
> >>  		 */
> >> -		if (num == 0) {
> >> +		if (num == 0 && eh->eh_entries) {
> >>  			if (end != EXT_MAX_BLOCKS - 1) {
> >>  				/*
> >>  				 * For hole punching, we need to scoot all the
> >> -- 
> >> 2.40.1
> >>
> > 
> > Hi:
> > 
> > First, thanks for taking the time to look at this.
> Thank you for replying and for the pointers. I need to trace the problem
> from the first warning through to the bug, which will be difficult until I
> debug it more carefully and learn at least the basics of ext4.
> 
> > 
> > I'm suspicious that syzbot may be fuzzing an extent header or other extent
> > tree components.  As you noticed, eh_entries and ex appear to be inconsistent.
> > Also, note the long series of corrupted file system reports in the console log
> > occurring before the KASAN bug - ext4 had been detecting and rejecting bad
> > data up to that point.  The file system on the disk image provided by syzbot
> > indicates that metadata checksumming was enabled (and it fscks cleanly).
> > That should have caught a corrupted extent header or inode, but perhaps
> > there's a problem.
> > 
> > The console log indicates that the problem occurred on inode #16.  Does the
> > information you've provided above come from testing you did on inode #16
> > (looks like the name was /bin/base64)?
> I couldn't analyze the problem more broadly. There must be something bigger
> wrong here.
> 
> > 
> > By any chance, have you found a simpler reproducer than what syzbot provides?
> Not yet; the problem reproduces only after a while. I'll try to come up
> with a better reproducer if I can.
> 

My suggestion would be to first determine whether syzbot has disabled
metadata checksumming by the point in time when the problem occurs (or
whether temporarily modifying ext4 to make it impossible to disable
metadata checksumming also makes it impossible to reproduce the failure).
It may have done this as part of its test.  If so, this becomes a very low
priority bug for ext4, and you could avoid the effort to find a reproducer.
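
As a starting point for the checksumming question, the on-disk feature flag
can be read straight from a copy of the image. The following is only a
minimal user-space sketch: the demo_* names are hypothetical, and the
offsets and constants are assumed from the documented ext4 superblock
layout (superblock at byte 1024, s_magic at 0x38, s_feature_ro_compat at
0x64, metadata_csum bit 0x0400). It reports what the image says, not what
the running kernel is doing after syzbot has written to the device.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed ext4 on-disk layout values; all DEMO_* names are hypothetical. */
#define DEMO_SB_OFFSET               1024u   /* superblock start in the image */
#define DEMO_SB_MAGIC_OFF            0x38u   /* s_magic, little-endian 16 bit */
#define DEMO_SB_RO_COMPAT_OFF        0x64u   /* s_feature_ro_compat, le32 */
#define DEMO_EXT4_MAGIC              0xEF53u
#define DEMO_RO_COMPAT_METADATA_CSUM 0x0400u

/* Read little-endian values byte by byte so host endianness does not matter. */
uint16_t demo_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t demo_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Returns 1 if the image's superblock advertises metadata_csum,
 * 0 if it does not, and -1 if the buffer is too short or the
 * ext4 magic number is missing.
 */
int demo_has_metadata_csum(const uint8_t *img, size_t len)
{
    const uint8_t *sb;

    if (len < DEMO_SB_OFFSET + DEMO_SB_RO_COMPAT_OFF + 4u)
        return -1;

    sb = img + DEMO_SB_OFFSET;
    if (demo_le16(sb + DEMO_SB_MAGIC_OFF) != DEMO_EXT4_MAGIC)
        return -1;

    return (demo_le32(sb + DEMO_SB_RO_COMPAT_OFF) &
            DEMO_RO_COMPAT_METADATA_CSUM) ? 1 : 0;
}
```

Calling demo_has_metadata_csum() on the first few KiB of an image taken
before and after the reproducer runs would show whether the flag changed on
disk; the temporary kernel modification described above is still needed to
rule out the flag being turned off while the file system is in use.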

Eric
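
For reference, the arithmetic behind the quoted KASAN report can be
reproduced in user space. This is a sketch under stated assumptions, not
the kernel code: the demo_* names are hypothetical, the 12-byte header and
extent sizes are assumed from the ext4 on-disk format, and the move length
is assumed to be computed as (EXT_LAST_EXTENT(eh) - ex) multiplied by the
extent size. Addresses are handled as plain 64-bit integers so that the
example does not rely on out-of-bounds pointer arithmetic.

```c
#include <stdint.h>

/* Stand-in for the 12-byte on-disk extent record (hypothetical name). */
struct demo_extent {
    uint32_t ee_block;     /* first logical block covered */
    uint16_t ee_len;       /* number of blocks covered */
    uint16_t ee_start_hi;  /* high 16 bits of physical block */
    uint32_t ee_start_lo;  /* low 32 bits of physical block */
};

#define DEMO_HDR_SIZE 12u  /* assumed size of the extent header */

/*
 * Address of the "last extent": first extent + entries - 1.
 * With entries == 0 this lands one record before the first extent,
 * i.e. on the header itself.
 */
uint64_t demo_last_extent(uint64_t hdr_addr, uint16_t entries)
{
    uint64_t first = hdr_addr + DEMO_HDR_SIZE;

    return first + (uint64_t)((int64_t)entries - 1) *
                   sizeof(struct demo_extent);
}

/*
 * Byte count for moving the extents that follow ex down by one slot.
 * The element difference is signed; converting the product to an
 * unsigned size turns a negative count into a huge length.
 */
uint64_t demo_move_len(uint64_t last_addr, uint64_t ex_addr)
{
    int64_t nr = (int64_t)(last_addr - ex_addr) /
                 (int64_t)sizeof(struct demo_extent);

    return (uint64_t)nr * sizeof(struct demo_extent);
}
```

With the addresses from the report (header at FFFF8882F8F0D000, ex at
FFFF8882F8F0D06C, eh_entries of 0), demo_last_extent() returns the header
address and demo_move_len() returns 18446744073709551508, which is
2^64 - 108 and matches the read size KASAN printed.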