linux-ext4 - Re: [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87ipnwnh96.fsf@dmbot.sw.ru>
Date:	Tue, 11 Oct 2011 11:01:57 +0400
From:	Dmitry Monakhov <dmonakhov@...nvz.org>
To:	Curt Wohlgemuth <curtw@...gle.com>,
	Lukas Czerner <lczerner@...hat.com>
Cc:	tytso@....edu, adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org
Subject: Re: [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block()

On Mon, 10 Oct 2011 08:28:23 -0700, Curt Wohlgemuth <curtw@...gle.com> wrote:
> Hi Lukas:
> 
> On Mon, Oct 10, 2011 at 12:19 AM, Lukas Czerner <lczerner@...hat.com> wrote:
> > On Sat, 8 Oct 2011, Curt Wohlgemuth wrote:
> >
> >> In ext4_ext_next_allocated_block(), the path[depth] might
> >> have a p_ext that is NULL -- see ext4_ext_binsearch().  In
> >> such a case, dereferencing it will crash the machine.
> >>
> >> This patch checks for p_ext == NULL in
> >> ext4_ext_next_allocated_block() before dereferencinging it.
> >>
> >> Tested using a hand-crafted an inode with eh_entries == 0 in
> >> an extent block, verified that running FIEMAP on it crashes
> >> without this patch, works fine with it.
> >
> > Hi Curt,
> >
> > It seems to me that even that the patch fixes the NULL dereference, it
> > is not a complete solution. Is it possible that in "normal" case p_ext
> > would be NULL ? I think that this is only possible in extent split/add
> > case (as noted in ext4_ext_binsearch()) which should be atomic to the
> > other operations (locked i_mutex?).
> 
> Yes, unfortunately, it is possible in "normal" cases for p_ext to be NULL.
> 
> We've seen this problem during what appears to be a race between an
> inode growth (or truncate?) and another task doing a FIEMAP ioctl.
> The real problem is that FIEMAP handing in ext4 is just, well, buggy?
Wow, IMHO it not just buggy, it is obviously incorrect, IMHO it is more
fair just return -ENOTSUPP, at least it is much safer.
Yes calling FIEMAP and truncate/write in parallel is stupid, but not
prohibited. 
> 
> ext4_ext_walk_space() will get the i_data_sem, construct the path
> array, then release the semaphore.  But then it does a bazillion
> accesses on the extent/header/index pointers in the path array, with
> no protection against truncate, growth, or any other changes.  As far
> as I can tell, this is the only use of a path array retrieved from
> ext4_ext_find_extent() that isn't completely covered by i_data_sem.
In that case i_data sem protects us from nothing. Path collected can
simply disappear under us. And in fact i don't understand the
reason why we drop i_data_sem too soon. Are any reason to do that?
Seems like following patch should fix the issue.

View attachment "0001-ext4-do-not-drop-i_data_sem-too-soon.patch" of type "text/plain" (2115 bytes)



> 
> > However this seems like an inode corruption so we should probably be
> > more verbose about it and print an appropriate EXT4_ERROR_INODE() or
> > even better check for the corrupted tree in the ext4_ext_find_extent()
> > (instead in ext4_ext_map_blocks()), however this will need to distinguish
> > between the normal and the tree modification case.
> 
> What we've observed many times is a crash during a FIEMAP call to
> ext4_ext_next_allocated_block(), which appears to me to be during a
> race with another thread that's splitting the extent tree.  This
> causes the machine to go down with the inode in a bad state.  But of
> course, fsck won't detect and fix this, so when the machine comes back
> up, and a FIEMAP call is done on this same inode -- without any other
> threads -- it'll crash again.  Hence a nasty crash loop.
> 
> So you're right, in that this isn't the "real solution."  But devising
> a safe, non-racy design for FIEMAP is not so simple, unless of course
> you want to just hold the i_data_sem during the entire loop body of
> ext4_ext_walk_space(), which would be pretty ugly.  Hence the
> "band-aid" approach in my patch, which at least seems correct, if not
> thorough.
> 
> Thanks,
> Curt
> 
> >
> > Thanks!
> > -Lukas
> >
> >>
> >> Signed-off-by: Curt Wohlgemuth <curtw@...gle.com>
> >> ---
> >>  fs/ext4/extents.c |    4 +++-
> >>  1 files changed, 3 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> >> index 57cf568..063a5b8 100644
> >> --- a/fs/ext4/extents.c
> >> +++ b/fs/ext4/extents.c
> >> @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
> >>       while (depth >= 0) {
> >>               if (depth == path->p_depth) {
> >>                       /* leaf */
> >> -                     if (path[depth].p_ext !=
> >> +                     /* p_ext can be NULL */
> >> +                     if (path[depth].p_ext &&
> >> +                             path[depth].p_ext !=
> >>                                       EXT_LAST_EXTENT(path[depth].p_hdr))
> >>                         return le32_to_cpu(path[depth].p_ext[1].ee_block);
> >>               } else {
> >>
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html