[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 10 Oct 2011 08:28:23 -0700
From: Curt Wohlgemuth <curtw@...gle.com>
To: Lukas Czerner <lczerner@...hat.com>
Cc: tytso@....edu, adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org
Subject: Re: [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block()
Hi Lukas:
On Mon, Oct 10, 2011 at 12:19 AM, Lukas Czerner <lczerner@...hat.com> wrote:
> On Sat, 8 Oct 2011, Curt Wohlgemuth wrote:
>
>> In ext4_ext_next_allocated_block(), the path[depth] might
>> have a p_ext that is NULL -- see ext4_ext_binsearch(). In
>> such a case, dereferencing it will crash the machine.
>>
>> This patch checks for p_ext == NULL in
>> ext4_ext_next_allocated_block() before dereferencinging it.
>>
>> Tested using a hand-crafted an inode with eh_entries == 0 in
>> an extent block, verified that running FIEMAP on it crashes
>> without this patch, works fine with it.
>
> Hi Curt,
>
> It seems to me that even that the patch fixes the NULL dereference, it
> is not a complete solution. Is it possible that in "normal" case p_ext
> would be NULL ? I think that this is only possible in extent split/add
> case (as noted in ext4_ext_binsearch()) which should be atomic to the
> other operations (locked i_mutex?).
Yes, unfortunately, it is possible in "normal" cases for p_ext to be NULL.
We've seen this problem during what appears to be a race between an
inode growth (or truncate?) and another task doing a FIEMAP ioctl.
The real problem is that FIEMAP handing in ext4 is just, well, buggy?
ext4_ext_walk_space() will get the i_data_sem, construct the path
array, then release the semaphore. But then it does a bazillion
accesses on the extent/header/index pointers in the path array, with
no protection against truncate, growth, or any other changes. As far
as I can tell, this is the only use of a path array retrieved from
ext4_ext_find_extent() that isn't completely covered by i_data_sem.
> However this seems like an inode corruption so we should probably be
> more verbose about it and print an appropriate EXT4_ERROR_INODE() or
> even better check for the corrupted tree in the ext4_ext_find_extent()
> (instead in ext4_ext_map_blocks()), however this will need to distinguish
> between the normal and the tree modification case.
What we've observed many times is a crash during a FIEMAP call to
ext4_ext_next_allocated_block(), which appears to me to be during a
race with another thread that's splitting the extent tree. This
causes the machine to go down with the inode in a bad state. But of
course, fsck won't detect and fix this, so when the machine comes back
up, and a FIEMAP call is done on this same inode -- without any other
threads -- it'll crash again. Hence a nasty crash loop.
So you're right, in that this isn't the "real solution." But devising
a safe, non-racy design for FIEMAP is not so simple, unless of course
you want to just hold the i_data_sem during the entire loop body of
ext4_ext_walk_space(), which would be pretty ugly. Hence the
"band-aid" approach in my patch, which at least seems correct, if not
thorough.
Thanks,
Curt
>
> Thanks!
> -Lukas
>
>>
>> Signed-off-by: Curt Wohlgemuth <curtw@...gle.com>
>> ---
>> fs/ext4/extents.c | 4 +++-
>> 1 files changed, 3 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index 57cf568..063a5b8 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
>> while (depth >= 0) {
>> if (depth == path->p_depth) {
>> /* leaf */
>> - if (path[depth].p_ext !=
>> + /* p_ext can be NULL */
>> + if (path[depth].p_ext &&
>> + path[depth].p_ext !=
>> EXT_LAST_EXTENT(path[depth].p_hdr))
>> return le32_to_cpu(path[depth].p_ext[1].ee_block);
>> } else {
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists