Message-Id: <1250557822.23227.9.camel@bobble.smo.corp.google.com>
Date:	Mon, 17 Aug 2009 18:10:22 -0700
From:	Frank Mayhar <fmayhar@...gle.com>
To:	linux-ext4@...r.kernel.org
Cc:	tytso@....edu
Subject: Re: fsck infinite loop on corrupt ext4 file system

On Fri, 2009-08-14 at 16:55 -0700, Frank Mayhar wrote:
> Hello, folks.  We recently ran into a pretty severe ext4 crash (being
> worked on by someone else) that caused some seriously corrupted file
> systems, one of which in turn exposed an fsck problem.  We noticed this
> when fsck started looping endlessly trying to correct that file system.
> Basically, the group descriptors were mangled; fsck complains about
> invalid checksums, forces a full check and during pass 1 tries to
> allocate some inode bitmap blocks (apparently).  That allocation fails,
> pass 1 errors out and starts the check over.  Endlessly.

I've made a little more progress since Friday.  I had grabbed a dumpe2fs
dump of the corrupted file system and one of a newly created file
system on the same device.  Adjusting for normal variation (numbers of
free blocks, flags, etc.), there are no differences _except_ in the very
block groups that fsck complained about having bad checksums.  For those
(and only those), the locations of the block bitmap and inode table
differ.  I've attached the diff output.

In particular, block group 276 claims to have its inode table at blocks
0-204, which is clearly wrong.  This is the block group for which the
allocation failed, causing the original loop.
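
As a rough illustration of the kind of check that would flag an entry
like this one, here is a minimal sketch.  The GroupDesc class, the
function name and the first_usable_block parameter are all hypothetical
and are not e2fsprogs structures or APIs.  It only asks whether the
claimed inode table range lies inside the file system and clear of the
metadata at the very start of the device.

```python
from dataclasses import dataclass


@dataclass
class GroupDesc:
    """Hypothetical, simplified view of one group descriptor."""
    group: int
    inode_table_start: int   # first block of the inode table
    inode_table_blocks: int  # length of the inode table, in blocks


def inode_table_problems(gd, blocks_count, first_usable_block):
    """Return a list of problems found; an empty list means 'plausible'.

    first_usable_block is an assumed lower bound: the first block past
    the primary superblock and group descriptors.
    """
    problems = []
    start = gd.inode_table_start
    end = start + gd.inode_table_blocks - 1  # inclusive last block
    if start < first_usable_block:
        problems.append(
            f"group {gd.group}: inode table starts at block {start}, "
            f"below first usable block {first_usable_block}")
    if end >= blocks_count:
        problems.append(
            f"group {gd.group}: inode table ends at block {end}, "
            f"past the end of the file system ({blocks_count} blocks)")
    return problems
```

Fed a descriptor that claims blocks 0-204 (205 blocks starting at 0),
this reports a problem for any first_usable_block greater than zero.  A
real check would also have to account for features that let a group's
inode table live outside the group itself.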

It's clear that fsck is neither correcting the block groups nor
detecting the bad entries properly (a sanity check might be in order
here).  It's not even noticing that it's looping; it just keeps failing
the allocation and retrying.  While it may be that fsck can't recover
the file system in this case, it should at least notice and abort.
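
To make "notice and abort" concrete, here is a minimal sketch of a
restart driver that gives up when a pass asks to start over for a
reason it has already given, or after a fixed number of restarts.  This
is not how e2fsck is structured; run_pass, CheckLoopError and the
max_restarts default are hypothetical names chosen for illustration.

```python
class CheckLoopError(RuntimeError):
    """Raised when the check keeps restarting without making progress."""


def run_until_stable(run_pass, max_restarts=10):
    """Call run_pass() until it completes; return the number of restarts.

    run_pass is a hypothetical callable that returns None when the check
    completed, or a string describing why it must start over.
    """
    seen_reasons = set()
    restarts = 0
    while True:
        reason = run_pass()
        if reason is None:
            return restarts
        if reason in seen_reasons:
            # Same failure twice: the previous restart fixed nothing.
            raise CheckLoopError(f"restart reason repeated: {reason}")
        if restarts >= max_restarts:
            raise CheckLoopError(f"gave up after {restarts} restarts")
        seen_reasons.add(reason)
        restarts += 1
```

With a pass that always reports the same allocation failure, the second
call raises CheckLoopError instead of retrying forever.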

My thinking is that the location of the inode tables should be invariant
over the life of the file system.  Certainly there's no place in ext4
itself that changes those fields (that I can see, anyway).  Why couldn't
fsck compute the proper values and compare those against what's there?
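
For what it's worth, here is a sketch of what "compute the proper
values" could look like under a deliberately simplified layout: no
flex_bg or meta_bg, and each group laid out as optional superblock
backup and descriptors, then block bitmap, inode bitmap and inode table.
That layout is an assumption and the function names are hypothetical; a
file system created with other features would need a different
calculation.

```python
def is_power_of(n, base):
    """True if n == base**k for some k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def has_superblock_backup(group, sparse_super=True):
    """Groups 0 and 1 and powers of 3, 5 and 7 hold backups with sparse_super."""
    if group == 0 or not sparse_super:
        return True
    return any(is_power_of(group, b) for b in (3, 5, 7))


def expected_inode_table_start(group, first_data_block, blocks_per_group,
                               gdt_blocks, reserved_gdt_blocks,
                               sparse_super=True):
    """Expected first inode table block under the simplified layout above."""
    group_start = first_data_block + group * blocks_per_group
    overhead = 0
    if has_superblock_backup(group, sparse_super):
        # superblock copy + group descriptors + reserved descriptor blocks
        overhead = 1 + gdt_blocks + reserved_gdt_blocks
    # one block bitmap block and one inode bitmap block precede the table
    return group_start + overhead + 2


def mismatched_groups(stored_starts, **layout):
    """Groups whose stored inode table start differs from the expected one."""
    return [g for g, start in sorted(stored_starts.items())
            if start != expected_inode_table_start(g, **layout)]
```

Given the stored starting blocks as a dict keyed by group number,
mismatched_groups() lists the groups whose descriptors disagree with
the computed layout, which is the comparison being suggested here.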
-- 
Frank Mayhar <fmayhar@...gle.com>
Google, Inc.

View attachment "dump-diff" of type "text/x-patch" (30384 bytes)
