linux-ext4 - fsck infinite loop on corrupt ext4 file system


Message-Id: <1250294105.6221.24.camel@bobble.smo.corp.google.com>
Date:	Fri, 14 Aug 2009 16:55:05 -0700
From:	Frank Mayhar <fmayhar@...gle.com>
To:	linux-ext4@...r.kernel.org
Cc:	tytso@....edu
Subject: fsck infinite loop on corrupt ext4 file system

Hello, folks.  We recently ran into a pretty severe ext4 crash (being
worked on by someone else) that caused some seriously corrupted file
systems, one of which in turn exposed an fsck problem.  We noticed this
when fsck started looping endlessly trying to correct that file system.
Basically, the group descriptors were mangled: fsck complains about
invalid checksums, forces a full check and, during pass 1, apparently
tries to allocate some inode bitmap blocks.  That allocation fails,
pass 1 errors out and the check starts over.  Endlessly.

I've attached output from the first few loops; unfortunately the file
system image is far, far too large to transport.  I've done some
analysis and it appears that check_super_block is noticing the problem
and hitting this case:

                if (gd->bg_inode_table == 0) {
                        ctx->invalid_inode_table_flag[i]++;
                        ctx->invalid_bitmaps++;
                }
                free_blocks += gd->bg_free_blocks_count;
                free_inodes += gd->bg_free_inodes_count;

(Around line 623 in super.c in the 1.41.8 source.)

Later, during pass 1, e2fsck calls handle_fs_bad_blocks because
ctx->invalid_bitmaps is set, and tries to allocate blocks for the
inode table.  This allocation fails.

I suspect that the inode table blocks in question simply aren't marked
free, and fsck certainly isn't marking them free before it attempts the
allocation.  Should it first try to free the affected blocks?  Isn't the
inode table static?  Why is handle_fs_bad_blocks trying to reallocate it
without at least trying to free it first?
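
The suspicion above can be illustrated with a toy model.  This is only a
sketch and not e2fsck or libext2fs code: the toy_* names, the 16-block
group and the boolean "bitmap" are all hypothetical.  It shows a
contiguous allocation failing while the old blocks are still marked in
use, and succeeding once they are freed first.  Whether this is what
actually happens inside handle_fs_bad_blocks is not confirmed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_GROUP_BLOCKS 16   /* hypothetical, tiny block group */

/* One flag per block; true means "in use".  Not the libext2fs bitmap type. */
struct toy_group {
    bool in_use[TOY_GROUP_BLOCKS];
};

/* Mark every block in use, mimicking a bitmap that still accounts for
 * the old inode table along with everything else in the group. */
static void toy_fill(struct toy_group *g)
{
    for (int b = 0; b < TOY_GROUP_BLOCKS; b++)
        g->in_use[b] = true;
}

/* Find 'count' contiguous free blocks, mark them in use and return the
 * first block number, or -1 if no such run exists. */
static int toy_alloc_range(struct toy_group *g, int count)
{
    assert(count > 0 && count <= TOY_GROUP_BLOCKS);
    int run = 0;
    for (int b = 0; b < TOY_GROUP_BLOCKS; b++) {
        run = g->in_use[b] ? 0 : run + 1;
        if (run == count) {
            int first = b - count + 1;
            for (int i = first; i <= b; i++)
                g->in_use[i] = true;
            return first;
        }
    }
    return -1;
}

/* Release 'count' blocks starting at 'first'. */
static void toy_free_range(struct toy_group *g, int first, int count)
{
    assert(first >= 0 && count >= 0 && first + count <= TOY_GROUP_BLOCKS);
    for (int i = first; i < first + count; i++)
        g->in_use[i] = false;
}
```

In this model, calling toy_alloc_range on a filled group returns -1 every
time, so a caller that simply restarts and retries never makes progress.
Calling toy_free_range on the old range first lets the same request
succeed.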
-- 
Frank Mayhar <fmayhar@...gle.com>
Google, Inc.

View attachment "sdi3-fsck-output" of type "text/plain" (21360 bytes)