linux-ext4 - Re: Recover from a "deleted inode referenced" situation

Message-ID: <20171015124800.zjsqu623rivp27qv@thunk.org>
Date:   Sun, 15 Oct 2017 08:48:00 -0400
From:   Theodore Ts'o <tytso@....edu>
To:     Kilian Cavalotti <kilian.cavalotti.work@...il.com>
Cc:     Andreas Dilger <adilger@...ger.ca>, linux-ext4@...r.kernel.org
Subject: Re: Recover from a "deleted inode referenced" situation

On Sat, Oct 14, 2017 at 06:16:14PM -0700, Kilian Cavalotti wrote:
> But unfortunately there's another ~17TB of data that fsck didn't
> find. That seems like a lot of data lost from just replaying a
> corrupted journal... :(

It wasn't from replaying a journal, corrupted or not.  Andreas was
mistaken there; remounting the file system read/write would not have
triggered a journal replay; if the journal needed replaying it would
have been replayed on the read-only mount.

There are two possible explanations for what happened.  One is
that the file system was already badly corrupted, but your copy
command hadn't yet started hitting the corrupted portion of the file
system, so it was a coincidence that the r/w remount happened right
before the errors started getting flagged.

The second possibility is that the allocation bitmaps were
corrupted, and shortly after you remounted read/write something started
to write into your file system; since part of the inode table
area was marked as "available", those writes ended up smashing the
inode table.  (More modern kernels enable the block_validity option by
default, which would have prevented this; older kernels, however, do
not enable it by default.)
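The failure mode in the second scenario can be modeled in a few lines. The Python sketch below is loosely based on the idea behind block_validity: before blocks that the bitmap reports as free are used, the extent is checked against known metadata regions such as the inode table. It is not the kernel's implementation; the names `extent_is_valid` and `allocate_and_write`, the zone layout, and the simple first-fit allocator are all hypothetical and exist only for illustration.

```python
from typing import List, Tuple

# (first_block, block_count) for a metadata region such as an inode table.
# The layout used in the demo below is made up for illustration.
SystemZone = Tuple[int, int]


def extent_is_valid(start: int, count: int, zones: List[SystemZone]) -> bool:
    """Return False if blocks [start, start + count) overlap any metadata zone."""
    end = start + count
    for zone_start, zone_len in zones:
        if start < zone_start + zone_len and zone_start < end:
            return False
    return True


def allocate_and_write(free_bitmap: List[bool], count: int,
                       zones: List[SystemZone], check: bool) -> int:
    """Pick the first run of `count` blocks the bitmap claims are free.

    check=False trusts a (possibly corrupted) bitmap blindly;
    check=True vets the chosen extent against the metadata zones first.
    """
    run = 0
    for blk, free in enumerate(free_bitmap):
        run = run + 1 if free else 0
        if run == count:
            start = blk - count + 1
            if check and not extent_is_valid(start, count, zones):
                raise OSError(f"blocks {start}-{blk} overlap filesystem metadata")
            return start
    raise OSError("no free extent found")


if __name__ == "__main__":
    zones = [(100, 50)]            # hypothetical inode table: blocks 100..149
    bitmap = [False] * 200
    for b in range(120, 130):      # corruption: inode-table blocks marked free
        bitmap[b] = True
    # Without the check, the write lands inside the inode table.
    print(allocate_and_write(bitmap, 4, zones, check=False))
    try:
        allocate_and_write(bitmap, 4, zones, check=True)
    except OSError as e:
        print("refused:", e)
```

The point of the model is that the bitmap alone cannot be trusted once it is corrupted; an independent record of where metadata lives is what lets the write be refused instead of silently overwriting the inode table.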

Since the problem started with the resize, I'm actually guessing the
first is more likely, especially if you were using an older version
of e2fsprogs/resize2fs and were doing an off-line resize (i.e., the
file system was unmounted at the time).  Older versions of e2fsprogs
had a number of bugs in off-line resize of file systems larger than
16TB (which is why the 64-bit file system feature was enabled), and
the manifestation of these bugs includes portions of the inode table
getting smashed.
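The 16TB threshold is simple arithmetic. Without the 64bit feature, ext4 block numbers are 32 bits wide, so with 4 KiB blocks the largest addressable file system is 2^32 × 4096 bytes = 16 TiB (the "16TB" above). The sketch below assumes a 4 KiB block size, the usual default; the function name `max_fs_bytes` is hypothetical. The 48-bit case reflects the width of physical block numbers in ext4 extents once 64bit is enabled.

```python
# Sketch: why a file system "larger than 16TB" needs the 64bit feature.
# Assumes 4 KiB blocks (the common ext4 default); other block sizes
# scale these limits proportionally.

BLOCK_SIZE = 4096  # bytes
TIB = 2 ** 40
EIB = 2 ** 60


def max_fs_bytes(block_number_bits: int, block_size: int = BLOCK_SIZE) -> int:
    """Largest addressable file system for a given block-number width."""
    return (2 ** block_number_bits) * block_size


if __name__ == "__main__":
    # Without 64bit: 32-bit block numbers.
    print(max_fs_bytes(32) // TIB, "TiB")
    # With 64bit: extents store 48-bit physical block numbers.
    print(max_fs_bytes(48) // EIB, "EiB")
```

Any file system past the 32-bit limit therefore has to use the 64bit feature, which puts it in the size range where those older off-line resize bugs lived.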

Unfortunately, there may not be a lot we can do, if that's the case.  :-(

This is probably not a great time to remind people about the value of
backups, especially off-site backups (even if software were 100%
bug-free, what if there were a fire at your home or work?).

Sorry,

						- Ted