Message-ID: <6b0f6-68b2e080-9-1e084900@214889527>
Date: Sat, 30 Aug 2025 13:29:42 +0200
From: "Malte Schmidt" <m@...tris.org>
To: linux-ext4@...r.kernel.org
Subject: debugfs, e2fsck, dumpe2fs on corrupted ~11 TB partition - all tools fill 16 GB of memory until killed

Hello,

I am currently dealing with a corrupted ext4 filesystem of about 11 TB. The extent of the corruption is unclear, but I have been able to salvage many files using file-carving techniques. However, it would be very convenient to get the filesystem into a somewhat working state in order to extract the directory structure and/or filenames. I tried to run a general filesystem check, which finds a lot of overwritten data and plenty of problems with inodes. However, a few seconds or minutes in, all three tools start to consume memory very quickly, until it is exhausted and they are killed by the OOM killer.
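For clarity, the kind of invocations I mean look roughly like the following (the device name is only an example):

# read-only forced check, answering "no" to all questions
e2fsck -fn -C 0 /dev/sdb1

# dump superblock and block group information
dumpe2fs /dev/sdb1

# open read-only in catastrophic mode, skipping inode and group bitmaps
debugfs -c -R "ls -l /" /dev/sdb1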

At the beginning the machine had about 8 GB of memory, which I later bumped up to 16 GB, specifically because I found references such as:

https://serverfault.com/questions/9218/running-out-of-memory-running-fsck-on-large-filesystems
https://unix.stackexchange.com/questions/689714/fsck-ext4-consumes-all-memory-and-gets-killed
https://groups.google.com/g/linux.debian.user/c/tLWRzDDsYY4
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=614082

These all more or less came down to not having enough memory, so I wanted to address that first.

Settings such as scratch_files were enabled, with the directory located on a reasonably fast SSD, but that did not help either.
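For reference, the relevant part of my e2fsck.conf looks like this (the directory path is only an example):

[scratch_files]
directory = /mnt/fast-ssd/e2fsck-scratch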

I would still like to see what is possible, but I am somewhat out of ideas. What would be the next steps to dig a little deeper into why memory fills up so fast? I suspect that much of the data on the filesystem, specifically towards the end, is actually perfectly fine. I am under the assumption that only a small part of the beginning was overwritten. I was able to verify all superblocks on the block device except the very last one (block 2560000000). Using mke2fs I figured out where the superblocks should be located and used a short script to verify the distances between them, to make sure I hit the right offset for the filesystem and do not accidentally align the filesystem start on a backup superblock.
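For illustration, the check was roughly along these lines (a simplified sketch: it assumes a 4 KiB block size, 32768 blocks per group and the sparse_super backup layout, i.e. the block numbers that mke2fs -n would print; the device path, offset and group count in the usage line are made up):

#!/usr/bin/env python3
# Simplified sketch of the superblock check mentioned above.
# Assumptions: 4 KiB block size, 32768 blocks per group, sparse_super
# layout; the device path, offset and group count below are only examples.
import struct
import sys

BLOCK_SIZE = 4096                    # assumed filesystem block size
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE    # 32768 blocks per group for 4 KiB blocks
EXT4_MAGIC = 0xEF53                  # s_magic
MAGIC_OFFSET = 0x38                  # offset of s_magic inside the superblock

def backup_groups(total_groups):
    # sparse_super: backups live in groups 1 and in powers of 3, 5 and 7
    groups = {1}
    for base in (3, 5, 7):
        g = base
        while g < total_groups:
            groups.add(g)
            g *= base
    return sorted(groups)

def magic_at(dev, byte_offset):
    # True if the two bytes at byte_offset contain the ext4 magic
    dev.seek(byte_offset)
    data = dev.read(2)
    return len(data) == 2 and struct.unpack("<H", data)[0] == EXT4_MAGIC

def check(device, fs_start, total_groups):
    with open(device, "rb") as dev:
        # the primary superblock sits 1024 bytes into the filesystem
        print("primary:", magic_at(dev, fs_start + 1024 + MAGIC_OFFSET))
        # backups sit at the first block of each backup group
        for g in backup_groups(total_groups):
            block = g * BLOCKS_PER_GROUP
            ok = magic_at(dev, fs_start + block * BLOCK_SIZE + MAGIC_OFFSET)
            print(f"group {g:>6} (block {block:>11}): {'ok' if ok else 'missing'}")

if __name__ == "__main__":
    # usage (values are examples): sbcheck.py /dev/sdb 1048576 81250
    check(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))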

I think the offset is right because, upon trying to mount, the kernel recognizes the filesystem but reports "structure needs cleaning". I am under the assumption that parts were overwritten because my predecessor on this topic tried to create a new, clean filesystem or even an md raid on these disks, thinking this would not affect the data on them. When I found the disks, the partitions were wiped and some of the data had been overwritten. All the superblocks, however, seem to have survived, and so has the actual data, because I could already verify that the results of the file carving are genuinely good data.
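For completeness, the mount attempt that produces this looks roughly as follows (device and mount point are only examples); as far as I know, "Structure needs cleaning" is simply the message for the EUCLEAN error returned by the kernel:

# read-only mount attempt; fails with "Structure needs cleaning" (EUCLEAN)
mount -o ro /dev/sdb1 /mnt/recovery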

Best regards, looking forward to some interesting insights,

M. S.

