linux-ext4 - Re: df returns incorrect size of partition due to huge overhead block count in ext4 partition

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YkMEtJkiR2Qktq9s@mit.edu>
Date:   Tue, 29 Mar 2022 09:08:04 -0400
From:   "Theodore Ts'o" <tytso@....edu>
To:     Fariya F <fariya.fatima03@...il.com>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: df returns incorrect size of partition due to huge overhead
 block count in ext4 partition

(Removing linux-fsdevel from the cc list since this is an ext4
specific issue.)

On Mon, Mar 28, 2022 at 09:38:18PM +0530, Fariya F wrote:
> Hi Ted,
> 
> Thanks for the response. Really appreciate it. Some questions:
> 
> a) This issue is observed on one of the customer board and hence a fix
> is a must for us or at least I will need to do a work-around so other
> customer boards do not face this issue. As I mentioned my script
> relies on df -h output of used percentage. In the case of the board
> reporting 16Z of used space and size, the available space is somehow
> reported correctly. Should my script rely on available space and not
> on the used space% output of df. Will that be a reliable work-around?
> Do you see any issue in using the partition from then or some where
> down the line the overhead blocks number would create a problem and my
> partition would end up misbehaving or any sort of data loss could
> occur? Data loss would be a concern for us. Please guide.

I'm guessing that the problem was caused by a bit-flip in the
superblock, so it was just a matter of hardware error.  What version
of e2fsprogs are using, and did you have metadata checksum (meta_csum)
feature enabled?  Depending on where the bit-flip happened --- e.g.,
whether it was in memory and then superblock was written out, or on
the eMMC or other storage device --- if the metadata checksum feature
caught the superblock error, it would have detected the issue, and
while it would have required a manual fsck to fix it, at that point it
would have fallen back to use the backup superblock version.

> b) Any other suggestions of a work-around so even if the overhead
> blocks reports more blocks than actual blocks on the partition, i am
> able to use the partition reliably or do you think it would be a
> better suggestion to wait for the fix in e2fsprogs?
> 
> I think apart from the fix in e2fsprogs tool, a kernel fix is also
> required, wherein it performs check that the overhead blocks should
> not be greater than the actual blocks on the partition.

Yes, we can certainly have the kernel check to see if the overhead
value is completely insane, and if so, recalculate it (even though it
would slow down the mount).

Another thing we could do is to always recaluclate the overhead amount
if the file system is smaller than some arbitrary size, on the theory
that (a) for small file systems, the increased time to mount the file
system will not be noticeable, and (b) embedded and mobile devices are
often where "cost optimized" (my polite way of saying crappy quality
to save a pentty or two in Bill of Materials costs) are most likely,
and so those are where bit flips are more likely.

Cheers,

						- Ted