linux-ext4 - Re: df returns incorrect size of partition due to huge overhead block count in ext4 partition

Message-ID: <CACA3K+iNFkLLuXJ7W5N70sVC+RVVszx-xVQojNUE8NqfWFuSVg@mail.gmail.com>
Date:   Tue, 12 Apr 2022 14:26:01 +0530
From:   Fariya F <fariya.fatima03@...il.com>
To:     "Theodore Ts'o" <tytso@....edu>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: df returns incorrect size of partition due to huge overhead block
 count in ext4 partition

The e2fsprogs version is 1.42.99. The df utility reports its version
as 8.25.
The Linux kernel is 4.9.31. Please note the e2fsprogs ipk file was
available as part of Arago distribution for the ARM processor I use.

From your email I understand that below are the options as of now:
a) Fix in the fsck tool and kernel fix: This is something I am looking
forward to. Could you please help prioritize it?
b) Recalculating overhead at mount time: Is it possible to do this with
some specific options at mount time? I still think option (a) is what
works best for us.
c) Enabling metadata checksum: May not be possible for us at the moment.

Thanks a lot for all your help, Ted. I would appreciate it if you could
prioritize the fix.
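
For reference, the script-side work-around raised earlier in the thread
(relying on available space instead of the used percentage) could be
sketched roughly as below. This is only an illustration: the function
names and the 1 EiB plausibility limit are assumptions made for the
example, not anything taken from df or e2fsprogs. It reads the same
counters df uses via os.statvfs() and refuses to report a used
percentage when the total size looks implausible.

```python
import os

# Arbitrary plausibility limit (1 EiB); an assumption for this sketch only.
MAX_PLAUSIBLE_BYTES = 1 << 60


def usage_from_counts(f_blocks, f_bfree, f_bavail, f_frsize,
                      max_plausible_bytes=MAX_PLAUSIBLE_BYTES):
    """Return (avail_bytes, used_percent); used_percent is None if the
    total block count cannot be trusted."""
    avail_bytes = f_bavail * f_frsize
    size_bytes = f_blocks * f_frsize
    # A wrapped-around total shows up as an absurdly large size.
    if f_blocks == 0 or size_bytes > max_plausible_bytes or f_bfree > f_blocks:
        return avail_bytes, None
    used = f_blocks - f_bfree
    denom = used + f_bavail
    used_percent = 100.0 * used / denom if denom else 0.0
    return avail_bytes, used_percent


def usage(path):
    """Query the mounted file system at 'path' (Unix only)."""
    st = os.statvfs(path)
    return usage_from_counts(st.f_blocks, st.f_bfree, st.f_bavail, st.f_frsize)
```

A caller would treat a used_percent of None as "size not trustworthy"
and fall back to avail_bytes alone. Whether the available-space figure
stays reliable on an affected file system is exactly the open question
in this thread, so this is a stop-gap and not a fix.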

On Tue, Mar 29, 2022 at 6:38 PM Theodore Ts'o <tytso@....edu> wrote:
>
> (Removing linux-fsdevel from the cc list since this is an ext4
> specific issue.)
>
> On Mon, Mar 28, 2022 at 09:38:18PM +0530, Fariya F wrote:
> > Hi Ted,
> >
> > Thanks for the response. Really appreciate it. Some questions:
> >
> > a) This issue is observed on one of our customer boards, so a fix
> > is a must for us, or at least I will need a work-around so other
> > customer boards do not face this issue. As I mentioned, my script
> > relies on the used percentage in the df -h output. In the case of the
> > board reporting 16Z of used space and size, the available space is
> > somehow reported correctly. Should my script rely on available space
> > and not on the used% output of df? Will that be a reliable work-around?
> > Do you see any issue in continuing to use the partition, or could the
> > overhead blocks number create a problem somewhere down the line, so
> > that my partition ends up misbehaving or some sort of data loss
> > occurs? Data loss would be a concern for us. Please guide.
>
> I'm guessing that the problem was caused by a bit-flip in the
> superblock, so it was just a matter of hardware error.  What version
> of e2fsprogs are you using, and did you have the metadata checksum
> (metadata_csum) feature enabled?  Depending on where the bit-flip
> happened --- e.g.,
> whether it was in memory and then superblock was written out, or on
> the eMMC or other storage device --- if the metadata checksum feature
> caught the superblock error, it would have detected the issue, and
> while it would have required a manual fsck to fix it, at that point it
> would have fallen back to use the backup superblock version.
>
> > b) Any other suggestions of a work-around so even if the overhead
> > blocks value reports more blocks than actually exist on the partition,
> > I am able to use the partition reliably? Or do you think it would be a
> > better suggestion to wait for the fix in e2fsprogs?
> >
> > I think apart from the fix in the e2fsprogs tool, a kernel fix is also
> > required, wherein it checks that the overhead blocks value is
> > not greater than the actual number of blocks on the partition.
>
> Yes, we can certainly have the kernel check to see if the overhead
> value is completely insane, and if so, recalculate it (even though it
> would slow down the mount).
>
> Another thing we could do is to always recalculate the overhead amount
> if the file system is smaller than some arbitrary size, on the theory
> that (a) for small file systems, the increased time to mount the file
> system will not be noticeable, and (b) embedded and mobile devices are
> often where "cost optimized" (my polite way of saying crappy quality
> to save a penny or two in Bill of Materials costs) components are most
> likely, and so those are where bit flips are more likely.
>
> Cheers,
>
>                                                 - Ted
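
To illustrate the sanity check discussed above, here is a rough sketch
of how a corrupted overhead value can turn into the huge size df shows,
and what a "completely insane" test might look like. It assumes the
reported size is computed as total blocks minus overhead in unsigned
64-bit arithmetic; the function names, the 1 KiB block size, and the
example numbers are all hypothetical and are not taken from the kernel
or e2fsprogs sources.

```python
U64_MASK = (1 << 64) - 1


def reported_blocks(total_blocks, overhead_blocks):
    """Mimic an unsigned 64-bit 'total - overhead' subtraction."""
    return (total_blocks - overhead_blocks) & U64_MASK


def overhead_is_sane(total_blocks, overhead_blocks):
    """Overhead must be non-negative and smaller than the file system."""
    return 0 <= overhead_blocks < total_blocks


def effective_overhead(total_blocks, stored_overhead, recalculate):
    """Use the stored value if plausible, else recompute it.

    'recalculate' is a caller-supplied function standing in for the
    slower scan done at mount time.
    """
    if overhead_is_sane(total_blocks, stored_overhead):
        return stored_overhead
    return recalculate()


# Hypothetical 256 MiB file system with 1 KiB blocks.
total = 262144
good_overhead = 9000
# Simulate a single bit flip in the stored overhead value.
bad_overhead = good_overhead | (1 << 31)

size_bytes = reported_blocks(total, bad_overhead) * 1024
print(size_bytes / 2**70)  # size in ZiB, just under 16
print(effective_overhead(total, bad_overhead, lambda: good_overhead))
```

With these made-up numbers the subtraction wraps around to just under
2^64 blocks, which at 1 KiB per block is just under 16 ZiB. That would
match a "16Z" size in df -h, although the actual block size and
overhead value on the affected board are not given in this thread.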