linux-ext4 - Re: Issue with ext4 filesystem corruption when writing to a file after disk exhaustion

Message-ID: <CAJxJ_jh=4q81OnSXk=yAU3u_7CCHZLGhb31eALF0cSyNv34E1g@mail.gmail.com>
Date: Mon, 14 Jul 2025 12:37:21 +0800
From: Jiany Wu <wujianyue000@...il.com>
To: "Theodore Ts'o" <tytso@....edu>
Cc: "Darrick J. Wong" <djwong@...nel.org>, yi.zhang@...wei.com, jack@...e.cz, 
	linux-ext4@...r.kernel.org
Subject: Re: Issue with ext4 filesystem corruption when writing to a file
 after disk exhaustion

Hello, Ted,

Good day, and thank you for the clarification.
Yes, I previously mounted a dedicated ext4 disk image on /var/log via
the /dev/loop1 device, with rsyslogd writing to /var/log/syslog.
When /tmp is exhausted manually with fallocate, / also reaches 100%
usage, rsyslog writes to /dev/loop1 start failing, and the filesystem
is later remounted read-only. This differs from the earlier scenario,
but it is not easy to reproduce.
I updated the test case so that fallocate no longer takes all the space
on the disk. It now allocates 95%, and everything is normal, with no
related error messages.
This confirms that the errors are caused by disk exhaustion.
I think the main open question is whether fallocate should be allowed
to use the whole disk space.
root@...tbed:~$ df -Th
Filesystem     Type      Size  Used Avail Use% Mounted on
udev           devtmpfs   16G     0   16G   0% /dev
tmpfs          tmpfs     3.2G   53M  3.1G   2% /run
root-overlay   overlay    32G  6.2G   25G  20% /
/dev/nvme0n1p3 ext4       32G  6.2G   25G  20% /host
/dev/loop1     ext4      3.9G  189M  3.5G   6% /var/log
tmpfs          tmpfs      16G  236M   16G   2% /dev/shm
tmpfs          tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs          tmpfs     4.0M     0  4.0M   0% /sys/fs/cgroup
root@...tbed:~$ mount | grep log
/host/disk-img/var-log.ext4 on /var/log type ext4 (rw,relatime)
root@...tbed:~$ ls -lh /host/disk-img/var-log.ext4
-rw-r--r-- 1 root root 4.0G Jul 14 07:05 /host/disk-img/var-log.ext4
root@...tbed:~$ file /host/disk-img/var-log.ext4
/host/disk-img/var-log.ext4: Linux rev 1.0 ext4 filesystem data,
UUID=49281462-eb22-4f19-8d03-51338eaf278a (needs journal recovery)
(extents) (64bit) (large files) (huge files)

# fallocate to exhaust /tmp directly
root@...tbed:~$ df /tmp
Filesystem     1K-blocks      Used Available Use% Mounted on
root-overlay   229572940 229556556         0 100% /
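
For reference, the updated test case (filling to 95% instead of taking
every free block) could be sketched roughly as below. This is only an
illustration, not the script used in the test: the function names and
the filler path are hypothetical, and it uses Python's
os.posix_fallocate() in place of the fallocate command line tool.

```python
import os


def bytes_to_reach(total: int, used: int, avail: int, target_percent: int) -> int:
    """Bytes to allocate so that used space reaches target_percent of total.

    The result is clamped to [0, avail] so the request never exceeds the
    space an unprivileged caller could actually obtain.
    """
    want = total * target_percent // 100 - used
    return max(0, min(want, avail))


def fill_to_percent(path: str, target_percent: int = 95) -> int:
    """Preallocate a filler file at `path`; return the number of bytes requested."""
    directory = os.path.dirname(os.path.abspath(path))
    st = os.statvfs(directory)
    total = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    avail = st.f_bavail * st.f_frsize
    size = bytes_to_reach(total, used, avail, target_percent)
    if size > 0:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
        try:
            # Reserves real blocks, so a later write into this range
            # should not fail with ENOSPC.
            os.posix_fallocate(fd, 0, size)
        finally:
            os.close(fd)
    return size
```

Calling fill_to_percent("/tmp/filler.bin") would leave roughly 5% of the
filesystem free, which is the headroom that made the loop write errors
disappear in the test above.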

# loop write error
testbed ERR kernel: [ 1019.470013] I/O error, dev loop1, sector 266248
op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 2
testbed ERR kernel: [ 1019.479242] Buffer I/O error on dev loop1,
logical block 33281, lost async page write
testbed ERR kernel: [ 1009.228833] loop: Write error at byte offset
673349632, length 4096.
testbed CRIT kernel: [ 1019.487101] EXT4-fs error (device loop1):
ext4_check_bdev_write_error:217: comm rs:main Q:Reg: Error while async
write back metadata
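
A note on reading the numbers in these messages: the block layer
reports sectors in 512-byte units, while the "logical block" in the
Buffer I/O error is counted in filesystem blocks. The short sketch
below assumes 4096-byte blocks (an assumption, though it matches the
"length 4096" in the loop message) and shows that sector 266248 and
logical block 33281 are the same location, while the loop write error
at byte offset 673349632 falls in a different block.

```python
SECTOR_SIZE = 512      # block-layer sectors are reported in 512-byte units
FS_BLOCK_SIZE = 4096   # assumption: 4 KiB filesystem blocks


def sector_to_block(sector: int, block_size: int = FS_BLOCK_SIZE) -> int:
    """Convert a 512-byte sector number to a block number."""
    return sector * SECTOR_SIZE // block_size


def offset_to_block(offset: int, block_size: int = FS_BLOCK_SIZE) -> int:
    """Convert a byte offset to a block number."""
    return offset // block_size


print(sector_to_block(266248))     # 33281
print(offset_to_block(673349632))  # 164392
```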

# remounting fs as read-only
testbed ERR kernel: [ 1326.758055] Aborting journal on device loop1-8.
testbed CRIT kernel: [ 1326.765336] EXT4-fs error (device loop1):
ext4_journal_check_start:83: comm auditd: Detected aborted journal
testbed CRIT kernel: [ 1326.765960] EXT4-fs error (device loop1):
ext4_journal_check_start:83: comm rs:main Q:Reg: Detected aborted
journal
testbed CRIT kernel: [ 1326.775629] EXT4-fs (loop1): Remounting
filesystem read-only

Best regards,
Jianyue Wu

On Sat, Jul 12, 2025 at 10:34 PM Theodore Ts'o <tytso@....edu> wrote:
>
> On Fri, Jul 11, 2025 at 09:27:14PM -0700, Darrick J. Wong wrote:
> >
> > Honestly it's really too bad that there's no way for an fs to ask the
> > block device how much space it thinks is available, and then teach its
> > own statfs method to return min(fs space available, bdev space
> > available).
> >
> > Then at least df could report that your 500T ramdisk filesystem on a 4G
> > /tmp really only has 4G of space available.
>
> I think it would be better if there was an extra field in the statfs
> structure that reported bdev space available, and have it show up
> as an extra (optional) column in the df report.
>
> The problem is that bdev space available could be highly variable.
> For example, suppose you had a few thousand users all sharing thinly
> provisioned space.  If a whole bunch of users suddenly all start using
> space, the available space at the storage layer could suddenly
> plummet.  And if the available space starts getting low, this might trigger
> automated, central fstrims on all of the volumes, causing the free
> space to go back up.
>
> Having the free space on a file system as reported by df go up and
> down randomly would very likely cause users to get very confused
> and upset, especially when it wasn't under their control.  Even for a
> single user system the free space in tmpfs could go down suddenly when
> some huge process suddenly started, and then go up suddenly when that
> process gets OOM-killed.  :-)
>
>                                            - Ted
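
The min(fs space available, bdev space available) idea quoted above has
no kernel interface today, but for a loop-mounted image it can be
approximated from user space. The sketch below is only an illustration
under stated assumptions: the function names are hypothetical, the
backing file is assumed to be sparse (a fully preallocated image would
not depend on free space in the host filesystem), and the result can
fluctuate for exactly the reasons Ted describes.

```python
import os


def avail_bytes(path: str) -> int:
    """Space available to unprivileged users on the filesystem holding `path`."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def effective_avail(fs_avail: int, backing_avail: int) -> int:
    """The smaller of what the filesystem and its backing store can provide."""
    return min(fs_avail, backing_avail)


def effective_avail_for(mount_point: str, backing_file: str) -> int:
    """Estimate usable space on a loop-mounted filesystem.

    `mount_point` is where the image is mounted; `backing_file` is the
    image file itself, whose host filesystem is queried via its directory.
    """
    backing_dir = os.path.dirname(os.path.abspath(backing_file))
    return effective_avail(avail_bytes(mount_point), avail_bytes(backing_dir))
```

With the setup from this thread, effective_avail_for("/var/log",
"/host/disk-img/var-log.ext4") would report the smaller of the space
available on /var/log and the space available on the filesystem that
holds the image file.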