Message-ID: <20250711052905.GC2026761@mit.edu>
Date: Fri, 11 Jul 2025 01:29:05 -0400
From: "Theodore Ts'o" <tytso@....edu>
To: Jiany Wu <wujianyue000@...il.com>
Cc: yi.zhang@...wei.com, jack@...e.cz, linux-ext4@...r.kernel.org
Subject: Re: Issue with ext4 filesystem corruption when writing to a file
after disk exhaustion
On Fri, Jul 11, 2025 at 11:20:32AM +0800, Jiany Wu wrote:
> Hello,
>
> Recently I encountered an issue in kernel 6.1.123, when writing to a
> file after disk exhaustion, it will report EFSCORRUPTED. I think it is
> un-expected behavior.
What you did was create a file system in /tmp/mydisk by creating a
sparse image file:
> root@...tbed:/tmp# touch mydisk
> root@...tbed:/tmp# ls -l mydisk
> -rw-r--r-- 1 root root 0 Jul 8 05:36 mydisk
> root@...tbed:/tmp# truncate -s 128M mydisk
> root@...tbed:/tmp# mkfs.ext4 mydisk
The potential problem is that this assumes /tmp had enough space to
store 128M of data. But it's clear that it didn't have enough space.
So not only did you exhaust the space in the file system, you *also*
exhausted the space in /tmp. You can see this from the I/O errors
when writing to /dev/loop2:
> root@...tbed:/tmp# mount mydisk /mnt/test_fs/
> root@...tbed:/tmp# findmnt /mnt/test_fs
> TARGET SOURCE FSTYPE OPTIONS
> /mnt/test_fs /dev/loop2 ext4 rw,relatime
> ...
> root@...tbed:/mnt/test_fs# fallocate -l 32716560K /mnt/test_fs/test_file
> fallocate: fallocate failed: No space left on device
> root@...tbed:/mnt/test_fs# journalctl -f
> Jul 08 05:43:07 testbed kernel: loop: Write error at byte offset
> 9178112, length 1024.
> Jul 08 05:43:07 testbed kernel: loop: Write error at byte offset
> 274432, length 1024.
These error messages are write errors in /dev/loop2, which were almost
certainly caused by ENOSPC errors when trying to write to /tmp/mydisk.
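To see why the space was never reserved, one can compare the image's apparent size with what the host file system actually allocated. The following is a minimal sketch, assuming GNU coreutils `du` (for `--apparent-size`) and using a scratch directory from `mktemp -d` as a stand-in for /tmp; the variable names are illustrative only. On typical file systems a freshly truncated file has little or no space allocated, so nothing guarantees the blocks will be available when the loop device later writes to them.

```shell
#!/bin/sh
# Sketch: `truncate -s` creates a sparse file. It *claims* 128M, but the
# host file system allocates (almost) no blocks for it, so later writes
# through the loop device can still fail with ENOSPC.
# Assumes GNU coreutils du (--apparent-size, -k).
set -eu

dir=$(mktemp -d)            # scratch stand-in for /tmp
img="$dir/mydisk"

truncate -s 128M "$img"

apparent_kb=$(du --apparent-size -k "$img" | cut -f1)  # logical size
allocated_kb=$(du -k "$img" | cut -f1)                 # blocks actually allocated

echo "apparent size:  ${apparent_kb}K"
echo "allocated size: ${allocated_kb}K"

rm -rf "$dir"
```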
This is the moral equivalent of buying a fraudulent USB thumb drive
from the back alleys of Shenzhen, where the USB thumb drive was
*labelled* as having 128MB of storage, but which only had 16MB of
flash, such that writes after the first 16MB would fail (or overwrite
other disk blocks).
If /tmp had enough space, then you wouldn't have seen these errors.
An alternative way to create the image would have been to replace
> root@...tbed:/tmp# touch mydisk
> root@...tbed:/tmp# ls -l mydisk
> -rw-r--r-- 1 root root 0 Jul 8 05:36 mydisk
> root@...tbed:/tmp# truncate -s 128M mydisk
with:
root@...tbed:/tmp# dd if=/dev/zero of=mydisk bs=1M count=128
This allocates 128MB to /tmp/mydisk, and if there isn't enough space
in /tmp, the dd will fail with an error. If it succeeds, then when
you create the file system and mount it, you won't see the error
messages writing to /dev/loopN.
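A minimal sketch of the dd approach, wrapped in a hypothetical `make_image` helper (not an existing tool). It assumes GNU dd and du, and again uses a `mktemp -d` scratch directory instead of /tmp itself. Because dd writes every block, the host file system has to allocate the full 128M up front, so a shortage of space shows up as a dd failure immediately rather than as loop device write errors later. On file systems that compress data or turn runs of zeroes into holes, the allocated figure may still come out smaller than 128M.

```shell
#!/bin/sh
# Hypothetical helper: make_image PATH SIZE_MB
# Writes SIZE_MB megabytes of zeroes so the host file system must allocate
# every block now; returns non-zero (and removes the partial image) if dd
# cannot finish, e.g. because the host file system ran out of space.
set -eu

make_image() {
    path=$1
    size_mb=$2
    # dd exits non-zero on a write error such as ENOSPC.
    if ! dd if=/dev/zero of="$path" bs=1M count="$size_mb"; then
        echo "make_image: could not write ${size_mb}M to $path" >&2
        rm -f "$path"
        return 1
    fi
}

dir=$(mktemp -d)            # scratch stand-in for /tmp
img="$dir/mydisk"

make_image "$img" 128       # a space shortage fails here, not in the loop device

apparent_kb=$(du --apparent-size -k "$img" | cut -f1)  # logical size
allocated_kb=$(du -k "$img" | cut -f1)                 # blocks actually allocated
echo "apparent size:  ${apparent_kb}K"
echo "allocated size: ${allocated_kb}K"

rm -rf "$dir"
```

After a successful `make_image`, mkfs.ext4 and mount can be run on the image exactly as before.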
The bottom line is that the bug is a PEBCAK ("problem exists between
chair and keyboard"), which is another way of saying it's a failure on
the part of the system administrator, who didn't understand that they
had done something bad. It is not a kernel bug, but rather a bug in
your procedure / system setup.
Cheers,
- Ted