Message-ID: <20241129181244.GA11702@mit.edu>
Date: Fri, 29 Nov 2024 08:12:44 -1000
From: "Theodore Ts'o" <tytso@....edu>
To: Niklas Hambüchen <mail@....me>
Cc: Rui Ueyama <rui314@...il.com>, LKML <linux-kernel@...r.kernel.org>,
Florian Weimer <fw@...eb.enyo.de>
Subject: Re: Wislist for Linux from the mold linker's POV
On Fri, Nov 29, 2024 at 06:38:47AM +0100, Niklas Hambüchen wrote:
> Turns out, `ext4` has built in a feature to work around bad applications forgetting `fsync()`:
>
> `close()`ing new files is fast.
> But if you `close()` existing files after writing them from scratch, or atomic-rename something replacing them, ext4 will insert an `fsync()`!
It's not actually an fsync() in the close case. We initiate
writeback, but we don't actually wait for the writes to complete on
the close(). In the case of rename(), we do wait for the writes to
complete before the file system transaction which commits the
rename(2) is allowed to complete. But in the case where the
application programmer is too lazy to call fsync(2), the delayed
completion of the transaction commit is the implicit commit, and
nothing is blocked behind it. (See below for more details.)
But yes, the reason behind this is applications such as tuxracer
writing the top-ten score file, and then shutting down OpenGL, and the
out-of-tree nvidia driver would sometimes^H^H^H^H^H^H^H^H^H always
crash, leaving a corrupted or missing top-ten score file, and this
resulted in a bunch of users whinging.
Also, at one point, both the KDE and Gnome text editors did the
open with O_TRUNC and rewrite, because it was the simplest way to
avoid losing the extended attributes (otherwise the application
programmers would have to actually copy the extended attributes, and
That Was Too Hard). I don't know why programmers would edit precious
source files using something *other* than emacs, or vi, but....
In essence, file system developers are massively outnumbered by
application programmers, and for some reason as a class application
programmers don't seem to be very careful about data corruption
compared to file system developers --- and users *always* blame the
file system developers.
As Niklas points out in his reference, this can be disabled by a mount
option, noauto_da_alloc:
  auto_da_alloc(*), noauto_da_alloc
        Many broken applications don't use fsync() when replacing
        existing files via patterns such as fd =
        open("foo.new")/write(fd,..)/close(fd)/rename("foo.new",
        "foo"), or worse yet, fd = open("foo",
        O_TRUNC)/write(fd,..)/close(fd). If auto_da_alloc is enabled,
        ext4 will detect the replace-via-rename and
        replace-via-truncate patterns and force that any delayed
        allocation blocks are allocated such that at the next journal
        commit, in the default data=ordered mode, the data blocks of
        the new file are forced to disk before the rename() operation
        is committed. This provides roughly the same level of
        guarantees as ext3, and avoids the "zero-length" problem that
        can happen when a system crashes before the delayed allocation
        blocks are forced to disk.
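For comparison, here is a rough sketch of what the replace-via-rename
pattern looks like when the application does call fsync() itself,
rather than relying on auto_da_alloc. The helper names
(write_all, replace_file_durably), the ".new" suffix, and the fixed
4096-byte path buffers are assumptions made for illustration; they do
not come from ext4 or from any particular application. Only standard
POSIX calls are used.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: replace `path` with `len` bytes from `buf` by
 * writing "<path>.new", fsync()ing it, and renaming it over `path`.
 * Returns 0 on success, -1 on error with errno set.
 */
int replace_file_durably(const char *path, const char *buf, size_t len)
{
    char tmp[4096], dir[4096];
    int fd, dfd, n, saved;

    n = snprintf(tmp, sizeof(tmp), "%s.new", path);
    if (n < 0 || n >= (int)sizeof(tmp)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    /* dirname() may modify its argument, so work on a copy. */
    snprintf(dir, sizeof(dir), "%s", path);

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Force the new data to disk *before* the rename makes it visible. */
    if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
        saved = errno;
        close(fd);
        unlink(tmp);
        errno = saved;
        return -1;
    }
    if (close(fd) < 0 || rename(tmp, path) < 0) {
        saved = errno;
        unlink(tmp);
        errno = saved;
        return -1;
    }

    /* fsync() the containing directory so the rename itself is durable. */
    dfd = open(dirname(dir), O_RDONLY);
    if (dfd < 0)
        return -1;
    if (fsync(dfd) < 0) {
        saved = errno;
        close(dfd);
        errno = saved;
        return -1;
    }
    return close(dfd);
}
```

A caller would use it as replace_file_durably("foo", data, len). The
explicit fsync() calls are exactly the cost that auto_da_alloc tries
to spare sloppy applications from having to think about.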
So if you care about performance above all else, and you trust all of
the application programmers responsible for programs on your system
being sufficiently careful, feel free to use the noauto_da_alloc
option. :-)
- Ted