Message-ID: <20241129181244.GA11702@mit.edu>
Date: Fri, 29 Nov 2024 08:12:44 -1000
From: "Theodore Ts'o" <tytso@....edu>
To: Niklas Hambüchen <mail@....me>
Cc: Rui Ueyama <rui314@...il.com>, LKML <linux-kernel@...r.kernel.org>,
        Florian Weimer <fw@...eb.enyo.de>
Subject: Re: Wislist for Linux from the mold linker's POV

On Fri, Nov 29, 2024 at 06:38:47AM +0100, Niklas Hambüchen wrote:
> Turns out, `ext4` has built in a feature to work around bad applications forgetting `fsync()`:
> 
> `close()`ing new files is fast.
> But if you `close()` existing files after writing them from scratch, or atomic-rename something replacing them, ext4 will insert an `fsync()`!

It's not actually an fsync() in the close() case.  We initiate
writeback, but we don't wait for the writes to complete on the
close().  In the case of rename(), we do wait for the writes to
complete before the file system transaction which commits the
rename(2) is allowed to complete.  But in the case where the
application programmer is too lazy to call fsync(2), the delayed
completion of the transaction is the implicit commit, and
nothing is blocked behind it.  (See below for more details.)
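
For contrast with the implicit behaviour described above, here is a
minimal sketch of the truncate-and-rewrite pattern with the explicit
fsync() that the careful application would issue itself.  It is
written in Python for brevity; the function name rewrite_in_place is
hypothetical and not taken from any real program.  Note that even
with the fsync(), a crash between the truncate and the end of the
write can still leave an empty or partial file; the fsync() only
guarantees durability once it has returned.

```python
import os


def rewrite_in_place(path: str, data: bytes) -> None:
    """Truncate `path`, write `data`, and fsync before closing.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # Wait until the data has reached stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)
```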

But yes, the reason behind this is applications such as tuxracer
writing the top-ten score file, and then shutting down OpenGL, and the
out-of-tree nvidia driver would sometimes^H^H^H^H^H^H^H^H^H always
crash, leaving a corrupted or missing top-ten score file, and this
resulted in a bunch of users whinging.

Also at one point, both the KDE and Gnome text editors did the
open with O_TRUNC and rewrite, because it was the simplest way to
avoid losing the extended attributes (otherwise the application
programmers would have to actually copy the extended attributes, and
That Was Too Hard).  I don't know why programmers would edit precious
source files using something *other* than emacs, or vi, but....

In essence, file system developers are massively outnumbered by
application programmers, and for some reason, as a class, application
programmers don't seem to be very careful about data corruption
compared to file system developers --- and users *always* blame the
file system developers.

As Niklas points out in his reference, this can be disabled by a mount
option, noauto_da_alloc:

   auto_da_alloc(*), noauto_da_alloc

       Many broken applications don't use fsync() when replacing
       existing files via patterns such as fd =
       open("foo.new")/write(fd,..)/close(fd)/ rename("foo.new",
       "foo"), or worse yet, fd = open("foo",
       O_TRUNC)/write(fd,..)/close(fd). If auto_da_alloc is enabled,
       ext4 will detect the replace-via-rename and
       replace-via-truncate patterns and force that any delayed
       allocation blocks are allocated such that at the next journal
       commit, in the default data=ordered mode, the data blocks of
       the new file are forced to disk before the rename() operation
       is committed. This provides roughly the same level of
       guarantees as ext3, and avoids the “zero-length” problem that
       can happen when a system crashes before the delayed allocation
       blocks are forced to disk.
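
As an illustration of the replace-via-rename pattern named in the
documentation above, done without relying on auto_da_alloc, here is
a minimal sketch in Python.  The function name replace_via_rename is
hypothetical, and the ".new" suffix simply mirrors the "foo.new"
example; the final fsync() of the containing directory is a common
precaution to make the rename itself durable, not something the
quoted text requires.

```python
import os


def replace_via_rename(path: str, data: bytes) -> None:
    """Replace `path` with `data` via a temporary file and rename().

    Hypothetical helper for illustration only.
    """
    tmp = path + ".new"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested.
            view = view[os.write(fd, view):]
        # Force the new file's data to disk before the rename.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Replace the old file with the new one in a single step.
    os.rename(tmp, path)

    # fsync the directory so the rename itself is on stable storage.
    dirfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

With this ordering a reader of the file sees either the complete old
contents or the complete new contents, which is the property the
truncate-and-rewrite pattern cannot give.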

So if you care about performance above all else, and you trust all of
the application programmers responsible for programs on your system
being sufficiently careful, feel free to use the noauto_da_alloc
option.  :-)

					- Ted
