Message-Id: <174A8D06-B9B6-4546-A528-7A814D538208@dilger.ca>
Date: Wed, 18 Feb 2026 14:55:13 -0700
From: Andreas Dilger <adilger@...ger.ca>
To: Vyacheslav Kovalevsky <slava.kovalevskiy.2014@...il.com>
Cc: tytso@....edu,
 linux-ext4@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: Writing more than 4096 bytes with O_SYNC flag does not persist
 all previously written data if system crashes

On Feb 18, 2026, at 06:29, Vyacheslav Kovalevsky <slava.kovalevskiy.2014@...il.com> wrote:
> 
> Detailed description
> ====================
> 
> Hello, there seems to be an issue with ext4 crash behavior:
> 
> 1. Create and sync a new file.
> 2. Open the file and write some data (must be more than 4096 bytes).
> 3. Close the file.
> 4. Open the file with O_SYNC flag and write some data.
> 
> After system crash the file will have the wrong size and some previously written data will be lost.
> 
> According to the Linux manual <https://man7.org/linux/man-pages/man2/open.2.html>, O_SYNC can be replaced with an fsync() call after each write operation:
> 
> ```
> By the time write(2) (or similar) returns, the output data
> and associated file metadata have been transferred to the
> underlying hardware (i.e., as though each write(2) was
> followed by a call to fsync(2)).
> ```
> 
> In this case it is not true, using O_SYNC does not persist the data like fsync() does (see test below).
> 
> Notes:
> - This also seems to affect XFS in the same way.

Well, the O_SYNC flag has to be on the file descriptor where writes are done.
In your case, the "write some data" at the start is done on a file descriptor
that does *not* have O_SYNC, so the semantics of that flag do not apply to
those initial writes.  It is the same as O_TRUNC or O_DIRECT or other flags
only affecting the file descriptor where it is used, not some earlier or later
file descriptor.

Either the "write some data" phase must also use O_SYNC, or call fsync() on
that file descriptor before closing it, or call fsync() on the later file
descriptor (assuming persistence of the initial writes does not matter until
the later writes are done).
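
As a rough illustration of the second option (fsync() on the first file
descriptor before closing it), a minimal sketch could look like the code
below.  The helper name write_and_fsync() is made up for this example and is
not part of the original test; it assumes the file already exists, as it does
in the reproducer after creat() and sync().

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original reproducer): write len bytes
 * to an existing file through a descriptor opened WITHOUT O_SYNC, then
 * call fsync() on that same descriptor before closing it.
 * Returns 0 on success, -1 on any error.
 */
int write_and_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    /* write() may be short, so loop until everything is handed over. */
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Flush the data written through this descriptor before closing. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }

    return close(fd);
}
```

In the reproducer this would stand in for the open()/write()/close()
sequence on file_fd1.  The third option would instead be a single
fsync(file_fd2) after the final write, before that descriptor is closed.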

If anything, the man page should be updated to be more precise, like:

    "the *just written* output data *on that file descriptor* and associated
     file metadata have been transferred to the underlying hardware (i.e.
     as though each write(2) was followed by a call to sync_file_range(2)
     for the corresponding file offset(s))"


Cheers, Andreas

> 
> System info
> ===========
> 
> Linux version 6.19.2
> 
> 
> How to reproduce
> ================
> 
> ```
> #include <errno.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <unistd.h>
> 
> #define BUFFER_LEN 5000 // should be at least ~ 4096+1
> 
> int main() {
>   int status;
>   int file_fd0;
>   int file_fd1;
>   int file_fd2;
> 
>   char buffer[BUFFER_LEN + 1] = {};
>   for (int i = 0; i <= BUFFER_LEN; ++i) {
>     buffer[i] = (char)i;
>   }
> 
>   status = creat("file", S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH);
>   printf("CREAT: %d\n", status);
>   file_fd0 = status;
> 
>   status = close(file_fd0);
>   printf("CLOSE: %d\n", status);
> 
>   sync();
> 
>   status = open("file", O_WRONLY);
>   printf("OPEN: %d\n", status);
>   file_fd1 = status;
> 
>   status = write(file_fd1, buffer, BUFFER_LEN);
>   printf("WRITE: %d\n", status);
> 
>   status = close(file_fd1);
>   printf("CLOSE: %d\n", status);
> 
>   status = open("file", O_WRONLY | O_SYNC);
>   printf("OPEN: %d\n", status);
>   file_fd2 = status;
> 
>   status = write(file_fd2, "Test data!", 10);
>   printf("WRITE: %d\n", status);
> 
>   status = close(file_fd2);
>   printf("CLOSE: %d\n", status);
> }
> // after crash file size is 4096 instead of 5000
> ```
> 
> Output:
> 
> ```
> CREAT: 3
> CLOSE: 0
> OPEN: 3
> WRITE: 5000
> CLOSE: 0
> OPEN: 3
> WRITE: 10
> CLOSE: 0
> ```
> 
> File content after crash:
> 
> ```
> $ xxd file
> 00000000: 5465 7374 2064 6174 6121 0a0b 0c0d 0e0f  Test data!......
> 00000010: 1011 1213 1415 1617 1819 1a1b 1c1d 1e1f ................
> 00000020: 2021 2223 2425 2627 2829 2a2b 2c2d 2e2f  !"#$%&'()*+,-./
> 
> .........
> 
> 00000ff0: f0f1 f2f3 f4f5 f6f7 f8f9 fafb fcfd feff ................
> ```
> 
> Steps:
> 
> 1. Create and mount new ext4 file system in default configuration.
> 2. Change directory to root of the file system and run the compiled test.
> 3. Cause hard system crash (e.g. QEMU `system_reset` command).
> 4. Remount file system after crash.
> 5. Observe that file size is 4096 instead of 5000.
> 
> Notes:
> 
> - This also seems to affect XFS in the same way.
> 

