[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <02d18236-26ef-09b0-90ad-030c4fe3ee20@linux.alibaba.com>
Date: Tue, 19 Sep 2023 14:00:04 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: linux-ext4@...r.kernel.org
Cc: Theodore Ts'o <tytso@....edu>, Jan Kara <jack@...e.cz>,
Matthew Bobrowski <mbobrowski@...browski.org>,
Christoph Hellwig <hch@....de>,
Joseph Qi <joseph.qi@...ux.alibaba.com>
Subject: [bug report] ext4 misses final i_size meta sync under O_DIRECT |
O_SYNC semantics after iomap DIO conversion
Hi folks,
Our consumer reports a behavior change between pre-iomap and iomap
direct io conversion:
If the system crashes after an appending write to a file open with
O_DIRECT | O_SYNC flag set, file i_size won't be updated even if
O_SYNC was marked before.
It can be reproduced by a test program in the attachment with
gcc -o repro repro.c && ./repro testfile && echo c > /proc/sysrq-trigger
After some analysis, we found that before iomap direct I/O conversion,
the timing was roughly (taking Linux 3.10 codebase as an example):
..
- ext4_file_dio_write
- __generic_file_aio_write
..
- ext4_direct_IO # generic_file_direct_write
- ext4_ext_direct_IO
- ext4_ind_direct_IO # final_size > inode->i_size
- ..
- ret = blockdev_direct_IO()
- i_size_write(inode, end) # orphan && ret > 0 &&
# end > inode->i_size
- ext4_mark_inode_dirty()
- ...
- generic_write_sync # handling O_SYNC
So the dirty inode meta will be committed into journal immediately
if O_SYNC is set. However, After commit 569342dc2485 ("ext4: move
inode extension/truncate code out from ->iomap_end() callback"),
the new behavior seems as below:
..
- ext4_dio_write_iter
- ext4_dio_write_checks # extend = 1
- iomap_dio_rw
- __iomap_dio_rw
- iomap_dio_complete
- generic_write_sync
- ext4_handle_inode_extension # extend = 1
So that i_size will be recorded only after generic_write_sync() is
called. So O_SYNC won't flush the update i_size to the disk.
On the other side, after a quick look of XFS side, it will record
i_size changes in xfs_dio_write_end_io() so it seems that it doesn't
have this problem.
Thanks,
Gao Xiang
View attachment "repro.c" of type "text/plain" (1282 bytes)
Powered by blists - more mailing lists