[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9fccc0e4-8f51-d3e7-21de-f85f8837be7f@linux.alibaba.com>
Date: Tue, 19 Sep 2023 21:47:34 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, Theodore Ts'o <tytso@....edu>,
Matthew Bobrowski <mbobrowski@...browski.org>,
Christoph Hellwig <hch@....de>,
Joseph Qi <joseph.qi@...ux.alibaba.com>,
"Darrick J. Wong" <djwong@...nel.org>
Subject: Re: [bug report] ext4 misses final i_size meta sync under O_DIRECT |
O_SYNC semantics after iomap DIO conversion
(sorry... add Darrick here...)
Hi Jan,
On 2023/9/19 20:05, Jan Kara wrote:
> Hello!
>
> On Tue 19-09-23 14:00:04, Gao Xiang wrote:
>> Our consumer reports a behavior change between pre-iomap and iomap
>> direct io conversion:
>>
>> If the system crashes after an appending write to a file open with
>> O_DIRECT | O_SYNC flag set, file i_size won't be updated even if
>> O_SYNC was marked before.
>>
>> It can be reproduced by a test program in the attachment with
>> gcc -o repro repro.c && ./repro testfile && echo c > /proc/sysrq-trigger
>>
>> After some analysis, we found that before iomap direct I/O conversion,
>> the timing was roughly (taking Linux 3.10 codebase as an example):
>>
>> ..
>> - ext4_file_dio_write
>> - __generic_file_aio_write
>> ..
>> - ext4_direct_IO # generic_file_direct_write
>> - ext4_ext_direct_IO
>> - ext4_ind_direct_IO # final_size > inode->i_size
>> - ..
>> - ret = blockdev_direct_IO()
>> - i_size_write(inode, end) # orphan && ret > 0 &&
>> # end > inode->i_size
>> - ext4_mark_inode_dirty()
>> - ...
>> - generic_write_sync # handling O_SYNC
>>
>> So the dirty inode meta will be committed into journal immediately
>> if O_SYNC is set. However, After commit 569342dc2485 ("ext4: move
>> inode extension/truncate code out from ->iomap_end() callback"),
>> the new behavior seems as below:
>>
>> ..
>> - ext4_dio_write_iter
>> - ext4_dio_write_checks # extend = 1
>> - iomap_dio_rw
>> - __iomap_dio_rw
>> - iomap_dio_complete
>> - generic_write_sync
>> - ext4_handle_inode_extension # extend = 1
>>
>> So that i_size will be recorded only after generic_write_sync() is
>> called. So O_SYNC won't flush the update i_size to the disk.
>
> Indeed, that looks like a bug. Thanks for report!
Thanks for the confirmation!
>
>> On the other side, after a quick look of XFS side, it will record
>> i_size changes in xfs_dio_write_end_io() so it seems that it doesn't
>> have this problem.
>
> Yes, I'm a bit hazy on the details but I think we've decided to call
> ext4_handle_inode_extension() directly from ext4_dio_write_iter() because
> from ext4_dio_write_end_io() it was difficult to test in a race-free way
> whether extending i_size (and i_disksize) is needed or not (we don't
> necessarily hold i_rwsem there). I'll think how we could fix the problem
> you've reported.
Yes, another concern is O_DSYNC, I'm quite not sure if the behavior
is changed too.
I had a rough feeling that currently iomap DIO behaviors on these are
too strict and might not fit in each specific fs detailed
implementation, tho.
Thanks,
Gao Xiang
>
> Honza
Powered by blists - more mailing lists