[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <87y1gy5s9c.fsf@doe.com>
Date: Fri, 22 Sep 2023 17:33:59 +0530
From: Ritesh Harjani (IBM) <ritesh.list@...il.com>
To: Jan Kara <jack@...e.cz>
Cc: Jan Kara <jack@...e.cz>, Gao Xiang <hsiangkao@...ux.alibaba.com>,
linux-ext4@...r.kernel.org, Theodore Ts'o <tytso@....edu>,
Matthew Bobrowski <mbobrowski@...browski.org>,
Christoph Hellwig <hch@....de>,
Joseph Qi <joseph.qi@...ux.alibaba.com>
Subject: Re: [bug report] ext4 misses final i_size meta sync under O_DIRECT | O_SYNC semantics after iomap DIO conversion
Jan Kara <jack@...e.cz> writes:
> On Wed 20-09-23 17:08:19, Ritesh Harjani wrote:
>> Jan Kara <jack@...e.cz> writes:
>>
>> > Hello!
>> >
>> > On Tue 19-09-23 14:00:04, Gao Xiang wrote:
>> >> Our consumer reports a behavior change between pre-iomap and iomap
>> >> direct io conversion:
>> >>
>> >> If the system crashes after an appending write to a file open with
>> >> O_DIRECT | O_SYNC flag set, file i_size won't be updated even if
>> >> O_SYNC was marked before.
>> >>
>> >> It can be reproduced by a test program in the attachment with
>> >> gcc -o repro repro.c && ./repro testfile && echo c > /proc/sysrq-trigger
>> >>
>> >> After some analysis, we found that before iomap direct I/O conversion,
>> >> the timing was roughly (taking Linux 3.10 codebase as an example):
>> >>
>> >> ..
>> >> - ext4_file_dio_write
>> >> - __generic_file_aio_write
>> >> ..
>> >> - ext4_direct_IO # generic_file_direct_write
>> >> - ext4_ext_direct_IO
>> >> - ext4_ind_direct_IO # final_size > inode->i_size
>> >> - ..
>> >> - ret = blockdev_direct_IO()
>> >> - i_size_write(inode, end) # orphan && ret > 0 &&
>> >> # end > inode->i_size
>> >> - ext4_mark_inode_dirty()
>> >> - ...
>> >> - generic_write_sync # handling O_SYNC
>> >>
>> >> So the dirty inode meta will be committed into journal immediately
>> >> if O_SYNC is set. However, After commit 569342dc2485 ("ext4: move
>> >> inode extension/truncate code out from ->iomap_end() callback"),
>> >> the new behavior seems as below:
>> >>
>> >> ..
>> >> - ext4_dio_write_iter
>> >> - ext4_dio_write_checks # extend = 1
>> >> - iomap_dio_rw
>> >> - __iomap_dio_rw
>> >> - iomap_dio_complete
>> >> - generic_write_sync
>> >> - ext4_handle_inode_extension # extend = 1
>>
>> Yes, since ext4_handle_inode_extension() will handle inode i_disksize
>> update and mark the inode dirty, generic_write_sync() call should happen
>> after that.
>>
>> That also means then we don't have any generic FS testcase which can
>> validate this behaviour.
>
> Yeah.
>
Ok. Let me then first send a fstest in response to this integrity
problem with directio and o_sync.
-ritesh
Powered by blists - more mailing lists