Open Source and information security mailing list archives
Message-ID: <9fccc0e4-8f51-d3e7-21de-f85f8837be7f@linux.alibaba.com>
Date: Tue, 19 Sep 2023 21:47:34 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, Theodore Ts'o <tytso@....edu>,
	Matthew Bobrowski <mbobrowski@...browski.org>,
	Christoph Hellwig <hch@....de>,
	Joseph Qi <joseph.qi@...ux.alibaba.com>,
	"Darrick J. Wong" <djwong@...nel.org>
Subject: Re: [bug report] ext4 misses final i_size meta sync under O_DIRECT |
 O_SYNC semantics after iomap DIO conversion

(sorry... add Darrick here...)

Hi Jan,

On 2023/9/19 20:05, Jan Kara wrote:
> Hello!
>
> On Tue 19-09-23 14:00:04, Gao Xiang wrote:
>> Our consumer reports a behavior change between pre-iomap and iomap
>> direct io conversion:
>>
>> If the system crashes after an appending write to a file open with
>> O_DIRECT | O_SYNC flag set, file i_size won't be updated even if
>> O_SYNC was marked before.
>>
>> It can be reproduced by a test program in the attachment with
>> gcc -o repro repro.c && ./repro testfile && echo c > /proc/sysrq-trigger
>>
>> After some analysis, we found that before iomap direct I/O conversion,
>> the timing was roughly (taking Linux 3.10 codebase as an example):
>>
>>   ..
>>   - ext4_file_dio_write
>>     - __generic_file_aio_write
>>         ..
>>       - ext4_direct_IO  # generic_file_direct_write
>>         - ext4_ext_direct_IO
>>           - ext4_ind_direct_IO  # final_size > inode->i_size
>>             - ..
>>             - ret = blockdev_direct_IO()
>>             - i_size_write(inode, end)  # orphan && ret > 0 &&
>>                                         # end > inode->i_size
>>             - ext4_mark_inode_dirty()
>>             - ...
>>     - generic_write_sync  # handling O_SYNC
>>
>> So the dirty inode meta will be committed into journal immediately
>> if O_SYNC is set. However, after commit 569342dc2485 ("ext4: move
>> inode extension/truncate code out from ->iomap_end() callback"),
>> the new behavior seems as below:
>>
>>   ..
>>   - ext4_dio_write_iter
>>     - ext4_dio_write_checks  # extend = 1
>>     - iomap_dio_rw
>>       - __iomap_dio_rw
>>       - iomap_dio_complete
>>         - generic_write_sync
>>     - ext4_handle_inode_extension  # extend = 1
>>
>> So that i_size will be recorded only after generic_write_sync() is
>> called. So O_SYNC won't flush the updated i_size to the disk.
>
> Indeed, that looks like a bug. Thanks for the report!

Thanks for the confirmation!

>
>> On the other side, after a quick look of XFS side, it will record
>> i_size changes in xfs_dio_write_end_io() so it seems that it doesn't
>> have this problem.
>
> Yes, I'm a bit hazy on the details but I think we've decided to call
> ext4_handle_inode_extension() directly from ext4_dio_write_iter() because
> from ext4_dio_write_end_io() it was difficult to test in a race-free way
> whether extending i_size (and i_disksize) is needed or not (we don't
> necessarily hold i_rwsem there). I'll think how we could fix the problem
> you've reported.

Yes, another concern is O_DSYNC, I'm not quite sure if the behavior
is changed too. I had a rough feeling that currently iomap DIO behaviors
on these are too strict and might not fit in each specific fs detailed
implementation, tho.

Thanks,
Gao Xiang

>
> 								Honza
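
To make the ordering issue in the two call traces concrete, here is a toy user-space model in C. It is only a sketch and not kernel code: struct toy_inode and every toy_* function are hypothetical names invented for illustration, and "syncing" is reduced to copying the in-memory size into a field that stands for what would survive a crash.

```c
#include <stdint.h>

/* Toy model only: all names here are hypothetical, not kernel APIs. */
struct toy_inode {
    uint64_t i_size;    /* in-memory file size */
    uint64_t disk_size; /* size that would survive a crash */
};

/* Stand-in for the O_SYNC handling: persist whatever size is recorded now. */
void toy_write_sync(struct toy_inode *inode)
{
    inode->disk_size = inode->i_size;
}

/* Stand-in for the inode extension step: grow i_size if the write extends it. */
void toy_extend(struct toy_inode *inode, uint64_t end)
{
    if (end > inode->i_size)
        inode->i_size = end;
}

/* Ordering from the first trace: update i_size, then sync. */
uint64_t toy_write_size_then_sync(struct toy_inode *inode, uint64_t end)
{
    toy_extend(inode, end);
    toy_write_sync(inode);
    return inode->disk_size;
}

/* Ordering from the second trace: sync first, extend afterwards. */
uint64_t toy_write_sync_then_size(struct toy_inode *inode, uint64_t end)
{
    toy_write_sync(inode);
    toy_extend(inode, end);
    return inode->disk_size;
}
```

Starting from an empty toy inode and appending up to offset 4096, the first ordering leaves disk_size at 4096, while the second leaves disk_size at 0 even though i_size is 4096 in memory, which mirrors the stale i_size seen after a crash in the report.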