lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Tue, 26 Nov 2019 04:57:26 +0100 From: Javier González <javier@...igon.com> To: Damien Le Moal <Damien.LeMoal@....com> Cc: "jaegeuk@...nel.org" <jaegeuk@...nel.org>, "yuchao0@...wei.com" <yuchao0@...wei.com>, "linux-f2fs-devel@...ts.sourceforge.net" <linux-f2fs-devel@...ts.sourceforge.net>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Javier González <javier.gonz@...sung.com> Subject: Re: [PATCH] f2fs: disble physical prealloc in LSF mount On 26.11.2019 02:06, Damien Le Moal wrote: >On 2019/11/26 4:03, Javier González wrote: >> On 25.11.2019 00:48, Damien Le Moal wrote: >>> On 2019/11/22 18:00, Javier González wrote: >>>> From: Javier González <javier.gonz@...sung.com> >>>> >>>> Fix file system corruption when using LFS mount (e.g., in zoned >>>> devices). Seems like the fallback into buffered I/O creates an >>>> inconsistency if the application is assuming both read and write DIO. I >>>> can easily reproduce a corruption with a simple RocksDB test. >>>> >>>> Might be that the f2fs_forced_buffered_io path brings some problems too, >>>> but I have not seen other failures besides this one. >>>> >>>> Problem reproducible without a zoned block device, simply by forcing >>>> LFS mount: >>>> >>>> $ sudo mkfs.f2fs -f -m /dev/nvme0n1 >>>> $ sudo mount /dev/nvme0n1 /mnt/f2fs >>>> $ sudo /opt/rocksdb/db_bench --benchmarks=fillseq --use_existing_db=0 >>>> --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true >>>> --db=/mnt/f2fs --num=5000 --value_size=1048576 --verify_checksum=1 >>>> --block_size=65536 >>>> >>>> Note that the options that cause the problem are: >>>> --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true >>>> >>>> Fixes: f9d6d0597698 ("f2fs: fix out-place-update DIO write") >>>> >>>> Signed-off-by: Javier González <javier.gonz@...sung.com> >>>> --- >>>> fs/f2fs/data.c | 3 --- >>>> 1 file changed, 3 deletions(-) >>>> >>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c >>>> index 5755e897a5f0..b045dd6ab632 100644 >>>> --- a/fs/f2fs/data.c >>>> +++ b/fs/f2fs/data.c >>>> @@ -1081,9 +1081,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from) >>>> return err; >>>> } >>>> >>>> - if (direct_io && allow_outplace_dio(inode, iocb, from)) >>>> - return 0; >>> >>> Since for LFS mode, all DIOs can end up out of place, I think that it >>> may be better to change allow_outplace_dio() to always return true in >>> the case of LFS mode. So may be something like: >>> >>> static inline int allow_outplace_dio(struct inode *inode, >>> struct kiocb *iocb, struct iov_iter *iter) >>> { >>> struct f2fs_sb_info *sbi = F2FS_I_SB(inode); >>> int rw = iov_iter_rw(iter); >>> >>> return test_opt(sbi, LFS) || >>> (rw == WRITE && !block_unaligned_IO(inode, iocb, iter)); >>> } >>> >>> instead of the original: >>> >>> static inline int allow_outplace_dio(struct inode *inode, >>> struct kiocb *iocb, struct iov_iter *iter) >>> { >>> struct f2fs_sb_info *sbi = F2FS_I_SB(inode); >>> int rw = iov_iter_rw(iter); >>> >>> return (test_opt(sbi, LFS) && (rw == WRITE) && >>> !block_unaligned_IO(inode, iocb, iter)); >>> } >>> >>> Thoughts ? >>> >> >> I see what you mean and it makes sense. However, the problem I am seeing >> occurs when allow_outplace_dio() returns true, as this is what creates >> the inconsistency between the write being buffered and the read being >> DIO. > >But if the write is switched to buffered, the DIO read should use the >buffered path too, no ? Since this is all happening under VFS, the >generic DIO read path will not ensure that the buffered writes are >flushed to disk before issuing the direct read, I think. So that would >explain your data corruption, i.e. you are reading stale data on the >device before the buffered writes make it to the media. > As far as I can see, the read is always sent DIO, so yes, I also believe that we are reading stale data. This is why the corruption is not seen if preventing allow_outplace_dio() from sending the write to the buffered path. What surprises me is that this is very easy to trigger (see commit), so I assume you must have seen this with SMR in the past. Does it make sense to leave the LFS check out of the allow_outplace_dio()? Or in other words, is there a hard requirement for writes to take this path on a zoned device that I am not seeing? Something like: static inline int allow_outplace_dio(struct inode *inode, struct kiocb *iocb, struct iov_iter *iter) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); int rw = iov_iter_rw(iter); return (rw == WRITE && !block_unaligned_IO(inode, iocb, iter)); } Thanks, Javier
Powered by blists - more mailing lists