Message-ID: <20180307101110.as6mduokhzq763zv@quack2.suse.cz>
Date: Wed, 7 Mar 2018 11:11:10 +0100
From: Jan Kara <jack@...e.cz>
To: Eryu Guan <eguan@...hat.com>
Cc: linux-ext4@...r.kernel.org, Theodore Ts'o <tytso@....edu>
Subject: Re: [PATCH] ext4: update i_disksize if direct write past ondisk size
On Tue 23-01-18 16:37:23, Eryu Guan wrote:
> Currently in ext4 direct write path, we update i_disksize only when
> new eof is greater than i_size, and don't update it even when new
> eof is greater than i_disksize but less than i_size. This doesn't
> work well with delalloc buffer write, which updates i_size and
> i_disksize only when delalloc blocks are resolved (at writeback
> time), the i_disksize from direct write can be lost if a previous
> buffer write succeeded at write time but failed at writeback time,
> then results in corrupted ondisk inode size.
>
> Consider this case, first buffer write 4k data to a new file at
> offset 16k with delayed allocation, then direct write 4k data to the
> same file at offset 4k before delalloc blocks are resolved, which
> doesn't update i_disksize because it writes within i_size(20k), but
> the extent tree metadata has been committed in journal. Then
> writeback of the delalloc blocks fails (due to device error etc.),
> and i_size/i_disksize from buffer write can't be written to disk
> (still zero). A subsequent umount/mount cycle recovers journal and
> writes extent tree metadata from direct write to disk, but with
> i_disksize being zero.
>
> Fix it by updating i_disksize too in direct write path when new eof
> is greater than i_disksize but less than i_size, so i_disksize is
> always consistent with direct write.
>
> This fixes occasional i_size corruption in fstests generic/475.
>
> Signed-off-by: Eryu Guan <eguan@...hat.com>
> ---
> I think this matches what XFS does in direct write too.
>
> I've tested it by looping generic/475 200 times without hitting a
> corruption, usually it fails within 5 iterations for me. Also tested by
> full fstests runs on ext2_4k, ext3_2k, ext4_1k configurations and all
> results looked good.
Thanks for the patch! It looks good to me. However, looking at these
i_disksize updates and thinking about how they mix with page writeback,
there seems to be another bug: these i_disksize updates are not protected by
ei->i_data_sem (which is what protects the i_disksize update in the writeback
path). That should probably be fixed as well, since otherwise I'm not sure
we can't corrupt i_disksize in some funny way when writeback and a dio
write race...
Honza
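
For illustration only, here is a minimal user-space sketch of the kind of
serialization described above. It is not ext4 code: `struct model_inode_info`
and `model_update_disksize()` are hypothetical names, and a pthread mutex
stands in for taking ei->i_data_sem exclusively. The point it shows is that
the compare-and-store on i_disksize happens under one lock, so two racing
updaters cannot overwrite a larger value with a smaller one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the ext4 in-memory inode info. */
struct model_inode_info {
    pthread_mutex_t i_data_sem; /* models i_data_sem held exclusively */
    int64_t i_disksize;         /* models the on-disk size field */
};

/*
 * Grow i_disksize to newsize, never shrink it. The check and the store
 * are done under the same lock, so concurrent callers (think: writeback
 * and a direct write) cannot interleave between them.
 */
static void model_update_disksize(struct model_inode_info *ei, int64_t newsize)
{
    pthread_mutex_lock(&ei->i_data_sem);
    if (newsize > ei->i_disksize)
        ei->i_disksize = newsize;
    pthread_mutex_unlock(&ei->i_data_sem);
}
```

Whether the direct write path should take the lock around its existing
assignment or go through a shared helper is a design choice the thread does
not settle; the sketch assumes a helper only to keep the example short.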
>
> fs/ext4/inode.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 534a9130f625..2a75b0aafd31 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3668,7 +3668,7 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter)
> int orphan = 0;
> handle_t *handle;
>
> - if (final_size > inode->i_size) {
> + if (final_size > inode->i_size || final_size > ei->i_disksize) {
> /* Credits for sb + inode write */
> handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
> if (IS_ERR(handle)) {
> @@ -3780,9 +3780,10 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter)
> ext4_orphan_del(handle, inode);
> if (ret > 0) {
> loff_t end = offset + ret;
> - if (end > inode->i_size) {
> + if (end > inode->i_size || end > ei->i_disksize) {
> ei->i_disksize = end;
> - i_size_write(inode, end);
> + if (end > inode->i_size)
> + i_size_write(inode, end);
> /*
> * We're going to return a positive `ret'
> * here due to non-zero-length I/O, so there's
> --
> 2.14.3
>
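
The effect of the changed condition can be checked with a small user-space
model of the scenario from the commit message. This is a sketch under stated
assumptions, not the kernel code: `struct model_inode`, `dio_update_old()` and
`dio_update_new()` are hypothetical names, and only the two size fields are
modelled (no journal, no orphan handling).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sizes ext4 tracks per inode. */
struct model_inode {
    int64_t i_size;     /* in-memory file size */
    int64_t i_disksize; /* size recorded in the on-disk inode */
};

/* Pre-patch rule: sizes are touched only when the write extends i_size. */
static void dio_update_old(struct model_inode *ino, int64_t end)
{
    if (end > ino->i_size) {
        ino->i_disksize = end;
        ino->i_size = end;
    }
}

/*
 * Patched rule: also update i_disksize when the write ends beyond it,
 * even if the write stays within i_size. i_size itself only grows.
 */
static void dio_update_new(struct model_inode *ino, int64_t end)
{
    if (end > ino->i_size || end > ino->i_disksize) {
        ino->i_disksize = end;
        if (end > ino->i_size)
            ino->i_size = end;
    }
}
```

Starting from i_size = 20480 and i_disksize = 0 (the state after the delalloc
buffered write at offset 16k), a 4k direct write at offset 4k ends at 8192.
The old rule leaves i_disksize at 0, while the patched rule sets it to 8192
and leaves i_size at 20480.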
--
Jan Kara <jack@...e.com>
SUSE Labs, CR