linux-ext4 - Re: [PATCH] ext4: update i_disksize if direct write past ondisk size

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20180308005817.GH30836@localhost.localdomain>
Date:   Thu, 8 Mar 2018 08:58:17 +0800
From:   Eryu Guan <guaneryu@...il.com>
To:     Jan Kara <jack@...e.cz>
Cc:     linux-ext4@...r.kernel.org, Theodore Ts'o <tytso@....edu>
Subject: Re: [PATCH] ext4: update i_disksize if direct write past ondisk size

On Wed, Mar 07, 2018 at 11:11:10AM +0100, Jan Kara wrote:
> On Tue 23-01-18 16:37:23, Eryu Guan wrote:
> > Currently in ext4 direct write path, we update i_disksize only when
> > new eof is greater than i_size, and don't update it even when new
> > eof is greater than i_disksize but less than i_size. This doesn't
> > work well with delalloc buffer write, which updates i_size and
> > i_disksize only when delalloc blocks are resolved (at writeback
> > time), the i_disksize from direct write can be lost if a previous
> > buffer write succeeded at write time but failed at writeback time,
> > then results in corrupted ondisk inode size.
> > 
> > Consider this case, first buffer write 4k data to a new file at
> > offset 16k with delayed allocation, then direct write 4k data to the
> > same file at offset 4k before delalloc blocks are resolved, which
> > doesn't update i_disksize because it writes within i_size(20k), but
> > the extent tree metadata has been committed in journal. Then
> > writeback of the delalloc blocks fails (due to device error etc.),
> > and i_size/i_disksize from buffer write can't be written to disk
> > (still zero). A subsequent umount/mount cycle recovers journal and
> > writes extent tree metadata from direct write to disk, but with
> > i_disksize being zero.
> > 
> > Fix it by updating i_disksize too in direct write path when new eof
> > is greater than i_disksize but less than i_size, so i_disksize is
> > always consistent with direct write.
> > 
> > This fixes occasional i_size corruption in fstests generic/475.
> > 
> > Signed-off-by: Eryu Guan <eguan@...hat.com>
> > ---
> > I think this matches what XFS does in direct write too.
> > 
> > I've tested it by looping generic/475 200 times without hitting a
> > corruption, usually it fails within 5 iterations for me. Also tested by
> > full fstests runs on ext2_4k, ext3_2k, ext4_1k configurations and all
> > results looked good.
> 
> Thanks for the patch! It looks good to me. Just when looking at these

Thanks for the review!

> i_disksize updates and thinking about mixing them with page writeback there
> seems to be another bug that these i_disksize updates are not protected by
> ei->i_data_sem (which is what protects i_disksize update in the writeback
> path). So probably that should be fixed up as well as otherwise I'm not
> sure we cannot corrupt i_disksize in some funny way when writeback and dio
> write race...

It's been a while and I'll get myself refamiliar with this code path and
look into it. Do you think we can apply this patch as is for now and fix
the race you mentioned in a follow-up patch? Or I should fix both of
them in v2?

Thanks,
Eryu