lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180308005817.GH30836@localhost.localdomain>
Date:   Thu, 8 Mar 2018 08:58:17 +0800
From:   Eryu Guan <guaneryu@...il.com>
To:     Jan Kara <jack@...e.cz>
Cc:     linux-ext4@...r.kernel.org, Theodore Ts'o <tytso@....edu>
Subject: Re: [PATCH] ext4: update i_disksize if direct write past ondisk size

On Wed, Mar 07, 2018 at 11:11:10AM +0100, Jan Kara wrote:
> On Tue 23-01-18 16:37:23, Eryu Guan wrote:
> > Currently in ext4 direct write path, we update i_disksize only when
> > new eof is greater than i_size, and don't update it even when new
> > eof is greater than i_disksize but less than i_size. This doesn't
> > work well with delalloc buffer write, which updates i_size and
> > i_disksize only when delalloc blocks are resolved (at writeback
> > time), the i_disksize from direct write can be lost if a previous
> > buffer write succeeded at write time but failed at writeback time,
> > then results in corrupted ondisk inode size.
> > 
> > Consider this case, first buffer write 4k data to a new file at
> > offset 16k with delayed allocation, then direct write 4k data to the
> > same file at offset 4k before delalloc blocks are resolved, which
> > doesn't update i_disksize because it writes within i_size(20k), but
> > the extent tree metadata has been committed in journal. Then
> > writeback of the delalloc blocks fails (due to device error etc.),
> > and i_size/i_disksize from buffer write can't be written to disk
> > (still zero). A subsequent umount/mount cycle recovers journal and
> > writes extent tree metadata from direct write to disk, but with
> > i_disksize being zero.
> > 
> > Fix it by updating i_disksize too in direct write path when new eof
> > is greater than i_disksize but less than i_size, so i_disksize is
> > always consistent with direct write.
> > 
> > This fixes occasional i_size corruption in fstests generic/475.
> > 
> > Signed-off-by: Eryu Guan <eguan@...hat.com>
> > ---
> > I think this matches what XFS does in direct write too.
> > 
> > I've tested it by looping generic/475 200 times without hitting a
> > corruption, usually it fails within 5 iterations for me. Also tested by
> > full fstests runs on ext2_4k, ext3_2k, ext4_1k configurations and all
> > results looked good.
> 
> Thanks for the patch! It looks good to me. Just when looking at these

Thanks for the review!

> i_disksize updates and thinking about mixing them with page writeback there
> seems to be another bug that these i_disksize updates are not protected by
> ei->i_data_sem (which is what protects i_disksize update in the writeback
> path). So probably that should be fixed up as well as otherwise I'm not
> sure we cannot corrupt i_disksize in some funny way when writeback and dio
> write race...

It's been a while and I'll get myself refamiliar with this code path and
look into it. Do you think we can apply this patch as is for now and fix
the race you mentioned in a follow-up patch? Or I should fix both of
them in v2?

Thanks,
Eryu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ