Message-ID: <8761gqpeay.fsf@openvz.org>
Date: Sun, 14 Sep 2014 16:38:45 +0400
From: Dmitry Monakhov <dmonakhov@...nvz.org>
To: Theodore Ts'o <tytso@....edu>
Cc: linux-ext4@...r.kernel.org, lczerner@...hat.com
Subject: Re: [PATCH 5/5] update i_disksize coherently with block allocation
On Mon, 25 Aug 2014 11:59:08 +0400, Dmitry Monakhov <dmonakhov@...nvz.org> wrote:
> On Sat, 23 Aug 2014 18:00:29 -0400, "Theodore Ts'o" <tytso@....edu> wrote:
> > On Fri, Aug 22, 2014 at 03:32:27PM +0400, Dmitry Monakhov wrote:
> > > The writeback call trace looks like this:
> > > ext4_writepages
> > > while(nr_pages)
> > > ->journal_start
> > > ->mpage_map_and_submit_extent -> may allocate some blocks
> > > ->mpage_map_one_extent
> > > ->journal_stop
> > > In the case of delalloc, i_disksize may be less than i_size, so we have
> > > to update i_disksize each time we allocate and submit blocks beyond
> > > i_disksize. And we MUST update it in the same transaction; otherwise
> > > this results in fs inconsistency after an upcoming power failure.
> > >
> > > Another possible way to fix this issue is to insert the inode into the
> > > orphan list on entry to ext4_writepages.
> > >
> > > testcase: xfstest generic/019
> > >
> > > Signed-off-by: Dmitry Monakhov <dmonakhov@...nvz.org>
> >
> > Hi Dmitry, were you seeing generic/019 fail before this patch series?
> > I've been trying to build a kernel with CONFIG_FAIL_MAKE_REQUEST and I
> > haven't been able to get generic/019 to fail on me. Is there
> > something else we need in order to reliably trigger the test fail?
> As usual, this kind of test is not 100% reliable; I've seen failures from
> time to time. But I had assumed that was a side effect of the incorrect
> error detection in e2fsck introduced by d3f32c2db8f11. This week I
> rechecked e2fsck and found that the condition was fixed and is correct.
> In order to speed up testing I use a ram device:
> options brd rd_nr=4 rd_size=10485760 part_show=1
> TEST_DEV=/dev/ram0
> SCRATCH_DEV=/dev/ram1
> And run several rounds of the test:
> for ((i=0;i<20;i++));do ./check generic/019 || break ;done
>
> You can also increase the failure probability by playing with the fsstress options:
> --- a/tests/generic/019
> +++ b/tests/generic/019
> @@ -135,7 +135,7 @@ FSSTRESS_AVOID="$FSSTRESS_AVOID -ffsync=0 -fsync=0 -ffdatasync=0 -f setattr=1"
> _workout()
> {
> out=$SCRATCH_MNT/fsstress.$$
> - args=`_scale_fsstress_args -p 1 -n999999999 -f setattr=0 $FSSTRESS_AVOID -d $out`
> + args=`_scale_fsstress_args -p 8 -n999999999 -f setattr=0 $FSSTRESS_AVOID -d $out`
> echo ""
> echo "Start fsstress.."
> echo ""
>
> And finally, the cherry on top of this cake: I've found that this test
> provokes orphan-list corruption or dangling inodes after a failure.
> fsck 1.43-WIP (09-Jul-2014)
> e2fsck 1.43-WIP (09-Jul-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Deleted inode 43792 has zero dtime. Fix<y>? no
> Inodes that were part of a corrupted orphan linked list found. Fix<y>?
> no
> Inode 493817 was part of the orphaned inode list. IGNORED.
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences: -148712 -148714
> Fix<y>? no
> Inode bitmap differences: -43792 -493817
> Fix<y>? no
>
> /dev/ram1: ********** WARNING: Filesystem still has errors **********
>
> /dev/ram1: 201244/655360 files (0.0% non-contiguous), 409632/10485760
> blocks
> [root@...05 xfstests-dev.git2]# INO=493817
> [root@...05 xfstests-dev.git2]# debugfs /dev/ram1 -R "ex <$INO>" ; \
> debugfs /dev/ram1 -R "stat <$INO>" ; debugfs /dev/ram1 -R "ncheck $INO"
> debugfs 1.43-WIP (09-Jul-2014)
> Level Entries Logical Physical Length Flags
> 0/ 0 1/ 1 0 - 0 148712 - 148712 1
> debugfs 1.43-WIP (09-Jul-2014)
> Inode: 493817 Type: symlink Mode: 0777 Flags: 0x80000
> Generation: 4038911591 Version: 0x00000000:00000001
> User: 0 Group: 0 Size: 638
> File ACL: 0 Directory ACL: 0
> Links: 0 Blockcount: 2
> Fragment: Address: 0 Number: 0 Size: 0
> ctime: 0x53fae8e8:ea861dc0 -- Mon Aug 25 11:42:32 2014
> atime: 0x53fae8e8:ea861dc0 -- Mon Aug 25 11:42:32 2014
> mtime: 0x53fae8e8:ea861dc0 -- Mon Aug 25 11:42:32 2014
> crtime: 0x53fae8e8:ea861dc0 -- Mon Aug 25 11:42:32 2014
> dtime: 0x0000ab10 -- Thu Jan 1 15:09:52 1970
> Size of extra inode fields: 28
> EXTENTS:
> (0):148712
> debugfs 1.43-WIP (09-Jul-2014)
> Inode Pathname
>
> I saw this effect with different file types (symlink, chrdev, regfile).
> From my findings, we lose the newly created inode during creation.
> The code is actually very simple, but at the moment I cannot find why
> and where this happens.
I've had plenty of time to brainstorm this issue :).
In fact it is a very simple test-environment-related issue.
Once we force make_request failures for all new IO requests, ext4_error()
tags the on-disk SB state with EXT4_ERROR_FS. In a normal situation this
update should not reach permanent storage, but in our case the updated
EXT4_SB(sb)->s_sbh may already be under writeback, so the ERROR_FS flag
becomes visible on the next mount and the orphan-list cleanup is skipped
because of ERROR_FS. That last action is 100% correct.
It looks like we have to fix the test by using another failure-injection
technique. At the moment I think a faulty bcache may work for us.
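For illustration, one such alternative failure-injection technique is dm-flakey, which passes I/O for a configurable interval and then fails it; this is only a sketch of the idea, not the bcache approach, and the device name, size, and intervals below are assumptions rather than values from this thread:

```shell
# Sketch: build a dm-flakey table for the scratch ram disk so that the
# device works for $UP seconds, then fails all I/O for $DOWN seconds.
# Table format: <start> <num_sectors> flakey <dev> <offset> <up> <down>
DEV=/dev/ram1
SECTORS=20971520      # rd_size=10485760 KiB => 20971520 512-byte sectors
UP=5                  # seconds the device accepts I/O
DOWN=60               # seconds all I/O fails
TABLE="0 $SECTORS flakey $DEV 0 $UP $DOWN"
echo "$TABLE"
# Creating the mapping needs root, so it is left commented out here:
# dmsetup create flakey-scratch --table "$TABLE"
# SCRATCH_DEV=/dev/mapper/flakey-scratch ./check generic/019
```

The advantage over failing requests globally is that only the scratch device misbehaves, so superblock writeback on other devices is unaffected.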
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html