Date:	Wed, 15 May 2013 08:11:38 -0500
From:	Chris Fries <ccfries@...il.com>
To:	Russell Knize <rknize2@...orola.com>
Cc:	Haicheng Li <haicheng.li@...ux.intel.com>,
	Chris Fries <c.fries@...orola.com>,
	"jaegeuk.kim" <jaegeuk.kim@...sung.com>,
	linux-f2fs-devel@...ts.sourceforge.net,
	linux-kernel@...r.kernel.org,
	Jason Hrycay <jason.hrycay@...orola.com>
Subject: Re: [PATCH] f2fs: Remove BUG_ON in dec_valid_node_count

Hi,

Please abandon my patch in favor of Jaegeuk's recovery accounting fix;
mine has a copy-paste error anyway (the inode->i_blocks clamp resets
count from sbi->total_valid_node_count instead of from inode->i_blocks):

[f2fs-dev] [PATCH 1/2] f2fs: fix inconsistency of block count during recovery

Jaegeuk's patch solves the issue completely.

-Chris


On Tue, May 14, 2013 at 2:26 PM, Russell Knize <rknize2@...orola.com> wrote:
>
> The node accounting errors seem to be creeping in during recovery.
> Once the error is introduced in the total counts, it gets recorded in
> the next checkpoint and is carried forward.  Often the error crops up
> after a failed recovery attempt, such as when recovery
> attempts to recover an inode belonging to a subdirectory that was
> previously removed.
>
> We've also seen accounting errors elsewhere, such as the block bitmap
> management in update_sit_entry().  In one case, we bug out on some
> random unlink operation because truncate_data_blocks_range() tries to
> invalidate a block whose bit is not set in the bitmap:
>
> kernel BUG at ../../../../../../kernel/fs/f2fs/segment.c:199!
> ...
> do_unlinkat()
>     evict()
>         f2fs_evict_inode()
>             f2fs_truncate()
>                 truncate_data_blocks_range()
>                     invalidate_blocks()
>                         update_sit_entry()
>
> In another case, we bug out during recovery while trying to allocate
> the dummy page for an fsync.  In this case, the page is already
> allocated:
>
> kernel BUG at ../../../../../../kernel/fs/f2fs/segment.c:211!
> ...
> vfs_kern_mount()
>     mount_fs()
>         f2fs_mount()
>             mount_bdev()
>                 f2fs_fill_super()
>                     recover_fsync_data()
>                         recover_data_page()
>                             update_sit_entry()
>
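> If I'm reading it right, both BUG lines are in update_sit_entry()'s
> valid-bitmap bookkeeping.  Roughly, the failing checks look like this
> (a paraphrase of the f2fs code of this vintage, not our exact tree):
>
>         /* update the per-segment valid-block bitmap; del is positive
>          * on allocation and negative on invalidation */
>         if (del > 0) {
>                 /* allocating: the bit must not already be set; the
>                  * recover_data_page() crash appears to trip here
>                  * because the dummy page's block is already marked
>                  * valid */
>                 if (f2fs_set_bit(offset, se->cur_valid_map))
>                         BUG();
>         } else {
>                 /* invalidating: the bit must currently be set; the
>                  * unlink/truncate crash trips here because the block
>                  * being invalidated is not marked valid in the bitmap */
>                 if (!f2fs_clear_bit(offset, se->cur_valid_map))
>                         BUG();
>         }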
>
> I've experimented with some changes to deal with the bug-outs in
> update_sit_entry() and regenerate the block counts that are maintained
> by the seg_entry and sec_entry structures, but I am not confident
> that a disparity between block allocations and their respective nodes
> won't remain.
>
> Some sort of consistency checker would be handy.  ;)
>
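> As a strawman, even a debug-only pass that recomputes the totals from
> the in-memory SIT entries and warns on a mismatch would catch this
> class of error early.  Sketch only (uses the existing TOTAL_SEGS() and
> get_seg_entry() helpers, and assumes it runs at a quiescent point such
> as the end of mount):
>
>         /* debug sketch: cross-check the SIT entries against the
>          * superblock's running total of valid blocks */
>         static void f2fs_check_block_counts(struct f2fs_sb_info *sbi)
>         {
>                 block_t valid = 0;
>                 unsigned int segno;
>
>                 for (segno = 0; segno < TOTAL_SEGS(sbi); segno++)
>                         valid += get_seg_entry(sbi, segno)->valid_blocks;
>
>                 WARN(valid != sbi->total_valid_block_count,
>                         "F2FS: SIT says %u valid blocks, sb says %u\n",
>                         valid, (unsigned int)sbi->total_valid_block_count);
>         }
>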
> We're going to run these torture tests with disable_roll_forward to
> at least confirm that the recovery process is the right place to
> focus.
>
> Russ W. Knize
> Mobile Devices Android Platform
> Desk: 847-523-9381
> Mobile: 224-622-1391
>
>
> On Mon, May 13, 2013 at 10:41 AM, Chris Fries <ccfries@...il.com> wrote:
> > Yes, your observations are correct: total_valid_node_count is 0, and
> > the caller is truncate_node.  It could be a bug in the recovery
> > accounting somewhere, but I haven't been able to save a copy of the
> > disk image from just before the recovery or rmdir failure, which would
> > be the best place to start investigating.  It could also be some bad
> > state of the disk left by the panic restart rather than a straight
> > recovery bug.
> >
> > Adding some traces or logs into recovery might give clues too.
> >
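> > For example, even a one-liner in the recovery loop would help narrow
> > it down (illustrative placement only; ino_of_node() is the existing
> > helper from node.h):
> >
> >         pr_info("f2fs: recovering ino %u, total_valid_node_count %u\n",
> >                 ino_of_node(page), sbi->total_valid_node_count);
> >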
> > The panics I've seen are both during mount and during rmdir.
> >
> > -- truncate_node
> >   -- remove_inode_page
> >     -- f2fs_evict_inode
> >       -- evict
> >         -- recover_orphan_inodes
> >           -- f2fs_fill_super
> >             -- mount_bdev
> >
> > or
> >
> > -- truncate_node
> >   -- remove_inode_page
> >     -- f2fs_evict_inode
> >       -- evict
> >         -- d_delete
> >           -- vfs_rmdir
> >
> >
> >
> >
> > On Mon, May 13, 2013 at 2:05 AM, Haicheng Li <haicheng.li@...ux.intel.com> wrote:
> >>
> >>
> >> On Sun, May 12, 2013 at 10:46:27PM -0500, Chris Fries wrote:
> >> > From: Chris Fries <C.Fries@...orola.com>
> >> >
> >> > Panic-reboot loops while running LTP fsstress have managed to get
> >> > a disk into two different panic loops from dec_valid_node_count:
> >> > f2fs.h:714  BUG_ON(sbi->total_valid_node_count < count);
> >>
> >> This is an interesting catch.
> >> From the code, dec_valid_node_count() is only called by
> >> truncate_node():
> >>         dec_valid_node_count(sbi, dn->inode, 1);
> >>
> >> So the failure in your test means that sbi->total_valid_node_count < 1,
> >> i.e. it is 0.  That is an unexpected state.
> >>
> >> I think a better solution would be to avoid such an over-truncation
> >> by truncate_node() in the first place.
> >>
> >> What do you think?
> >>
> >> > In one case it happened during recovery itself, so the disk would
> >> > panic every time it was mounted.
> >> >
> >> > In another case it happened during garbage collection, so the disk
> >> > would panic within 200 seconds of mounting.
> >> >
> >> > Removing this BUG_ON hasn't shown any side effects, so let's take it
> >> > out and monitor.
> >> >
> >> > Signed-off-by: Chris Fries <C.Fries@...orola.com>
> >> >
> >> > ---
> >> >  fs/f2fs/f2fs.h | 18 +++++++++++++++---
> >> >  1 file changed, 15 insertions(+), 3 deletions(-)
> >> >
> >> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >> > index e80a87c..b8e9679 100644
> >> > --- a/fs/f2fs/f2fs.h
> >> > +++ b/fs/f2fs/f2fs.h
> >> > @@ -722,9 +722,21 @@ static inline void dec_valid_node_count(struct f2fs_sb_info *sbi,
> >> >  {
> >> >       spin_lock(&sbi->stat_lock);
> >> >
> >> > -     BUG_ON(sbi->total_valid_block_count < count);
> >> > -     BUG_ON(sbi->total_valid_node_count < count);
> >> > -     BUG_ON(inode->i_blocks < count);
> >> > +     if (sbi->total_valid_block_count < count) {
> >> > +             WARN(1, "F2FS: total_valid_block_count too small- %d vs %d\n",
> >> > +                     (unsigned int)sbi->total_valid_block_count, count);
> >> > +             count = sbi->total_valid_block_count;
> >> > +     }
> >> > +     if (sbi->total_valid_node_count < count) {
> >> > +             WARN(1, "F2FS: total_valid_node_count too small- %d vs %d\n",
> >> > +                     sbi->total_valid_node_count, count);
> >> > +             count = sbi->total_valid_node_count;
> >> > +     }
> >> > +     if (inode->i_blocks < count) {
> >> > +             WARN(1, "F2FS: inode->i_blocks too small - %d vs %d\n",
> >> > +                     (unsigned int)inode->i_blocks, count);
> >> > +             count = sbi->total_valid_node_count;
> >> > +     }
> >> >
> >> >       inode->i_blocks -= count;
> >> >       sbi->total_valid_node_count -= count;
> >> > --
> >> > 1.8.0
> >> >