lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 14 Sep 2014 00:48:34 -0700
From:	Jaegeuk Kim <jaegeuk@...nel.org>
To:	Huang Ying <ying.huang@...el.com>
Cc:	huang ying <huang.ying.caritas@...il.com>,
	Changman Lee <cm224.lee@...sung.com>,
	linux-f2fs-devel@...ts.sourceforge.net,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of
 inode

On Sat, Sep 13, 2014 at 10:23:18PM +0800, Huang Ying wrote:
> On Fri, 2014-09-12 at 15:34 +0800, Huang Ying wrote:
> > On Thu, 2014-09-11 at 22:13 -0700, Jaegeuk Kim wrote:
> > > On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote:
> > > > 
> > > > On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote:
> > > > > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote:
> > > > > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@...nel.org> wrote:
> > > > > > 
> > > > > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote:
> > > > > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@...nel.org> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote:
> > > > > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote:
> > > > > > > > > > > Hi Huang,
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote:
> > > > > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the
> > > > > > > > > > > > inode is not checkpointed.  The non-inode dnode may be written
> > > > > > > before
> > > > > > > > > > > > inode.  So in find_fsync_dnodes, f2fs_iget will fail, cause the
> > > > > > > > > > > > recovery fail.
> > > > > > > > > > > >
> > > > > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the
> > > > > > > nid
> > > > > > > > > of
> > > > > > > > > > > > inode < nid of non-inode dnode.  But it is possible for the
> > > > > > > reverse.
> > > > > > > > > > > > For example, because of alloc_nid_failed.
> > > > > > > > > > > >
> > > > > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in
> > > > > > > > > > > > find_fsync_dnodes.
> > > > > > > > > > > >
> > > > > > > > > > > > The patch was tested via allocating nid reversely via a debugging
> > > > > > > > > > > > patch, that is, from big number to small number.
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@...el.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  fs/f2fs/recovery.c |    7 ++++---
> > > > > > > > > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > --- a/fs/f2fs/recovery.c
> > > > > > > > > > > > +++ b/fs/f2fs/recovery.c
> > > > > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs
> > > > > > > > > > > >                   if (IS_INODE(page) && is_dent_dnode(page))
> > > > > > > > > > > >                           set_inode_flag(F2FS_I(entry->inode),
> > > > > > > > > > > >                                                   FI_INC_LINK);
> > > > > > > > > > > > -         } else {
> > > > > > > > > > > > -                 if (IS_INODE(page) && is_dent_dnode(page)) {
> > > > > > > > > > >
> > > > > > > > > > > If this is not inode block, we should add this inode to recover its
> > > > > > > > > data blocks.
> > > > > > > > > >
> > > > > > > > > > Is it possible that there is only non-inode dnode but no inode when
> > > > > > > > > > find_fsync_dnodes checking dnodes?  Per my understanding, any
> > > > > > > changes to
> > > > > > > > > > file will cause inode page dirty (for example, mtime changed), so
> > > > > > > that
> > > > > > > > > > we will write inode block.  Is it right?  If so, the solution in this
> > > > > > > > > > patch should work too.
> > > > > > > > >
> > > > > > > > > Your description says that f2fs_iget will fail, which causes the
> > > > > > > recovery
> > > > > > > > > fail.
> > > > > > > > > So, I thought it would be better to handle the f2fs_iget failure
> > > > > > > directly.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes.  That is another way to fix the issue.
> > > > > > > >
> > > > > > > >
> > > > > > > > > In addition, we cannot guarantee the write order of dnode and inode.
> > > > > > > > > For exmaple,
> > > > > > > > > 1. the inode is written by flusher or kswapd, then,
> > > > > > > > > 2. f2fs_sync_file writes its dnode.
> > > > > > > > >
> > > > > > > > > In that case, we can get only non-inode dnode in the node chain, since
> > > > > > > the
> > > > > > > > > inode
> > > > > > > > > has not fsync_mark.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I think your solution is better here, but does not fix all scenarios.  If
> > > > > > > > the inode is checkpointed, the file can be recovered, although the inode
> > > > > > > > information may be not up to date.  But if the inode is not checkpointed,
> > > > > > > > f2fs_iget will fail too and recover will fail.
> > > > > > >
> > > > > > > Ok, let me consider your scenarios.
> > > > > > >
> > > > > > > Term: F: fsync_mark, D: dentry_mark
> > > > > > >
> > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > >  -> Lose the latest inode(x). Need to fix.
> > > > > > >
> > > > > > > 2. inode(x) | CP | dnode(F) | inode(x)
> > > > > > >  -> Impossible, but recover latest dnode(F)
> > > > > > >
> > > > > > > 3. CP | inode(x) | dnode(F)
> > > > > > >  -> Need to write inode(DF) in f2fs_sync_file.
> > > > > > >
> > > > > > > 4. CP | dnode(F) | inode(DF)
> > > > > > >  -> If f2fs_iget fails, then goto next.
> > > > > > >
> > > > > > > 5. CP | dnode(F) | inode(x)
> > > > > > >  -> If f2fs_iget fails, then goto next. But, this is an impossible
> > > > > > > scenario.
> > > > > > >     Drop this dnode(F).
> > > > > > >
> > > > > > > Indeed, there were some missing scenarios.
> > > > > > >
> > > > > > > So, how about this patch?
> > > > > > >
> > > > > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001
> > > > > > > From: Jaegeuk Kim <jaegeuk@...nel.org>
> > > > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > > > > >
> > > > > > > We can summarize the roll forward recovery scenarios as follows.
> > > > > > >
> > > > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > > > >
> > > > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > > > -> Update the latest inode(x).
> > > > > > >
> > > > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > > > -> No problem.
> > > > > > >
> > > > > > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > > > > > -> Impossible, but recover latest dnode(F)
> > > > > > >
> > > > > > 
> > > > > > I think this is possible.  If f2fs_sync_file runs concurrently with
> > > > > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x).
> > > > > 
> > > > > If the inode(x) was written, f2fs_sync_file will do write the inode again with
> > > > > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown.
> > > > > 
> > > > > In f2fs_sync_file,
> > > > > 	...
> > > > > 	while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > > 		if (fsync_mark_done(sbi, ino))
> > > > > 			goto out;
> > > > > 		mark_inode_dirty_sync(inode);
> > > > > 		ret = f2fs_write_inode(inode, NULL);
> > > > > 		if (ret)
> > > > > 			goto out;
> > > > > 	}
> > > > > 	...
> > > > 
> > > > Is the following situation possible?
> > > > 
> > > > f2fs_sync_file				<writeback>
> > > >   sync_node_pages			  f2fs_write_node_pages
> > > >     write dnode(F)			    sync_node_pages
> > > > 					      write inode(x) /* clear PAGECACHE_TAG_DIRTY */
> > > > 
> > > > 
> > > > That is, f2fs_sync_file run parallel with <writeback>.  The
> > > > sync_node_pages above will return 1, because dnode(F) is written.
> > > > inode(x) is written by <writeback> path.  And because
> > > > PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages
> > > > called by f2fs_sync_file does not write inode(F).
> > > 
> > > I think Chao's comment would work.
> > > How about this patch?
> > > 
> > > From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001
> > > From: Jaegeuk Kim <jaegeuk@...nel.org>
> > > Date: Wed, 10 Sep 2014 00:16:34 -0700
> > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios
> > > 
> > > We can summarize the roll forward recovery scenarios as follows.
> > > 
> > > [Term] F: fsync_mark, D: dentry_mark
> > > 
> > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > -> Update the latest inode(x).
> > > 
> > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > -> No problem.
> > > 
> > > 3. inode(x) | CP | dnode(F) | inode(x)
> > > -> Recover to the latest dnode(F), and drop the last inode(x)
> > > 
> > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > -> No problem.
> > > 
> > > 5. CP | inode(x) | dnode(F)
> > > -> The inode(DF) was missing. Should drop this dnode(F).
> > > 
> > > 6. CP | inode(DF) | dnode(F)
> > > -> No problem.
> > > 
> > > 7. CP | dnode(F) | inode(DF)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > > 
> > > 8. CP | dnode(F) | inode(x)
> > > -> If f2fs_iget fails, then goto next to find inode(DF).
> > >    But it will fail due to no inode(DF).
> > > 
> > > So, this patch adds some missing points such as #1, #5, #7, and #8.
> > > 
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@...nel.org>
> > > ---
> > >  fs/f2fs/file.c     | 20 ++++++++++++----
> > >  fs/f2fs/node.c     | 11 ++++++++-
> > >  fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++---------
> > >  3 files changed, 85 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > index e7681c3..70f5d4b 100644
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> > >  			up_write(&fi->i_sem);
> > >  		}
> > >  	} else {
> > > -		/* if there is no written node page, write its inode page */
> > > -		while (!sync_node_pages(sbi, ino, &wbc)) {
> > > -			if (fsync_mark_done(sbi, ino))
> > > -				goto out;
> > > +sync_nodes:
> > > +		sync_node_pages(sbi, ino, &wbc);
> > > +
> > > +		/*
> > > +		 * inode(x) | CP | inode(x) | dnode(F)
> > > +		 *  -> ok
> > 
> > Is it acceptable that we turn this to:
> > 
> > inode(x) | CPU | inode (x) | dnode (F) | inode(F)
> > 
> > > +		 * inode(x) | CP | dnode(F) | inode(x)
> > > +		 *  -> inode(x) | CP | dnode(F) | inode(x) | inode(F)
> > > +		 * CP | inode(x) | dnode(F)
> > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > +		 * CP | dnode(F) | inode(x)
> > > +		 *  -> CP | dnode(F) | inode(x) | inode(DF)
> > > +		 */
> > > +		if (!fsync_mark_done(sbi, ino)) {
> > >  			mark_inode_dirty_sync(inode);
> > >  			ret = f2fs_write_inode(inode, NULL);
> > >  			if (ret)
> > >  				goto out;
> > > +			goto sync_nodes;
> > >  		}
> > > +
> > >  		ret = wait_on_node_pages_writeback(sbi, ino);
> > >  		if (ret)
> > >  			goto out;
> > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > > index b32eb56..653aa71 100644
> > > --- a/fs/f2fs/node.c
> > > +++ b/fs/f2fs/node.c
> > > @@ -248,8 +248,17 @@ retry:
> > >  
> > >  	/* update fsync_mark if its inode nat entry is still alive */
> > >  	e = __lookup_nat_cache(nm_i, ni->ino);
> > > -	if (e)
> > > +	if (e) {
> > > +		/*
> > > +		 * CP | inode(x) | dnode(F)
> > > +		 *  -> CP | inode(x) | dnode(F) | inode(DF)
> > > +		 */
> > > +		if (!e->checkpointed && !e->fsync_done &&
> > > +				ni->ino != ni->nid && fsync_done)
> > > +			goto skip;
> > >  		e->fsync_done = fsync_done;
> > > +	}
> > > +skip:
> > >  	write_unlock(&nm_i->nat_tree_lock);
> > >  }
> > 
> > I don't understand why we need so complex logic?  Why not just let
> > e->fsync_done reflect just latest is_fsync_dnode(page)?
> > 
> > It appears that in f2fs_sync_file, what we need is just whether inode
> > page has fsync mark or not.
> 
> Oh, I see.  The e->fsync_done indicates whether latest node of the inode
> has fsync_mark.  That can be used to deal with:
> 
> CP | inode(DF) | dnode(x)
> 
> But I still think it is possible to make e->fsync_done simple via
> introduce another simple e->node_fsync_done to indicate whether the node
> itself has fsync_mark.  What do you think about the solution in the
> below patch?  Just build tested.
> 
> IMHO, it is easier to be understand.  And in effect, it is almost same,
> because it is hard to distinguish between

I think you are showing another complexity with the same functionality.

As I mentioned before, this itself is a little bit corner case, so we need
at least such kind of combined conditions.
So, it doesn't make enough sense to add one more variable per each nat entry.

Thanks,

> 
> CP | inode(x) | dnode(F)
> 
> and
> 
> inode(x) | CP | inode(x) | dnode(F)
> 
> in your solution too.
> 
> Best Regards,
> Huang, Ying
> 
> ----------------------------------------------------->
> ---
>  fs/f2fs/f2fs.h     |    1 +
>  fs/f2fs/file.c     |   26 ++++++++++++++++++++++----
>  fs/f2fs/node.c     |   17 ++++++++++++++++-
>  fs/f2fs/node.h     |    4 +++-
>  fs/f2fs/recovery.c |    2 ++
>  5 files changed, 44 insertions(+), 6 deletions(-)
> 
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -207,10 +207,28 @@ int f2fs_sync_file(struct file *file, lo
>  			up_write(&fi->i_sem);
>  		}
>  	} else {
> -		/* if there is no written node page, write its inode page */
> -		while (!sync_node_pages(sbi, inode->i_ino, &wbc)) {
> -			if (fsync_mark_done(sbi, inode->i_ino))
> -				goto out;
> +		for (;;) {
> +			sync_node_pages(sbi, inode->i_ino, &wbc);
> +			/*
> +			 * inode(x) | CP | dnode(F)
> +			 * -> ok
> +			 * inode(x) | CP | inode(x) | dnode(x)
> +			 * -> inode(x) | CP | inode(x) | dnode(x) | inode(F)
> +			 * inode(x) | CP | inode(x) | dnode(F)
> +			 * -> inode(x) | CP | inode(x) | dnode(F) | inode(F)
> +			 * CP | inode(x) | dnode(x)
> +			 * -> CP | inode(x) | dnode(x) | inode(DF)
> +			 * CP | inode(x) | dnode(F)
> +			 * -> CP | inode(x) | dnode(F) | inode(DF)
> +			 * CP | dnode(F) | inode(x)
> +			 * -> CP | dnode(F) | inode(x) | inode(DF)
> +			 */
> +			if ((is_checkpointed_node(sbi, inode->i_ino) &&
> +			     fsync_mark_done(sbi, inode->i_ino)) ||
> +			    (!is_checkpointed_node(sbi, inode->i_ino) &&
> +			     node_fsync_mark_done(sbi, inode->i_ino) &&
> +			     fsync_mark_done(sbi, inode->i_ino)))
> +				break;
>  			mark_inode_dirty_sync(inode);
>  			ret = f2fs_write_inode(inode, NULL);
>  			if (ret)
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -137,6 +137,20 @@ int is_checkpointed_node(struct f2fs_sb_
>  	return is_cp;
>  }
>  
> +bool node_fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
> +{
> +	struct f2fs_nm_info *nm_i = NM_I(sbi);
> +	struct nat_entry *e;
> +	bool node_fsync_done = false;
> +
> +	read_lock(&nm_i->nat_tree_lock);
> +	e = __lookup_nat_cache(nm_i, nid);
> +	if (e)
> +		node_fsync_done = e->node_fsync_done;
> +	read_unlock(&nm_i->nat_tree_lock);
> +	return node_fsync_done;
> +}
> +
>  bool fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
>  {
>  	struct f2fs_nm_info *nm_i = NM_I(sbi);
> @@ -246,7 +260,8 @@ retry:
>  	nat_set_blkaddr(e, new_blkaddr);
>  	__set_nat_cache_dirty(nm_i, e);
>  
> -	/* update fsync_mark if its inode nat entry is still alive */
> +	e->node_fsync_done = fsync_done;
> +	/* update fsync_done if its inode nat entry is still alive */
>  	e = __lookup_nat_cache(nm_i, ni->ino);
>  	if (e)
>  		e->fsync_done = fsync_done;
> --- a/fs/f2fs/node.h
> +++ b/fs/f2fs/node.h
> @@ -42,7 +42,9 @@ struct node_info {
>  struct nat_entry {
>  	struct list_head list;	/* for clean or dirty nat list */
>  	bool checkpointed;	/* whether it is checkpointed or not */
> -	bool fsync_done;	/* whether the latest node has fsync mark */
> +	bool node_fsync_done;	/* whether the node has fsync mark */
> +	bool fsync_done;	/* whether the latest node of the
> +				 * inode has fsync mark */
>  	struct node_info ni;	/* in-memory node information */
>  };
>  
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -190,6 +190,8 @@ static int find_fsync_dnodes(struct f2fs
>  			if (IS_ERR(entry->inode)) {
>  				err = PTR_ERR(entry->inode);
>  				kmem_cache_free(fsync_entry_slab, entry);
> +				if (err == -ENOENT)
> +					goto next;
>  				break;
>  			}
>  			list_add_tail(&entry->list, head);
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1220,6 +1220,7 @@ struct node_info;
>  
>  bool available_free_memory(struct f2fs_sb_info *, int);
>  int is_checkpointed_node(struct f2fs_sb_info *, nid_t);
> +bool node_fsync_mark_done(struct f2fs_sb_info *, nid_t);
>  bool fsync_mark_done(struct f2fs_sb_info *, nid_t);
>  void fsync_mark_clear(struct f2fs_sb_info *, nid_t);
>  void get_node_info(struct f2fs_sb_info *, nid_t, struct node_info *);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ