lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <20140914074834.GB51401@jaegeuk-mac02.hsd1.ca.comcast.net> Date: Sun, 14 Sep 2014 00:48:34 -0700 From: Jaegeuk Kim <jaegeuk@...nel.org> To: Huang Ying <ying.huang@...el.com> Cc: huang ying <huang.ying.caritas@...il.com>, Changman Lee <cm224.lee@...sung.com>, linux-f2fs-devel@...ts.sourceforge.net, LKML <linux-kernel@...r.kernel.org> Subject: Re: [PATCH] f2fs: Fix recover when nid of non-inode dnode < nid of inode On Sat, Sep 13, 2014 at 10:23:18PM +0800, Huang Ying wrote: > On Fri, 2014-09-12 at 15:34 +0800, Huang Ying wrote: > > On Thu, 2014-09-11 at 22:13 -0700, Jaegeuk Kim wrote: > > > On Thu, Sep 11, 2014 at 08:25:17PM +0800, Huang Ying wrote: > > > > > > > > On Wed, 2014-09-10 at 22:37 -0700, Jaegeuk Kim wrote: > > > > > On Wed, Sep 10, 2014 at 07:08:32PM +0800, huang ying wrote: > > > > > > On Wed, Sep 10, 2014 at 3:21 PM, Jaegeuk Kim <jaegeuk@...nel.org> wrote: > > > > > > > > > > > > > On Tue, Sep 09, 2014 at 07:31:49PM +0800, huang ying wrote: > > > > > > > > On Tue, Sep 9, 2014 at 3:09 PM, Jaegeuk Kim <jaegeuk@...nel.org> wrote: > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > On Tue, Sep 09, 2014 at 01:39:30PM +0800, Huang Ying wrote: > > > > > > > > > > On Mon, 2014-09-08 at 22:23 -0700, Jaegeuk Kim wrote: > > > > > > > > > > > Hi Huang, > > > > > > > > > > > > > > > > > > > > > > On Mon, Sep 08, 2014 at 07:38:26PM +0800, Huang Ying wrote: > > > > > > > > > > > > For fsync, if the nid of a non-inode dnode < nid of inode and the > > > > > > > > > > > > inode is not checkpointed. The non-inode dnode may be written > > > > > > > before > > > > > > > > > > > > inode. So in find_fsync_dnodes, f2fs_iget will fail, cause the > > > > > > > > > > > > recovery fail. > > > > > > > > > > > > > > > > > > > > > > > > Usually, inode will be allocated before non-inode dnode, so the > > > > > > > nid > > > > > > > > > of > > > > > > > > > > > > inode < nid of non-inode dnode. But it is possible for the > > > > > > > reverse. > > > > > > > > > > > > For example, because of alloc_nid_failed. > > > > > > > > > > > > > > > > > > > > > > > > This is fixed via ignoring non-inode dnode before inode dnode in > > > > > > > > > > > > find_fsync_dnodes. > > > > > > > > > > > > > > > > > > > > > > > > The patch was tested via allocating nid reversely via a debugging > > > > > > > > > > > > patch, that is, from big number to small number. > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Huang, Ying <ying.huang@...el.com> > > > > > > > > > > > > --- > > > > > > > > > > > > fs/f2fs/recovery.c | 7 ++++--- > > > > > > > > > > > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > > > > > > > > > > > --- a/fs/f2fs/recovery.c > > > > > > > > > > > > +++ b/fs/f2fs/recovery.c > > > > > > > > > > > > @@ -172,8 +172,8 @@ static int find_fsync_dnodes(struct f2fs > > > > > > > > > > > > if (IS_INODE(page) && is_dent_dnode(page)) > > > > > > > > > > > > set_inode_flag(F2FS_I(entry->inode), > > > > > > > > > > > > FI_INC_LINK); > > > > > > > > > > > > - } else { > > > > > > > > > > > > - if (IS_INODE(page) && is_dent_dnode(page)) { > > > > > > > > > > > > > > > > > > > > > > If this is not inode block, we should add this inode to recover its > > > > > > > > > data blocks. > > > > > > > > > > > > > > > > > > > > Is it possible that there is only non-inode dnode but no inode when > > > > > > > > > > find_fsync_dnodes checking dnodes? Per my understanding, any > > > > > > > changes to > > > > > > > > > > file will cause inode page dirty (for example, mtime changed), so > > > > > > > that > > > > > > > > > > we will write inode block. Is it right? If so, the solution in this > > > > > > > > > > patch should work too. > > > > > > > > > > > > > > > > > > Your description says that f2fs_iget will fail, which causes the > > > > > > > recovery > > > > > > > > > fail. > > > > > > > > > So, I thought it would be better to handle the f2fs_iget failure > > > > > > > directly. > > > > > > > > > > > > > > > > > > > > > > > > > Yes. That is another way to fix the issue. > > > > > > > > > > > > > > > > > > > > > > > > > In addition, we cannot guarantee the write order of dnode and inode. > > > > > > > > > For exmaple, > > > > > > > > > 1. the inode is written by flusher or kswapd, then, > > > > > > > > > 2. f2fs_sync_file writes its dnode. > > > > > > > > > > > > > > > > > > In that case, we can get only non-inode dnode in the node chain, since > > > > > > > the > > > > > > > > > inode > > > > > > > > > has not fsync_mark. > > > > > > > > > > > > > > > > > > > > > > > > > I think your solution is better here, but does not fix all scenarios. If > > > > > > > > the inode is checkpointed, the file can be recovered, although the inode > > > > > > > > information may be not up to date. But if the inode is not checkpointed, > > > > > > > > f2fs_iget will fail too and recover will fail. > > > > > > > > > > > > > > Ok, let me consider your scenarios. > > > > > > > > > > > > > > Term: F: fsync_mark, D: dentry_mark > > > > > > > > > > > > > > 1. inode(x) | CP | inode(x) | dnode(F) > > > > > > > -> Lose the latest inode(x). Need to fix. > > > > > > > > > > > > > > 2. inode(x) | CP | dnode(F) | inode(x) > > > > > > > -> Impossible, but recover latest dnode(F) > > > > > > > > > > > > > > 3. CP | inode(x) | dnode(F) > > > > > > > -> Need to write inode(DF) in f2fs_sync_file. > > > > > > > > > > > > > > 4. CP | dnode(F) | inode(DF) > > > > > > > -> If f2fs_iget fails, then goto next. > > > > > > > > > > > > > > 5. CP | dnode(F) | inode(x) > > > > > > > -> If f2fs_iget fails, then goto next. But, this is an impossible > > > > > > > scenario. > > > > > > > Drop this dnode(F). > > > > > > > > > > > > > > Indeed, there were some missing scenarios. > > > > > > > > > > > > > > So, how about this patch? > > > > > > > > > > > > > > From 552dc68c5f07a335d7b55c197bab531efb135521 Mon Sep 17 00:00:00 2001 > > > > > > > From: Jaegeuk Kim <jaegeuk@...nel.org> > > > > > > > Date: Wed, 10 Sep 2014 00:16:34 -0700 > > > > > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios > > > > > > > > > > > > > > We can summarize the roll forward recovery scenarios as follows. > > > > > > > > > > > > > > [Term] F: fsync_mark, D: dentry_mark > > > > > > > > > > > > > > 1. inode(x) | CP | inode(x) | dnode(F) > > > > > > > -> Update the latest inode(x). > > > > > > > > > > > > > > 2. inode(x) | CP | inode(F) | dnode(F) > > > > > > > -> No problem. > > > > > > > > > > > > > > 3. inode(x) | CP | dnode(F) | inode(x) > > > > > > > -> Impossible, but recover latest dnode(F) > > > > > > > > > > > > > > > > > > > I think this is possible. If f2fs_sync_file runs concurrently with > > > > > > writeback. f2fs_sync_file written dnode(F), then writeback written inode(x). > > > > > > > > > > If the inode(x) was written, f2fs_sync_file will do write the inode again with > > > > > fsync_mark. So, dnode(F) | inode(x) | inode(F) should be shown. > > > > > > > > > > In f2fs_sync_file, > > > > > ... > > > > > while (!sync_node_pages(sbi, ino, &wbc)) { > > > > > if (fsync_mark_done(sbi, ino)) > > > > > goto out; > > > > > mark_inode_dirty_sync(inode); > > > > > ret = f2fs_write_inode(inode, NULL); > > > > > if (ret) > > > > > goto out; > > > > > } > > > > > ... > > > > > > > > Is the following situation possible? > > > > > > > > f2fs_sync_file <writeback> > > > > sync_node_pages f2fs_write_node_pages > > > > write dnode(F) sync_node_pages > > > > write inode(x) /* clear PAGECACHE_TAG_DIRTY */ > > > > > > > > > > > > That is, f2fs_sync_file run parallel with <writeback>. The > > > > sync_node_pages above will return 1, because dnode(F) is written. > > > > inode(x) is written by <writeback> path. And because > > > > PAGECACHE_TAG_DIRTY is cleared, it is possible that sync_node_pages > > > > called by f2fs_sync_file does not write inode(F). > > > > > > I think Chao's comment would work. > > > How about this patch? > > > > > > From 32fe5ff49d2c78d3be4cf3638cc64ae71cf44549 Mon Sep 17 00:00:00 2001 > > > From: Jaegeuk Kim <jaegeuk@...nel.org> > > > Date: Wed, 10 Sep 2014 00:16:34 -0700 > > > Subject: [PATCH] f2fs: fix roll-forward missing scenarios > > > > > > We can summarize the roll forward recovery scenarios as follows. > > > > > > [Term] F: fsync_mark, D: dentry_mark > > > > > > 1. inode(x) | CP | inode(x) | dnode(F) > > > -> Update the latest inode(x). > > > > > > 2. inode(x) | CP | inode(F) | dnode(F) > > > -> No problem. > > > > > > 3. inode(x) | CP | dnode(F) | inode(x) > > > -> Recover to the latest dnode(F), and drop the last inode(x) > > > > > > 4. inode(x) | CP | dnode(F) | inode(F) > > > -> No problem. > > > > > > 5. CP | inode(x) | dnode(F) > > > -> The inode(DF) was missing. Should drop this dnode(F). > > > > > > 6. CP | inode(DF) | dnode(F) > > > -> No problem. > > > > > > 7. CP | dnode(F) | inode(DF) > > > -> If f2fs_iget fails, then goto next to find inode(DF). > > > > > > 8. CP | dnode(F) | inode(x) > > > -> If f2fs_iget fails, then goto next to find inode(DF). > > > But it will fail due to no inode(DF). > > > > > > So, this patch adds some missing points such as #1, #5, #7, and #8. > > > > > > Signed-off-by: Jaegeuk Kim <jaegeuk@...nel.org> > > > --- > > > fs/f2fs/file.c | 20 ++++++++++++---- > > > fs/f2fs/node.c | 11 ++++++++- > > > fs/f2fs/recovery.c | 70 +++++++++++++++++++++++++++++++++++++++++++++--------- > > > 3 files changed, 85 insertions(+), 16 deletions(-) > > > > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > > > index e7681c3..70f5d4b 100644 > > > --- a/fs/f2fs/file.c > > > +++ b/fs/f2fs/file.c > > > @@ -206,15 +206,27 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) > > > up_write(&fi->i_sem); > > > } > > > } else { > > > - /* if there is no written node page, write its inode page */ > > > - while (!sync_node_pages(sbi, ino, &wbc)) { > > > - if (fsync_mark_done(sbi, ino)) > > > - goto out; > > > +sync_nodes: > > > + sync_node_pages(sbi, ino, &wbc); > > > + > > > + /* > > > + * inode(x) | CP | inode(x) | dnode(F) > > > + * -> ok > > > > Is it acceptable that we turn this to: > > > > inode(x) | CPU | inode (x) | dnode (F) | inode(F) > > > > > + * inode(x) | CP | dnode(F) | inode(x) > > > + * -> inode(x) | CP | dnode(F) | inode(x) | inode(F) > > > + * CP | inode(x) | dnode(F) > > > + * -> CP | inode(x) | dnode(F) | inode(DF) > > > + * CP | dnode(F) | inode(x) > > > + * -> CP | dnode(F) | inode(x) | inode(DF) > > > + */ > > > + if (!fsync_mark_done(sbi, ino)) { > > > mark_inode_dirty_sync(inode); > > > ret = f2fs_write_inode(inode, NULL); > > > if (ret) > > > goto out; > > > + goto sync_nodes; > > > } > > > + > > > ret = wait_on_node_pages_writeback(sbi, ino); > > > if (ret) > > > goto out; > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > > > index b32eb56..653aa71 100644 > > > --- a/fs/f2fs/node.c > > > +++ b/fs/f2fs/node.c > > > @@ -248,8 +248,17 @@ retry: > > > > > > /* update fsync_mark if its inode nat entry is still alive */ > > > e = __lookup_nat_cache(nm_i, ni->ino); > > > - if (e) > > > + if (e) { > > > + /* > > > + * CP | inode(x) | dnode(F) > > > + * -> CP | inode(x) | dnode(F) | inode(DF) > > > + */ > > > + if (!e->checkpointed && !e->fsync_done && > > > + ni->ino != ni->nid && fsync_done) > > > + goto skip; > > > e->fsync_done = fsync_done; > > > + } > > > +skip: > > > write_unlock(&nm_i->nat_tree_lock); > > > } > > > > I don't understand why we need so complex logic? Why not just let > > e->fsync_done reflect just latest is_fsync_dnode(page)? > > > > It appears that in f2fs_sync_file, what we need is just whether inode > > page has fsync mark or not. > > Oh, I see. The e->fsync_done indicates whether latest node of the inode > has fsync_mark. That can be used to deal with: > > CP | inode(DF) | dnode(x) > > But I still think it is possible to make e->fsync_done simple via > introduce another simple e->node_fsync_done to indicate whether the node > itself has fsync_mark. What do you think about the solution in the > below patch? Just build tested. > > IMHO, it is easier to be understand. And in effect, it is almost same, > because it is hard to distinguish between I think you are showing another complexity with the same functionality. As I mentioned before, this itself is a little bit corner case, so we need at least such kind of combined conditions. So, it doesn't make enough sense to add one more variable per each nat entry. Thanks, > > CP | inode(x) | dnode(F) > > and > > inode(x) | CP | inode(x) | dnode(F) > > in your solution too. > > Best Regards, > Huang, Ying > > -----------------------------------------------------> > --- > fs/f2fs/f2fs.h | 1 + > fs/f2fs/file.c | 26 ++++++++++++++++++++++---- > fs/f2fs/node.c | 17 ++++++++++++++++- > fs/f2fs/node.h | 4 +++- > fs/f2fs/recovery.c | 2 ++ > 5 files changed, 44 insertions(+), 6 deletions(-) > > --- a/fs/f2fs/file.c > +++ b/fs/f2fs/file.c > @@ -207,10 +207,28 @@ int f2fs_sync_file(struct file *file, lo > up_write(&fi->i_sem); > } > } else { > - /* if there is no written node page, write its inode page */ > - while (!sync_node_pages(sbi, inode->i_ino, &wbc)) { > - if (fsync_mark_done(sbi, inode->i_ino)) > - goto out; > + for (;;) { > + sync_node_pages(sbi, inode->i_ino, &wbc); > + /* > + * inode(x) | CP | dnode(F) > + * -> ok > + * inode(x) | CP | inode(x) | dnode(x) > + * -> inode(x) | CP | inode(x) | dnode(x) | inode(F) > + * inode(x) | CP | inode(x) | dnode(F) > + * -> inode(x) | CP | inode(x) | dnode(F) | inode(F) > + * CP | inode(x) | dnode(x) > + * -> CP | inode(x) | dnode(x) | inode(DF) > + * CP | inode(x) | dnode(F) > + * -> CP | inode(x) | dnode(F) | inode(DF) > + * CP | dnode(F) | inode(x) > + * -> CP | dnode(F) | inode(x) | inode(DF) > + */ > + if ((is_checkpointed_node(sbi, inode->i_ino) && > + fsync_mark_done(sbi, inode->i_ino)) || > + (!is_checkpointed_node(sbi, inode->i_ino) && > + node_fsync_mark_done(sbi, inode->i_ino) && > + fsync_mark_done(sbi, inode->i_ino))) > + break; > mark_inode_dirty_sync(inode); > ret = f2fs_write_inode(inode, NULL); > if (ret) > --- a/fs/f2fs/node.c > +++ b/fs/f2fs/node.c > @@ -137,6 +137,20 @@ int is_checkpointed_node(struct f2fs_sb_ > return is_cp; > } > > +bool node_fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid) > +{ > + struct f2fs_nm_info *nm_i = NM_I(sbi); > + struct nat_entry *e; > + bool node_fsync_done = false; > + > + read_lock(&nm_i->nat_tree_lock); > + e = __lookup_nat_cache(nm_i, nid); > + if (e) > + node_fsync_done = e->node_fsync_done; > + read_unlock(&nm_i->nat_tree_lock); > + return node_fsync_done; > +} > + > bool fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid) > { > struct f2fs_nm_info *nm_i = NM_I(sbi); > @@ -246,7 +260,8 @@ retry: > nat_set_blkaddr(e, new_blkaddr); > __set_nat_cache_dirty(nm_i, e); > > - /* update fsync_mark if its inode nat entry is still alive */ > + e->node_fsync_done = fsync_done; > + /* update fsync_done if its inode nat entry is still alive */ > e = __lookup_nat_cache(nm_i, ni->ino); > if (e) > e->fsync_done = fsync_done; > --- a/fs/f2fs/node.h > +++ b/fs/f2fs/node.h > @@ -42,7 +42,9 @@ struct node_info { > struct nat_entry { > struct list_head list; /* for clean or dirty nat list */ > bool checkpointed; /* whether it is checkpointed or not */ > - bool fsync_done; /* whether the latest node has fsync mark */ > + bool node_fsync_done; /* whether the node has fsync mark */ > + bool fsync_done; /* whether the latest node of the > + * inode has fsync mark */ > struct node_info ni; /* in-memory node information */ > }; > > --- a/fs/f2fs/recovery.c > +++ b/fs/f2fs/recovery.c > @@ -190,6 +190,8 @@ static int find_fsync_dnodes(struct f2fs > if (IS_ERR(entry->inode)) { > err = PTR_ERR(entry->inode); > kmem_cache_free(fsync_entry_slab, entry); > + if (err == -ENOENT) > + goto next; > break; > } > list_add_tail(&entry->list, head); > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -1220,6 +1220,7 @@ struct node_info; > > bool available_free_memory(struct f2fs_sb_info *, int); > int is_checkpointed_node(struct f2fs_sb_info *, nid_t); > +bool node_fsync_mark_done(struct f2fs_sb_info *, nid_t); > bool fsync_mark_done(struct f2fs_sb_info *, nid_t); > void fsync_mark_clear(struct f2fs_sb_info *, nid_t); > void get_node_info(struct f2fs_sb_info *, nid_t, struct node_info *); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists