Date:	Wed, 7 Oct 2015 20:31:54 +0800
From:	Chao Yu <yuchaochina@...mail.com>
To:	"'Jaegeuk Kim'" <jaegeuk@...nel.org>
CC:	<linux-kernel@...r.kernel.org>, <linux-fsdevel@...r.kernel.org>,
	<linux-f2fs-devel@...ts.sourceforge.net>
Subject: RE: [f2fs-dev] [PATCH] f2fs: merge meta writes as many possible

Hi Jaegeuk,

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@...nel.org]
> Sent: Wednesday, October 07, 2015 7:22 AM
> To: Chao Yu
> Cc: linux-kernel@...r.kernel.org; linux-fsdevel@...r.kernel.org;
> linux-f2fs-devel@...ts.sourceforge.net
> Subject: Re: [f2fs-dev] [PATCH] f2fs: merge meta writes as many possible
> 
> Hi Chao,
> 
> On Tue, Oct 06, 2015 at 10:54:12PM +0800, Chao Yu wrote:
> > Hi Jaegeuk,
> >
> > > -----Original Message-----
> > > From: Jaegeuk Kim [mailto:jaegeuk@...nel.org]
> > > Sent: Saturday, October 03, 2015 12:48 AM
> > > To: linux-kernel@...r.kernel.org; linux-fsdevel@...r.kernel.org;
> > > linux-f2fs-devel@...ts.sourceforge.net; linux-kernel@...r.kernel.org;
> > > linux-fsdevel@...r.kernel.org; linux-f2fs-devel@...ts.sourceforge.net
> > > Cc: Jaegeuk Kim; Jaegeuk Kim
> > > Subject: [f2fs-dev] [PATCH] f2fs: merge meta writes as many possible
> > >
> > > This patch tries to merge IOs as much as possible when the background
> > > flusher writes out the dirty meta pages.
> > >
> > > [Before]
> > >
> > > ...
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124320, size = 4096
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124560, size = 32768
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 95720, size = 987136
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123928, size = 4096
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123944, size = 8192
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123968, size = 45056
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124064, size = 4096
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 97648, size = 1007616
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123776, size = 8192
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123800, size = 32768
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124624, size = 4096
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 99616, size = 921600
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123608, size = 4096
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123624, size = 77824
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123792, size = 4096
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123864, size = 32768
> > > ...
> > >
> > > [After]
> > >
> > > ...
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 92168, size = 892928
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 93912, size = 753664
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 95384, size = 716800
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 96784, size = 712704
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 104160, size = 364544
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 104872, size = 356352
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 105568, size = 278528
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 106112, size = 319488
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 106736, size = 258048
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 107240, size = 270336
> > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 107768, size = 180224
> > > ...
> > >
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@...nel.org>
> > > ---
> > >  fs/f2fs/checkpoint.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > > index ff53405..da5ee7e 100644
> > > --- a/fs/f2fs/checkpoint.c
> > > +++ b/fs/f2fs/checkpoint.c
> > > @@ -257,7 +257,7 @@ long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
> > >  						long nr_to_write)
> > >  {
> > >  	struct address_space *mapping = META_MAPPING(sbi);
> > > -	pgoff_t index = 0, end = LONG_MAX;
> > > +	pgoff_t index = 0, end = LONG_MAX, prev = LONG_MAX;
> > >  	struct pagevec pvec;
> > >  	long nwritten = 0;
> > >  	struct writeback_control wbc = {
> > > @@ -277,6 +277,11 @@ long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
> > >  		for (i = 0; i < nr_pages; i++) {
> > >  			struct page *page = pvec.pages[i];
> > >
> > > +			if (prev == LONG_MAX)
> > > +				prev = page->index - 1;
> > > +			if (nr_to_write != LONG_MAX && page->index != prev + 1)

If we reach this condition, should we also break out of the outer 'while' loop?
Once a gap is hit, no pagevec found afterward can contain a page with index
'prev + 1'.
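
Something like the following is what I have in mind (just a rough sketch on top
of the hunk above; the 'stop' label at the end of sync_meta_pages() is assumed
for illustration, it is not part of the patch):

			if (nr_to_write != LONG_MAX && page->index != prev + 1) {
				/*
				 * Later pagevecs cannot contain index prev + 1
				 * either, so stop scanning entirely rather than
				 * only skipping the rest of this pagevec.
				 */
				pagevec_release(&pvec);
				goto stop;
			}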

> >
> > Does this mean we only write back consecutive meta pages of the SSA region? If these
> > meta pages are updated randomly (as in the collapse range or insert range case), we will
> > write back very few meta pages in one round of flushing. That may cause low performance,
> > since the FTL will force GC on our meta page update region if we repeatedly write back
> > meta pages within one segment.
> 
> I don't think this would degrade the performance very much. Rather, as the
> traces above show, I could see a more positive impact on IO patterns when I
> added some delay to flushing meta pages. (I got the traces while copying the
> kernel source tree.) Moreover, the amount of meta page writes is relatively
> lower than for the other data types.

Previously I only traced the collapse/insert case, so I was just worried about those
corner cases. Today I traced the normal case, and the trace info looks good to me! :)

> 
> > IMO, when the distribution of dirty SSA pages is random, we should at least write back
> > all dirty meta pages in the SSA region, aligned to our segment size, under a block plug.
> >
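(To make the above concrete, this is roughly what I mean; only a sketch, and
nr_ssa_dirty_aligned below is an illustrative name, not an existing variable:)

	struct blk_plug plug;

	/* nr_ssa_dirty_aligned: dirty SSA pages rounded up to the segment size */
	blk_start_plug(&plug);
	sync_meta_pages(sbi, META, nr_ssa_dirty_aligned);
	blk_finish_plug(&plug);
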
> > One more thing: I found that xfstest tests/generic/019 always causes an inconsistency
> > between the SSA info and the inode's meta info (i_blocks != total_blk_cnt). The reason
> > for the problem is that in ->falloc::collapse we update the SSA meta page and the node
> > page info together; however, the SSA meta page is unfortunately written back by kworker,
> > while the node page is never persisted, since fail_make_request caused f2fs to be
> > flagged with CP_ERROR_FLAG.
> >
> > So writing back SSA pages carries the potential risk of producing a consistency issue.
> > I think there are some ways to fix this:
> > 1) don't write back SSA pages in kworker; the side effect is that more memory is held
> > until a checkpoint is executed, though of course periodic checkpoints can mitigate the
> > memory cost.
> > 2) add some tag somewhere so that we can recover the SSA info from the dnode page, but
> > this is really a big change.
> >
> > What do you think? Any better ideas?
> 
> Good catch!
> I think we can allocate a new block address for this case like the below patch.
> 
> From 4979a1cbd7c718cff946cb421fbc0763a3dbd0df Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@...nel.org>
> Date: Tue, 6 Oct 2015 15:41:58 -0700
> Subject: [PATCH] f2fs: avoid SSA corruption when replacing block addresses
> 
> If block addresses are pointed to by the previous checkpoint, we should not
> update the SSA when replacing block addresses in f2fs_collapse_range.
> 
> This patch allocates a new block address if there is an existing block address.

This patch doesn't fix that issue, because the SSA block related to new_blkaddr
is the one that will be overwritten, not the one related to old_blkaddr.
So I can still reproduce the issue after applying this patch.

Thanks,

> 
> Signed-off-by: Jaegeuk Kim <jaegeuk@...nel.org>
> ---
>  fs/f2fs/segment.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 1d86a35..3c546eb 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -1391,8 +1391,20 @@ void f2fs_replace_block(struct f2fs_sb_info *sbi, struct dnode_of_data *dn,
>  {
>  	struct f2fs_summary sum;
> 
> +	/*
> +	 * we should not use the existing address to avoid SSA
> +	 * corruption.
> +	 */
>  	set_summary(&sum, dn->nid, dn->ofs_in_node, version);
> 
> +	if (recover_curseg && old_addr != NULL_ADDR && old_addr != NEW_ADDR) {
> +		allocate_data_block(sbi, NULL, dn->data_blkaddr,
> +				&dn->data_blkaddr, &sum, CURSEG_WARM_DATA);
> +		set_data_blkaddr(dn);
> +		f2fs_update_extent_cache(dn);
> +		old_addr = dn->data_blkaddr;
> +	}
> +
>  	__f2fs_replace_block(sbi, &sum, old_addr, new_addr, recover_curseg);
> 
>  	dn->data_blkaddr = new_addr;
> --
> 2.1.1
