linux-kernel - Re: [PATCH] btrfs: lower metadata writeback threshold on low dirty threshold

Message-ID: <20120503100249.GA18819@localhost>
Date:	Thu, 3 May 2012 18:02:49 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Jan Kara <jack@...e.cz>
Cc:	Chris Mason <chris.mason@...cle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jeff Moyer <jmoyer@...hat.com>, Jens Axboe <axboe@...nel.dk>,
	linux-fsdevel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
	Dave Chinner <david@...morbit.com>,
	Christoph Hellwig <hch@...radead.org>,
	Shaohua Li <shli@...ionio.com>
Subject: Re: [PATCH] btrfs: lower metadata writeback threshold on low dirty
 threshold

On Thu, May 03, 2012 at 11:25:28AM +0200, Jan Kara wrote:
> On Thu 03-05-12 11:43:11, Wu Fengguang wrote:
> > This helps write performance when setting the dirty threshold to tiny numbers.
> > 
> >      3.4.0-rc2         3.4.0-rc2-btrfs4+
> >   ------------  ------------------------
> >          96.92        -0.4%        96.54  bay/thresh=1000M/btrfs-100dd-1-3.4.0-rc2
> >          98.47        +0.0%        98.50  bay/thresh=1000M/btrfs-10dd-1-3.4.0-rc2
> >          99.38        -0.3%        99.06  bay/thresh=1000M/btrfs-1dd-1-3.4.0-rc2
> >          98.04        -0.0%        98.02  bay/thresh=100M/btrfs-100dd-1-3.4.0-rc2
> >          98.68        +0.3%        98.98  bay/thresh=100M/btrfs-10dd-1-3.4.0-rc2
> >          99.34        -0.0%        99.31  bay/thresh=100M/btrfs-1dd-1-3.4.0-rc2
> >   ==>    88.98        +9.6%        97.53  bay/thresh=10M/btrfs-10dd-1-3.4.0-rc2
> >   ==>    86.99       +13.1%        98.39  bay/thresh=10M/btrfs-1dd-1-3.4.0-rc2
> >   ==>     2.75     +2442.4%        69.88  bay/thresh=1M/btrfs-10dd-1-3.4.0-rc2
> >   ==>     3.31     +2634.1%        90.54  bay/thresh=1M/btrfs-1dd-1-3.4.0-rc2
> > 
> > Signed-off-by: Fengguang Wu <fengguang.wu@...el.com>
> > ---
> >  fs/btrfs/disk-io.c |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > --- linux-next.orig/fs/btrfs/disk-io.c	2012-05-02 14:04:00.989262395 +0800
> > +++ linux-next/fs/btrfs/disk-io.c	2012-05-02 14:04:01.773262414 +0800
> > @@ -930,7 +930,8 @@ static int btree_writepages(struct addre
> >  
> >  		/* this is a bit racy, but that's ok */
> >  		num_dirty = root->fs_info->dirty_metadata_bytes;
> > -		if (num_dirty < thresh)
> > +		if (num_dirty < min(thresh,
> > +				    global_dirty_limit << (PAGE_CACHE_SHIFT-2)))
> >  			return 0;
> >  	}
> >  	return btree_write_cache_pages(mapping, wbc);
>   Frankly, that whole condition on WB_SYNC_NONE in btree_writepages() looks
> like a hack. I think we also had problems with this condition when we tried
> to change b_more_io list handling. I found rather terse commit message
> explaining the code:
> Btrfs: Limit btree writeback to prevent seeks
> 
>   Which I kind of understand but is it that bad? Also I think last time we
> stumbled over this code we were discussing that these dirty metadata would
> be simply hidden from mm which would solve the problem of flusher thread
> trying to outsmart the filesystem... But I guess noone had time to
> implement this for btrfs.

Yeah, I have the same uneasy feeling. Actually, my first attempt was to
remove the heuristics in btree_writepages() altogether. The result is
performance degradation, from slight to about 24%, in the normal cases:

wfg@bee /export/writeback% ./compare bay/*/*-{3.4.0-rc2,3.4.0-rc2-btrfs+} 
               3.4.0-rc2          3.4.0-rc2-btrfs+  
------------------------  ------------------------  
                  190.81        -6.8%       177.82  bay/JBOD-2HDD-thresh=1000M/btrfs-100dd-1-3.4.0-rc2
                  195.86        -3.3%       189.31  bay/JBOD-2HDD-thresh=1000M/btrfs-10dd-1-3.4.0-rc2
                  196.68        -1.7%       193.30  bay/JBOD-2HDD-thresh=1000M/btrfs-1dd-1-3.4.0-rc2
                  194.83       -24.4%       147.27  bay/JBOD-2HDD-thresh=100M/btrfs-100dd-1-3.4.0-rc2
                  196.60        -2.5%       191.61  bay/JBOD-2HDD-thresh=100M/btrfs-10dd-1-3.4.0-rc2
                  197.09        -0.7%       195.69  bay/JBOD-2HDD-thresh=100M/btrfs-1dd-1-3.4.0-rc2
                  181.64        -8.7%       165.80  bay/RAID0-2HDD-thresh=1000M/btrfs-100dd-1-3.4.0-rc2
                  186.14        -2.8%       180.85  bay/RAID0-2HDD-thresh=1000M/btrfs-10dd-1-3.4.0-rc2
                  191.10        -1.5%       188.23  bay/RAID0-2HDD-thresh=1000M/btrfs-1dd-1-3.4.0-rc2
                  191.30       -20.7%       151.63  bay/RAID0-2HDD-thresh=100M/btrfs-100dd-1-3.4.0-rc2
                  186.03        -2.4%       181.54  bay/RAID0-2HDD-thresh=100M/btrfs-10dd-1-3.4.0-rc2
                  170.18        -2.5%       165.97  bay/RAID0-2HDD-thresh=100M/btrfs-1dd-1-3.4.0-rc2
                   96.18        -1.9%        94.32  bay/RAID1-2HDD-thresh=1000M/btrfs-100dd-1-3.4.0-rc2
                   97.71        -1.4%        96.36  bay/RAID1-2HDD-thresh=1000M/btrfs-10dd-1-3.4.0-rc2
                   97.57        -0.4%        97.23  bay/RAID1-2HDD-thresh=1000M/btrfs-1dd-1-3.4.0-rc2
                   97.68        -6.0%        91.79  bay/RAID1-2HDD-thresh=100M/btrfs-100dd-1-3.4.0-rc2
                   97.76        -0.7%        97.07  bay/RAID1-2HDD-thresh=100M/btrfs-10dd-1-3.4.0-rc2
                   97.53        -0.3%        97.19  bay/RAID1-2HDD-thresh=100M/btrfs-1dd-1-3.4.0-rc2
                   96.92        -3.0%        94.03  bay/thresh=1000M/btrfs-100dd-1-3.4.0-rc2
                   98.47        -1.4%        97.08  bay/thresh=1000M/btrfs-10dd-1-3.4.0-rc2
                   99.38        -0.7%        98.66  bay/thresh=1000M/btrfs-1dd-1-3.4.0-rc2
                   98.04        -8.2%        89.99  bay/thresh=100M/btrfs-100dd-1-3.4.0-rc2
                   98.68        -0.6%        98.09  bay/thresh=100M/btrfs-10dd-1-3.4.0-rc2
                   99.34        -0.7%        98.62  bay/thresh=100M/btrfs-1dd-1-3.4.0-rc2
                   88.98        -0.5%        88.51  bay/thresh=10M/btrfs-10dd-1-3.4.0-rc2
                   86.99       +14.5%        99.60  bay/thresh=10M/btrfs-1dd-1-3.4.0-rc2
                    2.75     +1871.2%        54.18  bay/thresh=1M/btrfs-10dd-1-3.4.0-rc2
                    3.31     +2035.0%        70.70  bay/thresh=1M/btrfs-1dd-1-3.4.0-rc2
                 3635.55        -1.2%      3592.46  TOTAL write_bw

So I ended up with the conservative fix in this patch.

FYI, I also experimented with "global_dirty_limit << PAGE_CACHE_SHIFT",
i.e. without the further "/4" (the "-2" in the shift) used in this
patch. However, the results are not good:

               3.4.0-rc2         3.4.0-rc2-btrfs3+
------------------------  ------------------------
                   96.92        -0.3%        96.62  bay/thresh=1000M/btrfs-100dd-1-3.4.0-rc2
                   98.47        +0.1%        98.56  bay/thresh=1000M/btrfs-10dd-1-3.4.0-rc2
                   99.38        -0.2%        99.23  bay/thresh=1000M/btrfs-1dd-1-3.4.0-rc2
                   98.04        +0.1%        98.15  bay/thresh=100M/btrfs-100dd-1-3.4.0-rc2
                   98.68        +0.3%        98.96  bay/thresh=100M/btrfs-10dd-1-3.4.0-rc2
                   99.34        -0.1%        99.20  bay/thresh=100M/btrfs-1dd-1-3.4.0-rc2
                   88.98        -0.3%        88.73  bay/thresh=10M/btrfs-10dd-1-3.4.0-rc2
                   86.99        +1.4%        88.23  bay/thresh=10M/btrfs-1dd-1-3.4.0-rc2
                    2.75      +232.0%         9.13  bay/thresh=1M/btrfs-10dd-1-3.4.0-rc2
                    3.31        +1.5%         3.36  bay/thresh=1M/btrfs-1dd-1-3.4.0-rc2

So this patch is based on "experiment" rather than "reasoning".
I also took the easy way of using the global dirty threshold. Ideally
it should be based on the per-bdi dirty threshold, but anyway...
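
To make the arithmetic of the new condition concrete, here is a minimal
userspace sketch. It is not the kernel code. It assumes that
global_dirty_limit is counted in pages, that PAGE_CACHE_SHIFT is 12
(4 KiB pages), and that the fixed btrfs "thresh" is 32 MiB; the helper
names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the fixed btrfs threshold ("thresh" in the patch), in bytes. */
#define BTREE_THRESH_BYTES (32ULL * 1024 * 1024)

/* Assumption: PAGE_CACHE_SHIFT for 4 KiB pages. */
#define ASSUMED_PAGE_SHIFT 12

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper mirroring the patched expression:
 *   min(thresh, global_dirty_limit << (PAGE_CACHE_SHIFT - 2))
 * dirty_limit_pages is the global dirty limit in pages; the shift by
 * (page_shift - 2) converts it to bytes and divides by 4.
 */
static uint64_t metadata_writeback_threshold(uint64_t dirty_limit_pages,
					     unsigned int page_shift,
					     uint64_t thresh)
{
	assert(page_shift >= 2);
	return min_u64(thresh, dirty_limit_pages << (page_shift - 2));
}

/*
 * Returns 1 when background (WB_SYNC_NONE) metadata writeback would be
 * skipped, i.e. when dirty metadata is still below the threshold.
 */
static int skip_background_writeback(uint64_t dirty_metadata_bytes,
				     uint64_t dirty_limit_pages)
{
	return dirty_metadata_bytes <
	       metadata_writeback_threshold(dirty_limit_pages,
					    ASSUMED_PAGE_SHIFT,
					    BTREE_THRESH_BYTES);
}
```

Under these assumptions, a 1M dirty limit (256 pages) lowers the
threshold from 32 MiB to 256 KiB, while a 1000M dirty limit (256000
pages) leaves it at 32 MiB, so only the small-threshold cases change
behavior.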

Thanks,
Fengguang