linux-kernel - Re: [PATCH] btrfs: lower metadata writeback threshold on low dirty threshold


Date:	Thu, 3 May 2012 09:30:11 -0400
From:	Josef Bacik <josef@...hat.com>
To:	Jan Kara <jack@...e.cz>
Cc:	Fengguang Wu <fengguang.wu@...el.com>,
	Chris Mason <chris.mason@...cle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jeff Moyer <jmoyer@...hat.com>, Jens Axboe <axboe@...nel.dk>,
	linux-fsdevel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
	Dave Chinner <david@...morbit.com>,
	Christoph Hellwig <hch@...radead.org>,
	Shaohua Li <shli@...ionio.com>
Subject: Re: [PATCH] btrfs: lower metadata writeback threshold on low dirty
 threshold

On Thu, May 03, 2012 at 11:25:28AM +0200, Jan Kara wrote:
> On Thu 03-05-12 11:43:11, Wu Fengguang wrote:
> > This helps write performance when setting the dirty threshold to tiny numbers.
> > 
> >      3.4.0-rc2         3.4.0-rc2-btrfs4+
> >   ------------  ------------------------
> >          96.92        -0.4%        96.54  bay/thresh=1000M/btrfs-100dd-1-3.4.0-rc2
> >          98.47        +0.0%        98.50  bay/thresh=1000M/btrfs-10dd-1-3.4.0-rc2
> >          99.38        -0.3%        99.06  bay/thresh=1000M/btrfs-1dd-1-3.4.0-rc2
> >          98.04        -0.0%        98.02  bay/thresh=100M/btrfs-100dd-1-3.4.0-rc2
> >          98.68        +0.3%        98.98  bay/thresh=100M/btrfs-10dd-1-3.4.0-rc2
> >          99.34        -0.0%        99.31  bay/thresh=100M/btrfs-1dd-1-3.4.0-rc2
> >   ==>    88.98        +9.6%        97.53  bay/thresh=10M/btrfs-10dd-1-3.4.0-rc2
> >   ==>    86.99       +13.1%        98.39  bay/thresh=10M/btrfs-1dd-1-3.4.0-rc2
> >   ==>     2.75     +2442.4%        69.88  bay/thresh=1M/btrfs-10dd-1-3.4.0-rc2
> >   ==>     3.31     +2634.1%        90.54  bay/thresh=1M/btrfs-1dd-1-3.4.0-rc2
> > 
> > Signed-off-by: Fengguang Wu <fengguang.wu@...el.com>
> > ---
> >  fs/btrfs/disk-io.c |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > --- linux-next.orig/fs/btrfs/disk-io.c	2012-05-02 14:04:00.989262395 +0800
> > +++ linux-next/fs/btrfs/disk-io.c	2012-05-02 14:04:01.773262414 +0800
> > @@ -930,7 +930,8 @@ static int btree_writepages(struct addre
> >  
> >  		/* this is a bit racy, but that's ok */
> >  		num_dirty = root->fs_info->dirty_metadata_bytes;
> > -		if (num_dirty < thresh)
> > +		if (num_dirty < min(thresh,
> > +				    global_dirty_limit << (PAGE_CACHE_SHIFT-2)))
> >  			return 0;
> >  	}
> >  	return btree_write_cache_pages(mapping, wbc);
>   Frankly, that whole condition on WB_SYNC_NONE in btree_writepages() looks
> like a hack. I think we also had problems with this condition when we tried
> to change b_more_io list handling. I found a rather terse commit message
> explaining the code:
>   Btrfs: Limit btree writeback to prevent seeks
>
>   Which I kind of understand, but is it that bad? Also I think last time we
> stumbled over this code we were discussing that these dirty metadata would
> be simply hidden from mm, which would solve the problem of the flusher thread
> trying to outsmart the filesystem... But I guess no one had time to
> implement this for btrfs.
> 

Actually I did, but I ran into an OOM problem.  See, we can have as much dirty
metadata as we have RAM, and having no insight into what the global dirty and
writeback limits are for the system meant btrfs was using way more memory
for its dirty and writeback metadata pages than would normally have been
allowed.  In order to avoid OOM I had to re-implement a sort of
balance_dirty_pages for btrfs, and again, having no access to the global dirty
limits and such at the time (AFAIK, I could just be an idiot), it was very
hacky and prone to breaking.  The shrinker doesn't get called often enough to
handle this sort of thing.  Dave mentioned at LSF that XFS will actually do the
synchronous writeout from the shrinker, which will auto-throttle everything, so
I was going to try that but I haven't gotten around to it.  Thanks,

Josef
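
For reference, the threshold arithmetic in the quoted patch can be sketched in
user-space C.  This is only an illustration, not the kernel code: the function
names are hypothetical, the page shift of 12 (4 KiB pages) is an assumption,
and the dirty limit is assumed to be expressed in pages, as the shift in the
patch suggests.  Shifting a page count left by (page shift - 2) gives one
quarter of the limit in bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, standing in for PAGE_CACHE_SHIFT in the patch. */
#define ASSUMED_PAGE_SHIFT 12

/*
 * Hypothetical helper: the smaller of the fixed byte threshold and one
 * quarter of the global dirty limit.  The limit is given in pages, so
 * shifting by (page shift - 2) yields limit_bytes / 4.
 */
static uint64_t metadata_writeback_threshold(uint64_t thresh_bytes,
					     uint64_t dirty_limit_pages)
{
	uint64_t quarter_limit_bytes;

	/* Guard against overflow in the shift below. */
	assert(dirty_limit_pages <= (UINT64_MAX >> (ASSUMED_PAGE_SHIFT - 2)));

	quarter_limit_bytes = dirty_limit_pages << (ASSUMED_PAGE_SHIFT - 2);
	return thresh_bytes < quarter_limit_bytes ? thresh_bytes
						  : quarter_limit_bytes;
}

/*
 * Hypothetical helper mirroring the patched condition: returns 1 when
 * background writeback would return early because too little metadata
 * is dirty, 0 when writeback would proceed.
 */
static int skip_background_writeback(uint64_t num_dirty_bytes,
				     uint64_t thresh_bytes,
				     uint64_t dirty_limit_pages)
{
	return num_dirty_bytes <
	       metadata_writeback_threshold(thresh_bytes, dirty_limit_pages);
}
```

Taking a fixed threshold of 32 MiB purely as an example value, a dirty limit of
256 pages (1 MiB) would cap the effective threshold at 256 KiB, while a limit
of 256000 pages (1000 MiB) would leave the 32 MiB value in force.  That is
consistent with the table above, where only the small-threshold cases change.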