Message-ID: <20090928133219.GA6405@think>
Date:	Mon, 28 Sep 2009 09:32:19 -0400
From:	Chris Mason <chris.mason@...cle.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	linux-kernel@...r.kernel.org, jens.axboe@...cle.com, jack@...e.cz
Subject: Re: [PATCH] bdi_sync_writeback should WB_SYNC_NONE first

On Sun, Sep 27, 2009 at 01:34:58AM -0700, Andrew Morton wrote:
> On Fri, 25 Sep 2009 10:10:14 -0400 Chris Mason <chris.mason@...cle.com> wrote:
> 
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 8e1e5e1..27f8e0e 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -225,7 +225,7 @@ static void bdi_sync_writeback(struct backing_dev_info *bdi,
> >  {
> >  	struct wb_writeback_args args = {
> >  		.sb		= sb,
> > -		.sync_mode	= WB_SYNC_ALL,
> > +		.sync_mode	= WB_SYNC_NONE,
> >  		.nr_pages	= LONG_MAX,
> >  		.range_cyclic	= 0,
> >  	};
> > @@ -236,6 +236,13 @@ static void bdi_sync_writeback(struct backing_dev_info *bdi,
> >  
> >  	bdi_queue_work(bdi, &work);
> >  	bdi_wait_on_work_clear(&work);
> > +
> > +	args.sync_mode = WB_SYNC_ALL;
> > +	args.nr_pages = LONG_MAX;
> > +
> > +	work.state = WS_USED | WS_ONSTACK;
> > +	bdi_queue_work(bdi, &work);
> > +	bdi_wait_on_work_clear(&work);
> >  }
> 
> Those LONG_MAX's are a worry.  What prevents a very long
> almost-livelock from occurring if userspace is concurrently dirtying
> pagecache at a high rate?
> 

In this case, we should be called from unmount.  But, Jens tells me my
patch isn't quite right because even without my patch, the WB_SYNC_ALL
run is queued onto the tail of the list after the WB_SYNC_NONE run.

So, I need to trace it a little better.  My initial theory was that the
nr_dirty number used by the first WB_SYNC_NONE run wasn't big enough.
Once btrfs writepage kicks in, it can make more dirty metadata pages to
close out the delalloc, and if those get written first, things could exit
before all the data pages are on disk.
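
The theory above can be illustrated with a toy model. This is a minimal
sketch, not kernel code: simulate_pass(), struct pass_result, and the
parameters are hypothetical names made up for illustration. It assumes
that a pass gets a fixed page budget up front, that writing one data page
dirties meta_per_data new metadata pages, and that pending metadata pages
are always written ahead of the remaining data pages.

```c
#include <assert.h>

/* Hypothetical result of one simulated writeback pass (not a kernel type). */
struct pass_result {
	long data_written;	/* data pages written during the pass */
	long meta_written;	/* metadata pages written during the pass */
	long data_left_dirty;	/* data pages still dirty when the pass exits */
};

/*
 * Toy model of a single budgeted pass.
 *
 * dirty_data:    data pages dirty when the pass starts
 * meta_per_data: metadata pages dirtied by writing one data page
 *                (a stand-in for closing out delalloc)
 * budget:        page budget fixed before the pass begins
 *
 * Assumption: newly dirtied metadata is written before remaining data,
 * so it consumes budget that was sized for the data pages alone.
 */
struct pass_result simulate_pass(long dirty_data, long meta_per_data,
				 long budget)
{
	struct pass_result r = { 0, 0, 0 };
	long pending_meta = 0;

	assert(dirty_data >= 0 && meta_per_data >= 0);

	while (budget > 0 && (dirty_data > 0 || pending_meta > 0)) {
		if (pending_meta > 0) {
			/* Metadata goes first and eats one unit of budget. */
			pending_meta--;
			r.meta_written++;
		} else {
			/* Writing a data page dirties more metadata. */
			dirty_data--;
			r.data_written++;
			pending_meta += meta_per_data;
		}
		budget--;
	}

	r.data_left_dirty = dirty_data;
	return r;
}
```

In this model, simulate_pass(100, 1, 100) exits with 50 of the 100 data
pages still dirty, because half the budget went to metadata that did not
exist when the budget was sized.  With a budget of 200 or more, nothing
is left dirty.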

It's a theory anyway; I'll dig in more.

-chris