Message-ID: <20100803150759.GA786@localhost>
Date: Tue, 3 Aug 2010 23:07:59 +0800
From: Wu Fengguang <fengguang.wu@...el.com>
To: Jan Kara <jack@...e.cz>
Cc: Trond Myklebust <Trond.Myklebust@...app.com>,
Christoph Hellwig <hch@...radead.org>,
Mel Gorman <mel@....ul.ie>,
Andrew Morton <akpm@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
Dave Chinner <david@...morbit.com>,
Chris Mason <chris.mason@...cle.com>,
Nick Piggin <npiggin@...e.de>, Rik van Riel <riel@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [PATCH 0/9] Reduce writeback from page reclaim context V5
Sorry, forgot the attachment :)
Thanks,
Fengguang
On Tue, Aug 03, 2010 at 11:04:46PM +0800, Wu Fengguang wrote:
> On Tue, Aug 03, 2010 at 08:52:49PM +0800, Jan Kara wrote:
> > On Tue 03-08-10 15:34:49, Wu Fengguang wrote:
> > > On Thu, Jul 29, 2010 at 04:45:23PM +0800, Christoph Hellwig wrote:
> > > > Btw, I'm very happy with all this writeback related progress we've made
> > > > for the 2.6.36 cycle. The only major thing that's really missing, and
> > > > which should help dramatically with the I/O patterns, is stopping direct
> > > > writeback from balance_dirty_pages(). I've seen patches from Wu and
> > > > Jan for this and lots of discussion. If we get either variant in,
> > > > this should be one of the best VM releases from the filesystem point of
> > > > view.
> > >
> > > Sorry for the delay. But I'm not feeling good about the current
> > > patches, both mine and Jan's.
> > >
> > > Accounting overhead and accuracy are the obvious problems. Neither patch
> > > performs well on large NUMA machines with fast storage, and previous
> > > discussions found them hard to improve.
> > Yes, my patch for balance_dirty_pages() has a problem with percpu counter
> > (im)precision and resorting to pure atomic type could result in bouncing
> > of the cache line among CPUs completing the IO (at least that is the reason
> > why all other BDI stats are per-cpu I believe).
> > We could solve the problem by doing the accounting on page IO submission
> > time (there using the atomic type should be fine as we mostly submit IO
> > from the flusher thread anyway). It's just that doing the accounting on
> > completion time has the nice property that we really hold the throttled
> > thread up to the moment when the VM can really reuse the pages.
>
> We could try this and check how it works with NFS. The attached patch
> will also be necessary for the test. It implements a writeback wait
> queue for NFS; without it, all dirty pages may be put under writeback.
>
> I suspect the resulting fluctuations will be the same, because
> balance_dirty_pages() will wait on some background writeback (as you
> proposed), which will block on the NFS writeback queue, which in turn
> waits for the completion of COMMIT RPCs (the current patches wait
> here directly). On the completion of one COMMIT, lots of pages may be
> freed in a burst, which makes the whole stack progress very bumpy.
>
> > > We might do dirty throttling based on throughput, ignoring the
> > > writeback completions totally. The basic idea is, for current process,
> > > we already have a per-bdi-and-task threshold B as the local throttle
> > Do we? The limit is currently just per-bdi, isn't it? Or do you mean
>
> bdi_dirty_limit() calls task_dirty_limit(), so it's also related to
> the current task. For convenience we called it per-bdi writeback :)
>
> > the ratelimiting - i.e. how often do we call balance_dirty_pages()?
> > That is per-cpu if I'm right.
> > > target. When dirty pages go beyond B*80% for example, we start
> > > throttling the task's writeback throughput. The closer to B, the
> > > lower the throughput. When it reaches B or the global threshold, we
> > > stop it completely. The hope is that the throughput will be sustained
> > > at some balance point. This will need careful calculation to be
> > > stable and robust.
> > But what do you exactly mean by throttling the task in your scenario?
> > What would it wait on?
>
> It will simply wait for e.g. 10ms for every N pages written. The
> closer to B, the smaller N will be.
>
> Thanks,
> Fengguang
>
> > > In this way, the throttle can be made very smooth. My old experiments
> > > show that with the current writeback completion based throttling the
> > > stall time fluctuates a lot. In particular it makes writeback bumpy for
> > > NFS, so that sometimes the network pipe is not active at all and
> > > performance is impacted noticeably.
> > >
> > > By the way, we'll harvest a writeback IO controller :)
> >
> > Honza
> > --
> > Jan Kara <jack@...e.cz>
> > SUSE Labs, CR
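
For illustration only, the throttling rule quoted above (no throttling
below roughly B*80%, then a fixed pause of about 10ms for every N pages
written, with N shrinking as the dirty count approaches B, and a full
stop at B or at the global threshold) could be sketched in plain
userspace C as follows. This is not code from either patch. The linear
ramp, the MAX_BATCH_PAGES value, and the names pages_per_pause() and
throttled_rate() are assumptions made up for the example; the thread
itself only says the calculation needs care.

```c
/*
 * Illustrative userspace sketch, NOT kernel code and NOT taken from the
 * patches discussed in this thread. The 10ms pause and the 80% starting
 * point are the examples given in the mail; everything else is assumed.
 */
#define PAUSE_MS            10UL    /* fixed sleep per batch (example from the mail) */
#define THROTTLE_START_PCT  80UL    /* throttling begins at B * 80% (example) */
#define MAX_BATCH_PAGES     1024UL  /* hypothetical N right at the 80% mark */
#define NO_THROTTLE         (-1L)   /* hypothetical marker: below the 80% mark */

/*
 * Return N, the number of pages a task may write before pausing PAUSE_MS.
 *   NO_THROTTLE : dirty count is below the starting point, no pause needed
 *   0           : task limit B or global limit reached, stop completely
 *   otherwise   : N falls linearly from MAX_BATCH_PAGES toward 1 near B
 */
static long pages_per_pause(unsigned long dirty, unsigned long limit,
                            unsigned long global_dirty,
                            unsigned long global_limit)
{
    unsigned long start = limit * THROTTLE_START_PCT / 100UL;
    unsigned long n;

    if (global_dirty >= global_limit || dirty >= limit)
        return 0;
    if (dirty < start)
        return NO_THROTTLE;

    /* Here start <= dirty < limit, so (limit - start) is non-zero. */
    n = MAX_BATCH_PAGES * (limit - dirty) / (limit - start);
    return n ? (long)n : 1;     /* keep some forward progress below B */
}

/* Throughput implied by N, in pages per second; call only with n > 0. */
static unsigned long throttled_rate(long n)
{
    return (unsigned long)n * 1000UL / PAUSE_MS;
}
```

With B = 1000 pages, a task at 900 dirty pages would get N = 512 under
this ramp, i.e. about 51200 pages/s, and N drops to single digits just
below B. A linear ramp is only the simplest choice; whether it settles
at a stable balance point is exactly the open question raised above.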
View attachment "writeback-nfs-request-queue.patch" of type "text/x-diff" (10896 bytes)