[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1251306319.2290.6.camel@castor>
Date: Wed, 26 Aug 2009 18:05:19 +0100
From: Richard Kennedy <richard@....demon.co.uk>
To: Miklos Szeredi <miklos@...redi.hu>
Cc: a.p.zijlstra@...llo.nl, akpm@...ux-foundation.org,
chris.mason@...cle.com, linux-kernel@...r.kernel.org,
jens.axboe@...cle.com
Subject: Re: [RFC PATCH] mm: balance_dirty_pages. reduce calls to
global_page_state to reduce cache references
On Tue, 2009-08-25 at 13:46 +0200, Miklos Szeredi wrote:
> On Fri, 21 Aug 2009, Peter Zijlstra wrote:
> > > I have tried to retain the existing behavior as much as possible, but
> > > have added NR_WRITEBACK_TEMP to nr_writeback. This counter was used in
> > > clip_bdi_dirty_limit but not in balance_dirty_pages, grep suggests this
> > > is only used by FUSE but I haven't done any testing on that. It does
> > > seem logical to count all the WRITEBACK pages when making the throttling
> > > decisions so this change should be more correct ;)
> >
> > Right, the NR_WRITEBACK_TEMP thing is a FUSE feature, its used in
> > writable mmap() support for FUSE things.
> >
> > I must admit to forgetting the exact semantics of the things, maybe
> > Miklos can remind us.
>
> I'll try: fuse is special because it needs writeback to be "very
> asynchronous". What I mean by this is that writing a dirty page may
> block indefinitely and that shouldn't hold up unrelated filesystem or
> memory operations.
>
> To satisfy this, fuse copies contents of dirty pages over to
> "temporary pages" and queues the write request with this temporary
> page, not the original page-cache page.
>
> This has two effects:
>
> - the page-cache page does not remain in "writeback" state but is
> cleaned immediately
>
> - the NR_WRITEBACK counter is not incremented for the duration of the
> writeback
>
> The first one is important because vmscan and page migration do
> wait_on_page_writeback() in some circumstances, which would block on
> fuse writebacks.
>
> The second one is important because vmscan will throttle writeout if
> the NR_WRITEBACK counter goes over the dirty threshold
> (throttle_vm_writeout). There were long discussions about this one,
> but in the end no one could surely tell how this works and why it is
> important. But NR_WRITEBACK_TEMP must not be counted there, otherwise
> the page scanning can deadlock with fuse filesystems.
>
> About balance_dirty_pages() I'm not quite sure. By a similar logic we
> don't want NR_WRITEBACK_TEMP pages to contribute to throttling
> unrelated filesytem writebacks. That might (through recursion via a
> userspace filesystem) lead to a deadlock.
>
> So my recommendation is that we should retain the old behavior.
>
> Thanks,
> Miklos
Thanks for the explanation.
The old behavior was to use NR_WRITEBACK_TEMP to clip the bdi
thresholds.
Should we continue to do this or just ignore NR_WRIEBACK_TEMP completely
in balance_dirty_pages ?
regards
Richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists