Message-Id: <1166107946.5767.53.camel@lade.trondhjem.org>
Date: Thu, 14 Dec 2006 09:52:26 -0500
From: Trond Myklebust <trond.myklebust@....uio.no>
To: Andrew Morton <akpm@...l.org>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
linux-kernel <linux-kernel@...r.kernel.org>,
Nick Piggin <nickpiggin@...oo.com.au>
Subject: Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow

On Wed, 2006-12-13 at 17:41 -0800, Andrew Morton wrote:
> > Trond, please define precisely and completely and without reference to
> > the existing implementation: what behaviour does NFS want?

Part of the behaviour is dictated by our needs for
invalidate_inode_pages2(); the rest is dictated by the 2-stage NFS
writeout mechanism.

Starting with invalidate_inode_pages2(). The goal here is to have a
function that is guaranteed to force all currently cached pages to be
read in from the server afresh. In other words, we'd like to throw out
the old mapping of the file, and start with a new one. However, we also
need to guarantee that any modifications that a user may have made to
the file are preserved. Throwing out a dirty page is forbidden unless
the data has been written to disk first.
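
To make the ordering concrete, here is a rough sketch of the calling
pattern. This is not the actual fs/nfs code; it just illustrates
"flush first, then invalidate", with error handling elided:

#include <linux/fs.h>

/* Sketch: preserve user modifications, then force a re-read. */
static int revalidate_mapping_sketch(struct inode *inode)
{
	struct address_space *mapping = inode->i_mapping;
	int ret;

	/* Write back any dirty pages first, so that user
	 * modifications are not thrown away... */
	ret = filemap_write_and_wait(mapping);
	if (ret)
		return ret;

	/* ...then drop the cached pages. Subsequent reads must go
	 * back to the server for fresh data. */
	return invalidate_inode_pages2(mapping);
}
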
What we're doing in try_to_release_page() in the current implementation
of invalidate_inode_pages2() is therefore to fix races: the goal is to
ensure that we write back any dirty data before the page is removed.
This is made necessary by the fact that the VM may at any time override
our call to unmap_mapping_range() and allow the page to be re-dirtied by
the user.
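
The ->releasepage() address_space operation is where that last check
happens. A minimal sketch of the idea (an invented function, not the
real nfs_release_page(), which also has to deal with the COMMIT
mechanism described below):

#include <linux/mm.h>

/*
 * Sketch: refuse to let the VM free a page whose data is not yet
 * safe. The real code can also start writeback here; the invariant
 * is that no dirty page is released before its data is written out.
 */
static int releasepage_sketch(struct page *page, gfp_t gfp)
{
	/* The VM may have re-dirtied the page behind our back after
	 * our unmap_mapping_range() call... */
	if (PageDirty(page))
		return 0;	/* refuse: the page is not released */
	return 1;		/* clean: safe to free */
}
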
Next: about the 2-stage NFS writeout mechanism. A client may want to
allow the server to cache writeback data for a while, so that it can
flush several WRITE RPC calls to disk at the same time, maximising I/O
efficiency in terms of elevator algorithms etc.

It signals this to the server by means of the 'unstable' flag on the
WRITE RPC call, which allows the server to just write the data into its
pagecache. Before the client can actually free up the page, it still
needs to know that the data it wrote to the server has been flushed from
the pagecache onto disk, and this is done by sending the COMMIT RPC
request. The latter acts rather like fdatasync() does on local data: it
guarantees that all file data within a specified range has been flushed
to disk. Failure of the COMMIT request signals to the client that the
server may have rebooted, and so the page needs to be written out again.
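
Schematically, from the client's point of view (the helpers, the
verifier type and struct srv below are all invented for this sketch;
they are not the kernel's RPC routines):

#include <linux/types.h>

struct srv;	/* stand-in for the client's handle on the server */
extern u64 send_write(struct srv *, const void *, loff_t, size_t);
extern u64 send_commit(struct srv *, loff_t, size_t);
extern int resend_writes(struct srv *, const void *, loff_t, size_t);

/*
 * Stage 1: an UNSTABLE write, which the server may answer as soon as
 * the data sits in its pagecache. Stage 2: a COMMIT, which acts like
 * fdatasync() on the range. Both return a verifier that changes when
 * the server reboots; a mismatch means the data must be resent.
 */
static int write_and_commit_sketch(struct srv *srv, const void *buf,
				   loff_t offset, size_t count)
{
	u64 wverf, cverf;

	wverf = send_write(srv, buf, offset, count);	/* unstable */
	cverf = send_commit(srv, offset, count);

	if (cverf != wverf) {
		/* The server rebooted and may have lost the cached
		 * data: every unstable WRITE must be sent again. */
		return resend_writes(srv, buf, offset, count);
	}
	return 0;
}
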
Our second use of try_to_release_page() is therefore to ensure that the
server has indeed flushed that dirty data from its pagecache to its disk
before we allow the VM to release the page on the client. We don't send
a COMMIT request for each and every page that is to be released, but we
do ensure that at least one COMMIT has been sent that covers the data
range represented by the page to be freed.
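
Put together, the release path looks roughly like this (again with
invented names; the real nfs_release_page() and the commit lists are
more involved):

#include <linux/mm.h>

extern int page_needs_commit(struct page *);	 /* invented */
extern int commit_covering_range(struct page *); /* invented, 0 on success */

/*
 * Sketch: only let the VM free the page once a successful COMMIT
 * covers its byte range.
 */
static int nfs_releasepage_sketch(struct page *page, gfp_t gfp)
{
	if (page_needs_commit(page)) {
		/* Sending a COMMIT means blocking on an RPC call,
		 * which is only permissible if the allocator allows
		 * __GFP_FS recursion. */
		if (!(gfp & __GFP_FS))
			return 0;
		/* One COMMIT may cover many pages; we only require
		 * that some successful COMMIT covers this one. */
		if (commit_covering_range(page))
			return 0;
	}
	return 1;
}
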
Cheers
Trond