[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1166577675.5768.19.camel@lade.trondhjem.org>
Date: Tue, 19 Dec 2006 20:21:15 -0500
From: Trond Myklebust <trond.myklebust@....uio.no>
To: Andrew Morton <akpm@...l.org>
Cc: Michal Sabala <lkml@...hbs.net>, linux-kernel@...r.kernel.org
Subject: Re: 2.6.18 mmap hangs unrelated apps
On Tue, 2006-12-19 at 19:17 -0500, Trond Myklebust wrote:
> Ack, I'll add one in. If PagePrivate() is set during the call to
> try_to_release_page(), then the page should never be freeable.
OK. This one actually compiles, and eliminates a few logic bugs. Note
that I renamed the callback to ->launder_page() for clarity (and for
histerical reasons).
Cheers
Trond
----------------------------------------------------------------
commit 85a5b844c56706a5e3f47cde8b82109d325ad609
Author: Trond Myklebust <Trond.Myklebust@...app.com>
Date: Tue Dec 19 20:18:55 2006 -0500
NFS: Fix race in nfs_release_page()
invalidate_inode_pages2() may find the dirty bit has been set on a page
owing to the fact that the page may still be mapped after it was locked.
Only after the call to unmap_mapping_range() are we sure that the page
can no longer be dirtied.
In order to fix this, NFS has hooked the releasepage() method and tries
to write the page out between the call to unmap_mapping_range() and the
call to remove_mapping(). This, however leads to deadlocks in the page
reclaim code, where the page may be locked without holding a reference
to the inode or dentry.
Fix is to add a new address_space_operation, launder_page(), which will
attempt to write out a dirty page without releasing the page lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@...app.com>
---
Documentation/filesystems/Locking | 8 ++++++++
fs/nfs/file.c | 16 ++++++++--------
include/linux/fs.h | 1 +
mm/truncate.c | 23 ++++++++++++++++++-----
4 files changed, 35 insertions(+), 13 deletions(-)
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 790ef6f..28bfea7 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -171,6 +171,7 @@ prototypes:
int (*releasepage) (struct page *, int);
int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
loff_t offset, unsigned long nr_segs);
+ int (*launder_page) (struct page *);
locking rules:
All except set_page_dirty may block
@@ -188,6 +189,7 @@ bmap: yes
invalidatepage: no yes
releasepage: no yes
direct_IO: no
+launder_page: no yes
->prepare_write(), ->commit_write(), ->sync_page() and ->readpage()
may be called from the request handler (/dev/loop).
@@ -281,6 +283,12 @@ buffers from the page in preparation for
indicate that the buffers are (or may be) freeable. If ->releasepage is zero,
the kernel assumes that the fs has no private interest in the buffers.
+ ->launder_page() may be called prior to releasing a page if
+it is still found to be dirty. It returns zero if the page was successfully
+cleaned, or an error value if not. Note that in order to prevent the page
+getting mapped back in and redirtied, it needs to be kept locked
+across the entire operation.
+
Note: currently almost all instances of address_space methods are
using BKL for internal serialization and that's one of the worst sources
of contention. Normally they are calling library functions (in fs/buffer.c)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 0dd6be3..fab20d0 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -315,14 +315,13 @@ static void nfs_invalidate_page(struct p
static int nfs_release_page(struct page *page, gfp_t gfp)
{
- /*
- * Avoid deadlock on nfs_wait_on_request().
- */
- if (!(gfp & __GFP_FS))
- return 0;
- /* Hack... Force nfs_wb_page() to write out the page */
- SetPageDirty(page);
- return !nfs_wb_page(page->mapping->host, page);
+ /* If PagePrivate() is set, then the page is not freeable */
+ return 0;
+}
+
+static int nfs_launder_page(struct page *page)
+{
+ return nfs_wb_page(page->mapping->host, page);
}
const struct address_space_operations nfs_file_aops = {
@@ -338,6 +337,7 @@ const struct address_space_operations nf
#ifdef CONFIG_NFS_DIRECTIO
.direct_IO = nfs_direct_IO,
#endif
+ .launder_page = nfs_launder_page,
};
static ssize_t nfs_file_write(struct kiocb *iocb, const struct iovec *iov,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 186da81..14a337c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -426,6 +426,7 @@ struct address_space_operations {
/* migrate the contents of a page to the specified target */
int (*migratepage) (struct address_space *,
struct page *, struct page *);
+ int (*launder_page) (struct page *);
};
struct backing_dev_info;
diff --git a/mm/truncate.c b/mm/truncate.c
index 9bfb8e8..d4811dc 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -321,6 +321,16 @@ failed:
return 0;
}
+static int
+do_launder_page(struct address_space *mapping, struct page *page)
+{
+ if (!PageDirty(page))
+ return 0;
+ if (page->mapping != mapping || mapping->a_ops->launder_page == NULL)
+ return 0;
+ return mapping->a_ops->launder_page(page);
+}
+
/**
* invalidate_inode_pages2_range - remove range of pages from an address_space
* @mapping: the address_space
@@ -386,11 +396,14 @@ int invalidate_inode_pages2_range(struct
PAGE_CACHE_SIZE, 0);
}
}
- was_dirty = test_clear_page_dirty(page);
- if (!invalidate_complete_page2(mapping, page)) {
- if (was_dirty)
- set_page_dirty(page);
- ret = -EIO;
+ ret = do_launder_page(mapping, page);
+ if (ret == 0) {
+ was_dirty = test_clear_page_dirty(page);
+ if (!invalidate_complete_page2(mapping, page)) {
+ if (was_dirty)
+ set_page_dirty(page);
+ ret = -EIO;
+ }
}
unlock_page(page);
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists