lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20061003110348.c3ce6337.akpm@osdl.org>
Date:	Tue, 3 Oct 2006 11:03:48 -0700
From:	Andrew Morton <akpm@...l.org>
To:	Chip Coldwell <coldwell@...hat.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: direct IO regression in 2.6.18

On Tue, 3 Oct 2006 13:51:49 -0400 (EDT)
Chip Coldwell <coldwell@...hat.com> wrote:

> Hi,
> 
> The following commit caused a regression in direct IO:
> 
> commit 016eb4a0ed06a3677d67a584da901f0e9a63c666
> Author: Andrew Morton <akpm@...l.org>
> Date:   Fri Sep 8 09:48:38 2006 -0700
> 
>      [PATCH] invalidate_complete_page() race fix
> 
> ...
> 
> The issue is that invalidate_complete_page is in two different code
> paths, the normal one (prune_icache calls invalidate_inode_pages calls
> invalidate_mapping_pages calls invalidate_complete_page) and also via
> direct IO (generic_file_direct_write calls generic_file_direct_IO
> calls invalidate_inode_pages2_range calls invalidate_complete_page).
> In the latter case, the page is (usually) not even in the page cache
> and the refcount can legitimately != 2.


The below was merged earlier this week and is tagged for 2.6.18.x.
You just falsified my changelog ;)



From: Andrew Morton <akpm@...l.org>

The recent fix to invalidate_inode_pages() (git commit 016eb4a) managed to
unfix invalidate_inode_pages2().

The problem is that various bits of code in the kernel can take transient refs
on pages: the page scanner will do this when inspecting a batch of pages, and
the lru_cache_add() batching pagevecs also hold a ref.

Net result is transient failures in invalidate_inode_pages2().  This affects
NFS directory invalidation (observed) and presumably also block-backed
direct-io (not yet reported).

Fix it by reverting invalidate_inode_pages2() back to the old version which
ignores the page refcounts.

We may come up with something more clever later, but for now we need a 2.6.18
fix for NFS.

Cc: Chuck Lever <cel@...i.umich.edu>
Cc: Nick Piggin <nickpiggin@...oo.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc: <stable@...nel.org>
Signed-off-by: Andrew Morton <akpm@...l.org>
---

 mm/truncate.c |   34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff -puN mm/truncate.c~invalidate_inode_pages2-ignore-page-refcounts mm/truncate.c
--- a/mm/truncate.c~invalidate_inode_pages2-ignore-page-refcounts
+++ a/mm/truncate.c
@@ -287,9 +287,39 @@ unsigned long invalidate_inode_pages(str
 {
 	return invalidate_mapping_pages(mapping, 0, ~0UL);
 }
-
 EXPORT_SYMBOL(invalidate_inode_pages);
 
+/*
+ * This is like invalidate_complete_page(), except it ignores the page's
+ * refcount.  We do this because invalidate_inode_pages2() needs stronger
+ * invalidation guarantees, and cannot afford to leave pages behind because
+ * shrink_list() has a temp ref on them, or because they're transiently sitting
+ * in the lru_cache_add() pagevecs.
+ */
+static int
+invalidate_complete_page2(struct address_space *mapping, struct page *page)
+{
+	if (page->mapping != mapping)
+		return 0;
+
+	if (PagePrivate(page) && !try_to_release_page(page, 0))
+		return 0;
+
+	write_lock_irq(&mapping->tree_lock);
+	if (PageDirty(page))
+		goto failed;
+
+	BUG_ON(PagePrivate(page));
+	__remove_from_page_cache(page);
+	write_unlock_irq(&mapping->tree_lock);
+	ClearPageUptodate(page);
+	page_cache_release(page);	/* pagecache ref */
+	return 1;
+failed:
+	write_unlock_irq(&mapping->tree_lock);
+	return 0;
+}
+
 /**
  * invalidate_inode_pages2_range - remove range of pages from an address_space
  * @mapping: the address_space
@@ -356,7 +386,7 @@ int invalidate_inode_pages2_range(struct
 				}
 			}
 			was_dirty = test_clear_page_dirty(page);
-			if (!invalidate_complete_page(mapping, page)) {
+			if (!invalidate_complete_page2(mapping, page)) {
 				if (was_dirty)
 					set_page_dirty(page);
 				ret = -EIO;
_

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ