Message-Id: <20061010221304.6bef249f.akpm@osdl.org>
Date: Tue, 10 Oct 2006 22:13:04 -0700
From: Andrew Morton <akpm@...l.org>
To: Nick Piggin <npiggin@...e.de>
Cc: Linux Memory Management <linux-mm@...ck.org>,
Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: [patch 2/5] mm: fault vs invalidate/truncate race fix
On Tue, 10 Oct 2006 16:21:49 +0200 (CEST)
Nick Piggin <npiggin@...e.de> wrote:
> --- linux-2.6.orig/mm/filemap.c
> +++ linux-2.6/mm/filemap.c
> @@ -1392,9 +1392,10 @@ struct page *filemap_nopage(struct vm_ar
> unsigned long size, pgoff;
> int did_readaround = 0, majmin = VM_FAULT_MINOR;
>
> + BUG_ON(!(area->vm_flags & VM_CAN_INVALIDATE));
> +
> pgoff = ((address-area->vm_start) >> PAGE_CACHE_SHIFT) + area->vm_pgoff;
>
> -retry_all:
> size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> if (pgoff >= size)
> goto outside_data_content;
> @@ -1416,7 +1417,7 @@ retry_all:
> * Do we have something in the page cache already?
> */
> retry_find:
> - page = find_get_page(mapping, pgoff);
> + page = find_lock_page(mapping, pgoff);
Here's a little problem. Locking the page in the pagefault handler takes
our deadlock when writing from an mmapped copy of a page into that same
page from "extremely hard to hit" to "super-easy to hit". Try running
write-deadlock-demo.c from
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz
It conveniently deadlocks while holding mmap_sem, so `ps' gets stuck too.
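From memory, the demo boils down to something like this (a sketch, not
the actual write-deadlock-demo.c - the file name and size are invented):

#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	char *buf;
	int fd;

	fd = open("testfile", O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0 || ftruncate(fd, pagesize) < 0)
		exit(1);

	/*
	 * Map the file's first page but don't touch it: it must still
	 * be unfaulted when write() starts.
	 */
	buf = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_SHARED,
			fd, 0);
	if (buf == MAP_FAILED)
		exit(1);

	/*
	 * write() the mapping back into the same file offset: write()
	 * locks the pagecache page, copy_from_user() then faults on
	 * `buf' - which is backed by that very page - and the fault
	 * handler sleeps in find_lock_page() with mmap_sem held.
	 */
	write(fd, buf, pagesize);
	return 0;
}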
So this whole idea of locking the page in the fault handler is off the
table until we fix that deadlock for real. Coincidentally I started coding
a fix for that a couple of weeks ago, but have spent too much time with my nose
in other people's crap to get around to writing my own crap.
The basic idea is
- revert the recent changes to the core write() code (the ones which
killed writev() performance, especially on NFS overwrites).
- clean some stuff up
- modify the core of write() so that instead of doing copy_from_user(),
we do inc_preempt_count(); copy_from_user_inatomic(). That way we never
enter the pagefault handler while holding the lock on the pagecache page.
If the fault happens, we run commit_write() on however much stuff we
managed to copy, then go back and try to fault the target page back in
again. Repeat up to ten times, then give up (the loop is sketched below).
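Very roughly, and only as a sketch (atomic_copy_write() and its calling
convention are invented here, loosely modelled on the shape of 2.6's
generic_file_buffered_write() - this is not the actual patches):

#include <linux/fs.h>
#include <linux/highmem.h>
#include <linux/pagemap.h>
#include <asm/uaccess.h>

static int atomic_copy_write(struct file *file, struct page *page,
		unsigned offset, unsigned bytes, const char __user *buf)
{
	const struct address_space_operations *a_ops =
					file->f_mapping->a_ops;
	int retries = 10;
	unsigned copied;
	char *kaddr;
	int status;
	char c;

	do {
		lock_page(page);
		status = a_ops->prepare_write(file, page, offset,
						offset + bytes);
		if (status) {
			unlock_page(page);
			return status;
		}

		/*
		 * kmap_atomic() raises preempt_count - the
		 * inc_preempt_count() described above - so a fault on
		 * `buf' makes the copy return a short count instead of
		 * entering the pagefault handler while `page' is locked.
		 */
		kaddr = kmap_atomic(page, KM_USER0);
		copied = bytes - __copy_from_user_inatomic(kaddr + offset,
						buf, bytes);
		kunmap_atomic(kaddr, KM_USER0);

		/* Commit whatever we managed to copy, drop the page lock. */
		status = a_ops->commit_write(file, page, offset,
						offset + copied);
		unlock_page(page);
		if (status)
			return status;
		if (copied == bytes)
			return bytes;

		/*
		 * Short copy: fault the source page in with no page lock
		 * held, then retry.
		 */
		if (get_user(c, buf))
			return -EFAULT;
	} while (--retries);

	return -EFAULT;
}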
It gets tricky because it means that we'll need to go back to zeroing
out the uncopied part of the pagecache page before
commit_write+unlock_page(). This will resurrect the recently-fixed
problem where userspace can fleetingly see a bunch of zeroes in pagecache
where it expected to see either the old data or the new data.
But I don't think that problem was terribly serious, and we can improve
the situation quite a lot by not doing that zeroing if the page is
already up-to-date.
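At commit time that would look something along these lines (again just
illustrative; `copied', `offset' and `bytes' as in the loop above):

	if (copied < bytes && !PageUptodate(page)) {
		char *kaddr = kmap_atomic(page, KM_USER0);

		/*
		 * Zero only the uncopied tail, and only when the page
		 * wasn't already uptodate: an uptodate page still holds
		 * the old data there, so readers never see a transient
		 * window of zeroes.
		 */
		memset(kaddr + offset + copied, 0, bytes - copied);
		kunmap_atomic(kaddr, KM_USER0);
	}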
Anyway, if you're feeling up to it I'll document the patches I have and hand
them over - they're not making much progress here.