[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070206022549.GB31476@wotan.suse.de>
Date: Tue, 6 Feb 2007 03:25:49 +0100
From: Nick Piggin <npiggin@...e.de>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Linux Kernel <linux-kernel@...r.kernel.org>,
Linux Filesystems <linux-fsdevel@...r.kernel.org>,
Linux Memory Management <linux-mm@...ck.org>
Subject: Re: [patch 9/9] mm: fix pagecache write deadlocks
On Sun, Feb 04, 2007 at 10:36:20AM -0800, Andrew Morton wrote:
> On Sun, 4 Feb 2007 16:10:51 +0100 Nick Piggin <npiggin@...e.de> wrote:
>
> > They're not likely to hit the deadlocks, either. Probability gets more
> > likely after my patch to lock the page in the fault path. But practially,
> > we could live without that too, because the data corruption it fixes is
> > very rare as well. Which is exactly what we've been doing quite happily
> > for most of 2.6, including all distro kernels (I think).
>
> Thing is, an application which is relying on the contents of that page is
> already unreliable (or really peculiar), because it can get indeterminate
> results anyway.
Not necessarily -- they could read from one part of a page and write to
another. I see this as the biggest data corruption problem.
But even in the case where they can get indeterminate results, they can
still determine what the results *won't* be. Eg. they might use a single
byte for a flag or something.
> > ...
> >
> > On a P4 Xeon, SMP kernel, on a tmpfs filesystem, a 1GB dd if=/dev/zero write
> > had the following performance (higher is worse):
> >
> > Orig kernel New kernel
> > new file (no pagecache)
> > 4K blocks 1.280s 1.287s (+0.5%)
> > 64K blocks 1.090s 1.105s (+1.4%)
> > notrunc (uptodate pagecache)
> > 4K blocks 0.976s 1.001s (+0.5%)
> > 64K blocks 0.780s 0.792s (+1.5%)
> >
> > [numbers are better than +/- 0.005]
> >
> > So we lose somewhere between half and one and a half of one percent
> > performance in a pagecache write intensive workload.
>
> That's not too bad - caches are fast. Did you look at optimising the
> handling of that temp page, ensure that we always use the same page? I
> guess the page allocator per-cpu-pages thing is being good here.
Yeah it should be doing a reasonable job.
> I'm not sure how, though. Park a copy in the task_struct, just as an
> experiment. But that'd de-optimise multiple-tasks-writing-on-the-same-cpu.
> Maybe a per-cpu thing? Largely duplicates the page allocator's per-cpu-pages.
Putting a copy in the task_struct won't do much I figure, except saving
a copule of interrupt enable/disable, and being more wasteful of memory
and cache-hotness.
Per-cpu doesn't work because we can't hold preempt off over the usercopy
(well, we *could* do it in a loop together with fault_in_pages, but that
just adds to the icache bloat).
>
> Of course, we're also increasing caceh footprint, which this test won't
> show.
We are indeed. At least we release the hot page back to the allocator
very quickly that it can be reused.
The upshot is that your writev performance will be improved :)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists