[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20070315131305.GF8321@wotan.suse.de>
Date: Thu, 15 Mar 2007 14:13:05 +0100
From: Nick Piggin <npiggin@...e.de>
To: Jens Axboe <jens.axboe@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: [patch 1/2] splice: dont steal
On Thu, Mar 15, 2007 at 01:54:32PM +0100, Jens Axboe wrote:
> On Thu, Mar 15 2007, Nick Piggin wrote:
> > On Thu, Mar 15, 2007 at 01:27:23PM +0100, Jens Axboe wrote:
> > > On Thu, Mar 15 2007, Nick Piggin wrote:
> > > >
> > > > We should be able to allow for it with the new a_ops API I'm working
> > > > on.
> > >
> > > "Should be" and in progress stuff, is it guarenteed to get there?
> >
> > Well considering that it is needed in order to solve 3 different deadlock
> > scenarios in the core write(2) path without taking a big performance hit,
> > I'd hope so ;)
> >
> > It isn't guaranteed, but I have only had positive feedback so far. Would
> > take a while to actually get merged, though.
>
> It's not that I don't believe you, I'm just a little reluctant to rip
> stuff out with a promise to fix it later when foo and bar are merged,
> since things like that have a tendency not to get done because they are
> forgotten :-)
Fair enough. The API side is trivial, all I need to do is set a single
flag and make splice pass down the page, and set that flag when stealing.
Filesystems might vary from trivial to impossible, but I think most should
be OK. If the flag is there then they at least have the option.
> Do you have a test case for stealing failures? What I'm really asking is
> how critical is this?
I guess you could fill a filesystem completely, and have a sparse file
in it. Then steal a page and splice it in. The prepare_write should fail,
but the page will still be in pagecache, until it gets reclaimed, then
it will go back to zeroes.
(no I don't have a test case ;)).
You could do something like remove the page if prepare_write fails, but
there is still a window where a read can see it. Basically I can't see
a way that it can possibly work within our current prepare_write API,
and it is a data corruption bug, so in my opinion it is a candidate for
2.6.21 + stable.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists