Message-ID: <20070726013728.GB20727@wotan.suse.de>
Date: Thu, 26 Jul 2007 03:37:28 +0200
From: Nick Piggin <npiggin@...e.de>
To: Chris Mason <chris.mason@...cle.com>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Trond Myklebust <trond.myklebust@....uio.no>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux Memory Management List <linux-mm@...ck.org>,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH RFC] extent mapped page cache
On Wed, Jul 25, 2007 at 08:18:53AM -0400, Chris Mason wrote:
> On Wed, 25 Jul 2007 04:32:17 +0200
> Nick Piggin <npiggin@...e.de> wrote:
>
> > Having another tree to store block state I think is a good idea as I
> > said in the fsblock thread with Dave, but I haven't clicked as to why
> > it is a big advantage to use it to manage pagecache state. (and I can
> > see some possible disadvantages in locking and tree manipulation
> > overhead).
>
> Yes, there are definitely costs with the state tree; it will take some
> careful benchmarking to convince me it is a feasible solution. But,
> storing all the state in the pages themselves is impossible unless the
> block size equals the page size. So, we end up with something like
> fsblock/buffer heads or the state tree.
Yep, we have to have something.
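To make the block-size point concrete, here is a minimal sketch of why a single per-page flag is not enough once a page holds several blocks. The 4096-byte page size, the structure, and the helper names are all hypothetical, chosen only for illustration; this is not how buffer heads, fsblock, or the proposed state tree actually store state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for illustration only. */
#define PAGE_SIZE_BYTES 4096u

/* Number of filesystem blocks that fit in one page. */
static unsigned blocks_per_page(unsigned block_size)
{
    assert(block_size != 0 && block_size <= PAGE_SIZE_BYTES);
    return PAGE_SIZE_BYTES / block_size;
}

/* Hypothetical side structure: one dirty bit per block in the page. */
struct page_block_state {
    uint32_t dirty_mask;
};

/* Mark the block containing offset_in_page as dirty. */
static void mark_block_dirty(struct page_block_state *s,
                             unsigned offset_in_page, unsigned block_size)
{
    assert(offset_in_page < PAGE_SIZE_BYTES);
    s->dirty_mask |= 1u << (offset_in_page / block_size);
}

/* Test whether block number block_index within the page is dirty. */
static bool block_is_dirty(const struct page_block_state *s,
                           unsigned block_index)
{
    return (s->dirty_mask >> block_index) & 1u;
}
```

With 1024-byte blocks a page holds four blocks, so a write at offset 2048 dirties only block 2. A single page-level dirty flag cannot express that, which is why some per-block or per-range structure has to live beside the page.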
> One advantage to the state tree is that it separates the state from
> the memory being described, allowing a simple kmap style interface
> that covers subpages, highmem and superpages.
I suppose so, although we should have added those interfaces long
ago ;) The variants in fsblock are pretty good, and you could always
do an arbitrary extent (rather than block) based API using the
pagecache tree if it would be helpful.
> It also more naturally matches the way we want to do IO, making for
> easy clustering.
Well the pagecache tree is used to reasonable effect for that now.
OK the code isn't beautiful ;). Granted, this might be an area where
the separate state tree ends up being better. We'll see.
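As a rough illustration of the clustering argument, the sketch below finds how far a contiguous dirty run extends from a given file offset when state is kept as byte ranges rather than per page. It assumes a sorted, non-overlapping array of range records in place of a real tree, and the record layout, the `EXTENT_DIRTY` bit, and the function name are invented for this example rather than taken from either patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXTENT_DIRTY 0x1u   /* hypothetical state bit */

/* Hypothetical range record: state bits covering bytes [start, end]. */
struct extent_state {
    uint64_t start;   /* first byte, inclusive */
    uint64_t end;     /* last byte, inclusive */
    unsigned bits;
};

/*
 * Return the number of contiguous dirty bytes beginning at offset.
 * Records must be sorted by start and must not overlap.
 * Returns 0 if offset itself is not covered by a dirty record.
 */
static uint64_t dirty_run_length(const struct extent_state *ext, size_t n,
                                 uint64_t offset)
{
    uint64_t pos = offset;

    assert(ext != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ext[i].end < pos)
            continue;               /* record lies entirely before pos */
        if (ext[i].start > pos || !(ext[i].bits & EXTENT_DIRTY))
            break;                  /* a gap or a clean range ends the run */
        pos = ext[i].end + 1;       /* extend the run through this record */
    }
    return pos - offset;
}
```

The point of the sketch is that the size of one write-out cluster falls out of walking adjacent range records, without touching each page in turn. Whether a real tree does this more cheaply than the pagecache tree is exactly what would need benchmarking.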
> O_DIRECT becomes a special case of readpages and writepages....the
> memory used for IO just comes from userland instead of the page cache.
Could be, although you'll probably also need to teach the mm about
the state tree and/or still manipulate the pagecache tree to prevent
concurrency?
But isn't the main aim of O_DIRECT to do as little locking and
synchronisation with the pagecache as possible? I thought this is
why your race fixing patches got put on the back burner (although
they did look fairly nice from a correctness POV).
Well I'm kind of handwaving when it comes to O_DIRECT ;) It does look
like this might be another advantage of the state tree (although you
aren't allowed to slow down buffered IO to achieve the locking ;)).
> The ability to put in additional tracking info like the process that
> first dirtied a range is also significant. So, I think it is worth
> trying.
Definitely, and I'm glad you are. You haven't converted me yet, but
I look forward to finding the best ideas from our two approaches when
the patches are further along (ext2 port of fsblock coming along, so
we'll be able to have races soon :P).
Thanks,
Nick