Message-Id: <200903162132.40344.nickpiggin@yahoo.com.au>
Date: Mon, 16 Mar 2009 21:32:39 +1100
From: Nick Piggin <nickpiggin@...oo.com.au>
To: Daniel Phillips <phillips@...nq.net>
Cc: Theodore Tso <tytso@....edu>, linux-fsdevel@...r.kernel.org,
tux3@...3.org, Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: [Tux3] Tux3 report: Tux3 Git tree available

On Monday 16 March 2009 09:41:35 Daniel Phillips wrote:
> Hi Ted,
>
> > So the really unfortunate thing about allocating the block as soon as
> > the page is dirty is that it spikes out delayed allocation. By
> > delaying the physical allocation of the logical->physical mapping as
> > long as possible, the filesystem can select the best possible physical
> > location.
>
> Tux3 does not dirty the metadata until data cache is flushed, so the
> allocation decisions for data and metadata are made at the same time.
> That is the reason for the distinction between physical metadata above,
> and logical metadata such as directory data and bitmaps, which are
> delayed. Though physical metadata is positioned when first dirtied,
> physical metadata dirtying is delayed until delta commit.
>
> Implementing this model (we are still working on it) requires taking
> care of a lot of subtle details that are specific to the Tux3 cache
> model. I have a hard time imagining those allocation decisions driven
> by callbacks from a buffer-like library.

The filesystem can get pagecache-block-dirty events in a few ways
(often a combination of): write_begin/write_end, set_page_dirty,
page_mkwrite, etc. Short of
implementing entirely your own write path (and even then you need to
hook at least page_mkwrite to catch mmapped writes, for completeness),
I don't see why a get_block(BLOCK_DIRTY) kind of callback is much
harder for you to imagine than any of the other callbacks. Actually
I imagine the block based callback should be easier for filesystems
that support any block size != page size because all the others are
page based.
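To make the block-based idea concrete, here is a rough userspace sketch of how a generic layer might turn a dirtied byte range within a page into one callback per filesystem block. Only the BLOCK_DIRTY name comes from the discussion above; the callback signature, the sizes, notify_dirty_range() and the toy_fs structure are all made up for illustration and are not an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes: 4 KiB pages holding 1 KiB filesystem blocks. */
enum { PAGE_SIZE_BYTES = 4096, BLOCK_SIZE_BYTES = 1024 };

/* Hypothetical event type; only the BLOCK_DIRTY name is from the thread. */
enum block_event { BLOCK_DIRTY };

/* Hypothetical callback: the filesystem is told about one block at a time. */
typedef int (*block_event_fn)(void *fs_private, unsigned long block,
                              enum block_event ev);

/*
 * Generic side: bytes [from, to) of page `page_index` are about to become
 * dirty. Translate that into one callback per block the range touches.
 */
static int notify_dirty_range(unsigned long page_index, size_t from, size_t to,
                              block_event_fn cb, void *fs_private)
{
    assert(from < to && to <= PAGE_SIZE_BYTES);

    unsigned long blocks_per_page = PAGE_SIZE_BYTES / BLOCK_SIZE_BYTES;
    unsigned long base = page_index * blocks_per_page;
    unsigned long first = base + from / BLOCK_SIZE_BYTES;
    unsigned long last = base + (to - 1) / BLOCK_SIZE_BYTES;

    for (unsigned long b = first; b <= last; b++) {
        int err = cb(fs_private, b, BLOCK_DIRTY);
        if (err)
            return err; /* stop at the first block the filesystem rejects */
    }
    return 0;
}

/* Toy filesystem side: just remember which blocks were reported dirty. */
struct toy_fs {
    unsigned long dirty_count;
    unsigned long first_block;
    unsigned long last_block;
};

static int toy_block_event(void *fs_private, unsigned long block,
                           enum block_event ev)
{
    struct toy_fs *fs = fs_private;

    if (ev != BLOCK_DIRTY)
        return -1;
    if (fs->dirty_count == 0)
        fs->first_block = block;
    fs->last_block = block;
    fs->dirty_count++;
    return 0;
}
```

With these assumed sizes, dirtying bytes 1000 to 2999 of page 2 reaches the toy filesystem as three block events, for blocks 8, 9 and 10. The filesystem side never has to do the page-to-block arithmetic itself, which is the point about block size != page size.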

I would definitely like to hear firm details about any problems,
because I would like to try to make it more generic even if your
filesystem won't use it :)

Now this is not to say the current buffer APIs are totally _optimal_.
As I said, I would like to see at least something along the lines of
"we are about to dirty range (x,y)" callback in the higher level
generic write code. But that's another story (which I am planning
to get to).
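As a very rough illustration of the "we are about to dirty range (x,y)" idea, the userspace sketch below has a generic write helper call an optional hook before it modifies any data, so the filesystem can reserve space or refuse up front. The toy_file structure, the about_to_dirty hook and both example hooks are hypothetical names invented for this sketch; nothing here is an existing kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy file: a fixed buffer plus an optional pre-dirty hook. */
struct toy_file {
    char data[64];
    /* Called with the half-open byte range [x, y) before any byte changes. */
    int (*about_to_dirty)(struct toy_file *f, size_t x, size_t y);
    size_t hook_calls; /* bookkeeping used by record_range() below */
    size_t last_x, last_y;
};

/* Generic write path: announce the range first, copy only if accepted. */
static int toy_generic_write(struct toy_file *f, size_t pos,
                             const char *buf, size_t len)
{
    assert(f != NULL && buf != NULL);

    if (pos > sizeof(f->data) || len > sizeof(f->data) - pos)
        return -1; /* range does not fit in the toy file */
    if (len == 0)
        return 0;

    if (f->about_to_dirty) {
        int err = f->about_to_dirty(f, pos, pos + len);
        if (err)
            return err; /* nothing has been dirtied yet */
    }
    memcpy(f->data + pos, buf, len);
    return 0;
}

/* Example hook: accept the write and remember the announced range. */
static int record_range(struct toy_file *f, size_t x, size_t y)
{
    f->hook_calls++;
    f->last_x = x;
    f->last_y = y;
    return 0;
}

/* Example hook: refuse every write, standing in for an out-of-space error. */
static int refuse_range(struct toy_file *f, size_t x, size_t y)
{
    (void)f;
    (void)x;
    (void)y;
    return -1;
}
```

Because the hook runs before the copy, a refusal leaves the cached data untouched, and the filesystem sees the whole range in one call rather than page by page.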