Date:	Mon, 16 Mar 2009 21:32:39 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Daniel Phillips <phillips@...nq.net>
Cc:	Theodore Tso <tytso@....edu>, linux-fsdevel@...r.kernel.org,
	tux3@...3.org, Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [Tux3] Tux3 report: Tux3 Git tree available

On Monday 16 March 2009 09:41:35 Daniel Phillips wrote:
> Hi Ted,

> > So the really unfortunate thing about allocating the block as soon as
> > the page is dirty is that it spikes out delayed allocation.  By
> > delaying the physical allocation of the logical->physical mapping as
> > long as possible, the filesystem can select the best possible physical
> > location.
>
> Tux3 does not dirty the metadata until data cache is flushed, so the
> allocation decisions for data and metadata are made at the same time.
> That is the reason for the distinction between physical metadata above,
> and logical metadata such as directory data and bitmaps, which are
> delayed.  Though physical metadata is positioned when first dirtied,
> physical metadata dirtying is delayed until delta commit.
>
> Implementing this model (we are still working on it) requires taking
> care of a lot of subtle details that are specific to the Tux3 cache
> model.  I have a hard time imagining those allocation decisions driven
> by callbacks from a buffer-like library.

The filesystem can get pagecache-block-dirty events in a few ways
(often in combination): write_begin/write_end, set_page_dirty,
page_mkwrite, etc. Short of
implementing entirely your own write path (and even then you need to
hook at least page_mkwrite to catch mmapped writes, for completeness),
I don't see why a get_block(BLOCK_DIRTY) kind of callback is much
harder for you to imagine than any of the other callbacks. Actually
I imagine the block based callback should be easier for filesystems
that support any block size != page size because all the others are
page based.
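
To make the block size != page size point concrete, here is a minimal
userspace sketch, not kernel code. Every name in it (block_dirty_fn,
notify_dirty_blocks, pages_touched, count_block, PAGE_SZ) is hypothetical
and only models the idea of a per-block "about to be dirtied" callback;
it is not the get_block interface itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum { PAGE_SZ = 4096 };

/* Called once for each filesystem block a write is about to dirty. */
typedef void (*block_dirty_fn)(unsigned long long block, void *ctx);

/*
 * Walk the byte range [pos, pos + len) and report every block it
 * touches, in units of block_size. Returns the number of blocks reported.
 */
static unsigned long long
notify_dirty_blocks(unsigned long long pos, size_t len, unsigned block_size,
                    block_dirty_fn cb, void *ctx)
{
    unsigned long long first, last, b;

    assert(block_size > 0);
    if (len == 0)
        return 0;

    first = pos / block_size;
    last = (pos + len - 1) / block_size;
    for (b = first; b <= last; b++)
        cb(b, ctx);
    return last - first + 1;
}

/* For comparison: how many pages the same byte range touches. */
static unsigned long long
pages_touched(unsigned long long pos, size_t len)
{
    if (len == 0)
        return 0;
    return (pos + len - 1) / PAGE_SZ - pos / PAGE_SZ + 1;
}

/* Example callback: count the blocks reported (ctx is an unsigned *). */
static void count_block(unsigned long long block, void *ctx)
{
    (void)block;
    ++*(unsigned *)ctx;
}
```

With 1024-byte blocks, a 100-byte write at offset 1000 stays inside one
page but crosses a block boundary, so a page-based hook fires once while
the block-based walk reports blocks 0 and 1 separately.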

I would definitely like to hear firm details about any problems,
because I would like to try to make it more generic even if your
filesystem won't use it :)

Now this is not to say the current buffer APIs are totally _optimal_.
As I said, I would like to see at least something along the lines of
a "we are about to dirty range (x,y)" callback in the higher level
generic write code. But that's another story (which I am planning
to get to).
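
As a rough illustration only, such a range callback might look like the
userspace sketch below. All names (write_ops, prepare_dirty_range,
toy_file, toy_generic_write, range_log, log_range) are made up for this
example and do not correspond to any existing kernel interface; the
assumption is simply that the generic write path announces the byte
range once, before copying any data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical hook: called before a write dirties [pos, pos + len).
 * Returns 0 to allow the write, nonzero to refuse it.
 */
struct write_ops {
    int (*prepare_dirty_range)(void *fs, unsigned long long pos, size_t len);
};

/* Toy stand-in for a file with an in-memory "page cache". */
struct toy_file {
    char data[8192];
    const struct write_ops *ops;
    void *fs;                      /* filesystem-private context */
};

/* Toy generic write path: announce the range first, then copy the data. */
static long toy_generic_write(struct toy_file *f, unsigned long long pos,
                              const char *buf, size_t len)
{
    assert(f != NULL && buf != NULL);

    /* Reject writes that would run past the toy buffer. */
    if (pos > sizeof f->data || len > sizeof f->data - pos)
        return -1;

    if (f->ops && f->ops->prepare_dirty_range) {
        int err = f->ops->prepare_dirty_range(f->fs, pos, len);
        if (err)
            return err;            /* nothing has been dirtied yet */
    }

    memcpy(f->data + pos, buf, len);
    return (long)len;
}

/* Example hook: remember the last announced range. */
struct range_log {
    unsigned long long pos;
    size_t len;
    int calls;
};

static int log_range(void *fs, unsigned long long pos, size_t len)
{
    struct range_log *log = fs;

    log->pos = pos;
    log->len = len;
    log->calls++;
    return 0;
}
```

In this sketch the filesystem sees the whole range before anything is
modified, so it could reserve space or refuse the write up front instead
of reacting page by page.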


