Message-ID: <1274432710.2578.11.camel@localhost>
Date:	Fri, 21 May 2010 10:05:10 +0100
From:	Steven Whitehouse <steve@...gwyn.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	Jan Kara <jack@...e.cz>, Nick Piggin <npiggin@...e.de>,
	Josef Bacik <josef@...hat.com>, linux-fsdevel@...r.kernel.org,
	chris.mason@...cle.com, hch@...radead.org,
	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC] new ->perform_write fop

Hi,

On Fri, 2010-05-21 at 09:05 +1000, Dave Chinner wrote:
> On Thu, May 20, 2010 at 10:12:32PM +0200, Jan Kara wrote:
> > On Thu 20-05-10 09:50:54, Dave Chinner wrote:
> > > On Wed, May 19, 2010 at 01:09:12AM +1000, Nick Piggin wrote:
> > > > On Tue, May 18, 2010 at 10:27:14PM +1000, Dave Chinner wrote:
> > > > 
> > > > Is it really going to be a problem to implement block hole punching
> > > > in ext4 and gfs2?
> > > 
[snip]
> >   b) E.g. ext4 can do even without hole punching. It can allocate extent
> >      as 'unwritten' and when something during the write fails, it just
> >      leaves the extent allocated and the 'unwritten' flag makes sure that
> >      any read will see zeros. I suppose that other filesystems that care
> >      about multipage writes are able to do similar things (e.g. btrfs can
> >      do the same as far as I remember, I'm not sure about gfs2).
> 
> Allocating multipage writes as unwritten extents turns off delayed
> allocation and hence we'd lose all the benefits that this gives...

I think it should be possible to implement hole punching in GFS2. The
main issue is the locking order of resource groups. We have on our todo
list a rewrite of the truncate/delete code, which is currently used to
deallocate data blocks and metadata tree blocks. The current algorithm
is a rather inefficient recursive scan of the tree, which is repeated
multiple times depending on the tree height.

Adapting that to punch holes should be possible without too much effort
if we need to do that. We do need to allow for the possibility that such
a deallocation might have to be split into multiple transactions,
depending on the amount of metadata involved (for large files, this
could be larger than the size of the log, for example). Currently the
code splits truncates into multiple transactions, which allows the
deallocation to be restarted from any transaction boundary.
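
To illustrate the splitting idea only, here is a rough userspace sketch
in Python. It is not GFS2 code: the function name, the per-transaction
block budget and the max_txns parameter (used to simulate an
interruption) are all made up for the example, and a plain set of block
numbers stands in for the on-disk metadata. The point is that progress is
recorded at each transaction boundary, so the work can resume from there.

```python
def dealloc_in_transactions(allocated, start, end, budget,
                            resume_from=None, max_txns=None):
    """Free blocks in [start, end) from the set `allocated`.

    Hypothetical model: each "transaction" may touch at most `budget`
    block numbers, standing in for the limit the log size imposes.
    Returns (next_block, txn_count); next_block == end when finished.
    """
    pos = start if resume_from is None else resume_from
    txns = 0
    while pos < end:
        if max_txns is not None and txns >= max_txns:
            break  # simulate an interruption at a transaction boundary
        chunk_end = min(pos + budget, end)
        # begin transaction: free this chunk of blocks
        for blk in range(pos, chunk_end):
            allocated.discard(blk)
        # commit: the resume point advances together with the frees
        pos = chunk_end
        txns += 1
    return pos, txns


# Usage: stop after one transaction, then restart from the recorded point.
blocks = set(range(100))
pos, _ = dealloc_in_transactions(blocks, 10, 50, budget=16, max_txns=1)
pos, _ = dealloc_in_transactions(blocks, 10, 50, budget=16, resume_from=pos)
```

A real implementation would budget by metadata actually dirtied rather
than by block count, which is the part this sketch glosses over.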

GFS2 does not have any way to mark unwritten extents, so we cannot do
delayed allocation or implement an efficient fallocate. For fallocate we
can do better than just dd'ing zeros to a file, but we'll never be able
to match the performance of a fs that can mark extents unwritten.
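
As a toy comparison of the two preallocation strategies, the Python
sketch below models a file as a dict of blocks. Everything in it
(ToyFile, its methods, the blocks_written counter, the 4096-byte block
size) is an assumption for illustration and does not correspond to any
real filesystem interface. It only shows why an unwritten flag makes
preallocation cheap: reads of flagged blocks return zeros without any
data blocks having been written.

```python
BLOCK_SIZE = 4096  # assumed block size for the toy model


class ToyFile:
    def __init__(self):
        self.data = {}           # block number -> stored bytes
        self.unwritten = set()   # allocated blocks flagged as unwritten
        self.blocks_written = 0  # data-block writes, a rough I/O cost proxy

    def fallocate_zeroing(self, start, count):
        """Preallocate by writing a zero-filled block for every block."""
        for blk in range(start, start + count):
            self.data[blk] = bytes(BLOCK_SIZE)
            self.blocks_written += 1

    def fallocate_unwritten(self, start, count):
        """Preallocate by flagging blocks unwritten; no data is written."""
        self.unwritten.update(range(start, start + count))

    def write(self, blk, payload):
        """Store real data and clear the unwritten flag for that block."""
        self.unwritten.discard(blk)
        self.data[blk] = payload[:BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
        self.blocks_written += 1

    def read(self, blk):
        """Unwritten or never-stored blocks always read back as zeros."""
        if blk in self.unwritten:
            return bytes(BLOCK_SIZE)
        return self.data.get(blk, bytes(BLOCK_SIZE))


# Usage: both files read back zeros, but only one paid for the writes.
zeroed, flagged = ToyFile(), ToyFile()
zeroed.fallocate_zeroing(0, 8)
flagged.fallocate_unwritten(0, 8)
```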

Steve.
