Message-ID: <20110420172127.GF3030@thunk.org>
Date:	Wed, 20 Apr 2011 13:21:27 -0400
From:	Ted Ts'o <tytso@....edu>
To:	Christoph Hellwig <hch@...radead.org>
Cc:	Eric Sandeen <sandeen@...deen.net>,
	Dave Chinner <david@...morbit.com>,
	Yongqiang Yang <xiaoqiangnk@...il.com>,
	Andreas Dilger <adilger@...ger.ca>, xfs-oss <xfs@....sgi.com>,
	"coreutils@....org" <coreutils@....org>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	Pádraig Brady <P@...igbrady.com>,
	Markus Trippelsdorf <markus@...ppelsdorf.de>
Subject: Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP
 related?)

On Wed, Apr 20, 2011 at 11:21:31AM -0400, Christoph Hellwig wrote:
> 
> How do you want to union the existence of an extent with a state
> on disk, with a pending modification to it that is still in-memory
> and not flushed out to disk yet?  This is looking into an uncertain
> future, as the extent map might change in various other ways before
> the transaction to convert the unwritten extents goes to disk.

So for example, suppose you have a single unwritten extent on disk,
but there are 3 regions within that extent's range that have unwritten
pages.  You would return 3 or 4 fiemap_extent structures, reflecting the
state if the unwritten pages were pushed out to disk at the time of
the fiemap ioctl --- but without actually doing the expensive sync
operation.  The one case where you can't do that is in the case of
delayed allocation blocks, since you won't know where on disk they
would be going, necessarily --- but hey, conveniently we have a
DELALLOC bit already defined....
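The projection described above can be sketched in userspace terms. This is only an illustration, not kernel code: the `Extent` record and `project_extents()` are hypothetical names, the written regions are assumed to be sorted, non-overlapping, and inside the extent, and the flag value is the one defined in <linux/fiemap.h>. Delayed-allocation ranges are not modelled, since they have no physical address to report.

```python
from dataclasses import dataclass

FIEMAP_EXTENT_UNWRITTEN = 0x0800  # value from <linux/fiemap.h>


@dataclass(frozen=True)
class Extent:
    """Hypothetical stand-in for the fields of struct fiemap_extent."""
    logical: int
    physical: int
    length: int
    flags: int


def project_extents(ondisk, written):
    """Split one on-disk unwritten extent around pending written regions.

    `written` is a list of (logical_start, length) pairs, assumed sorted,
    non-overlapping, and fully contained in `ondisk`.
    """
    out = []
    end = ondisk.logical + ondisk.length

    def piece(start, stop, flags):
        # Physical offset follows the logical offset within the extent.
        phys = ondisk.physical + (start - ondisk.logical)
        out.append(Extent(start, phys, stop - start, flags))

    pos = ondisk.logical
    for start, length in written:
        if start > pos:
            piece(pos, start, FIEMAP_EXTENT_UNWRITTEN)  # still unwritten
        piece(start, start + length, 0)                 # would be converted
        pos = start + length
    if pos < end:
        piece(pos, end, FIEMAP_EXTENT_UNWRITTEN)
    return out
```

How many records come back depends on where the written regions fall relative to the edges of the extent and to each other.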

> And if we do this it would need to be a new option to FIEMAP, as
> it changes the semantics from the existing one that returns the
> actual state on disk (plus the magic delalloc bit).

Well, we seem to have inconsistent semantics right now, because we
never defined the semantics clearly enough from the beginning.  So no
matter which option we choose, including "the on-disk extent state
only, and nuke the delalloc bit", we will be changing semantics.  I'm
not sure we can get around that.

> And even if you find semantics that take pending unwritten extent
> conversions into account and still make sense, how do you plan to
> implement them?  For buffered writes into unwritten extents it could
> be done by walking the pagecache and buffers after adding a new
> flag for an already converted unwritten extent to the buffer head
> state.  But there's no easy way to do that for direct I/O.

If the file is being actively modified (for example with direct I/O),
there will inevitably be race conditions.  If only some of the pending
conversions have been taken into account, that seems like a
reasonable result.  If a file is actively being modified by many DIO
writes, even using FIEMAP_FLAG_SYNC isn't going to help you get a
coherent view of the file, so this seems to be a previously unsolved
problem....
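For reference, a minimal sketch of how a userspace caller might issue FIEMAP with FIEMAP_FLAG_SYNC. The struct layouts follow <linux/fiemap.h>; the ioctl number is the value of `_IOWR('f', 11, struct fiemap)` under the generic ioctl encoding (x86, ARM) and is an assumption on other architectures. The helper names `build_request()`, `parse_reply()`, and `fiemap()` are made up for this example. The ioctl fails on filesystems that do not implement FIEMAP, and even with the sync flag the result is only a snapshot if the file is being written concurrently.

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B   # _IOWR('f', 11, struct fiemap), generic encoding
FIEMAP_FLAG_SYNC = 0x0001    # sync the file before mapping

# struct fiemap header (without the trailing fm_extents[] array): 32 bytes
FIEMAP_HDR = struct.Struct("=QQIIII")
# struct fiemap_extent: 56 bytes
FIEMAP_EXT = struct.Struct("=QQQ2QI3I")


def build_request(max_extents, flags=FIEMAP_FLAG_SYNC):
    """Return a zeroed request buffer covering the whole file."""
    buf = bytearray(FIEMAP_HDR.size + max_extents * FIEMAP_EXT.size)
    # fm_start, fm_length, fm_flags, fm_mapped_extents,
    # fm_extent_count, fm_reserved
    FIEMAP_HDR.pack_into(buf, 0, 0, 0xFFFFFFFFFFFFFFFF, flags,
                         0, max_extents, 0)
    return buf


def parse_reply(buf):
    """Return (logical, physical, length, flags) for each mapped extent."""
    mapped = FIEMAP_HDR.unpack_from(buf, 0)[3]
    extents = []
    for i in range(mapped):
        f = FIEMAP_EXT.unpack_from(buf, FIEMAP_HDR.size + i * FIEMAP_EXT.size)
        extents.append((f[0], f[1], f[2], f[5]))
    return extents


def fiemap(fd, max_extents=32):
    """Map up to max_extents extents of the open file fd."""
    buf = build_request(max_extents)
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # kernel fills buf in place
    return parse_reply(buf)
```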

> > In the case of #1 and #2, we really need to implement support for
> > SEEK_HOLE/SEEK_DATA for userspace programs like cp who want to know
> > this information.
> 
> We need to do that anyway, as fiemap is a horrible interface for
> tools that just want to skip holes.

I agree that implementing SEEK_HOLE/SEEK_DATA is a good thing
regardless of which option we end up choosing.
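As an illustration of why this interface suits tools like cp, here is a rough sketch of a hole-skipping scan, assuming a kernel and filesystem that implement SEEK_DATA and SEEK_HOLE for lseek() (Linux gained them in a later release than the one under discussion here). `data_ranges()` is a hypothetical helper name, and the sketch assumes the file is not being truncated while it is scanned.

```python
import errno
import os


def data_ranges(fd):
    """Yield (offset, length) for each region of fd that may hold data."""
    size = os.fstat(fd).st_size
    pos = 0
    while pos < size:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # only a hole remains after pos
                break
            raise
        # End of file counts as an implicit hole, so this always succeeds
        # for an offset that lies inside data.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        if end <= start:  # defensive: never loop without progress
            break
        yield start, end - start
        pos = end
```

A copy loop would read and write only the yielded ranges and seek over everything else. A filesystem that does not track holes may simply report the whole file as one data range, which is still correct, just not faster.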

	      	    	      	     - Ted