lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sat, 16 Apr 2011 10:25:00 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Andreas Dilger <adilger@...ger.ca>
Cc:	Christoph Hellwig <hch@...radead.org>,
	Eric Blake <eblake@...hat.com>,
	Jim Meyering <jim@...ering.net>,
	Eric Sandeen <sandeen@...deen.net>, xfs-oss <xfs@....sgi.com>,
	coreutils@....org, linux-ext4@...r.kernel.org,
	Markus Trippelsdorf <markus@...ppelsdorf.de>
Subject: Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP
 related?)

On Fri, Apr 15, 2011 at 04:28:37PM -0600, Andreas Dilger wrote:
> On 2011-04-15, at 11:26 AM, Christoph Hellwig wrote:
> > On Fri, Apr 15, 2011 at 11:24:19AM -0600, Eric Blake wrote:
> >> Would it be worth borrowing from Solaris' semantics and adding SEEK_HOLE
> >> and SEEK_DATA to lseek(2), as a higher level (less-detailed, but easier
> >> to define and easier to use) interface for discovering the regions of a
> >> file that only contain NUL bytes?
> > 
> > Yes, I've already suggested that both in this thread and on IRC.
> > 
> > For efficient copies it's the only usable interface.
> 
> I suspect that these bugs would have still existed whether the
> interface is SEEK_HOLE/SEEK_DATA, or FIEMAP.  The main problem is
> that the delalloc pages were not accounted for correctly during
> layout traversal..

It's not delalloc that is the problem - XFS accounts for them just
fine in the extent map when asked. However, XFS does speculative
delayed allocation over regions that contain no data, so if the
core-utils folk are assuming that delalloc extents contain data and
need to be copied, they're in for a nasty surprise.

However, every example I've seen in this thread has had to do with
unwritten extents not changing state when data is written into the
page cache. i.e. people are struggling with the expected behaviour
of unwritten extents.

That is, unwritten extent remain unwritten extents until data has
been _physically_ written to them. If there is data in the page
cache over the unwritten extent, it is still an unwritten extent.
If the system crashes while in this state, then the extent _must_
remain an unwritten extent after recovery, otherwise it exposes
stale data.

Further, using FIEMAP to determine where the data is that needs
copying is extremely fragile. What happens when FIEMAP grows a
different type of extent that contains data? cp will break, because
it doesn't think it needs to copy data in extents of an unknown
type. Or it will break because it thinks it needs to copy it and
there's something in it that should not be copied.

Also, cp shoul dnot be trying to replicate the physical layout of
the file when copying it - that's for the filesystem to decide and
having userspace try to do this is a sure recipe for causing severe
filesystem fragmentation. The filesystems already do an excellent
job of optimising allocation - userspace should not be trying to
second guess what is optimal layout for the filesystem.

Fundamentally, what the core-utils guys want is FIEMAP to tell them
where data is in the file, regardless of whether it is in memory or
on disk. That is not what FIEMAP is intended for and matches
SEEK_HOLE/SEEK_DATA precisely.

SEEK_HOLE/SEEK_DATA have very well understood semantics and is
designed specifically for optimising acceess to sparse files. This
interface abstracts all the details of how different filesystems
store their data so the application doesn't need to care about it.
The API is so, so much simpler to use and understand, to. And if the
filesystem has data in cache over an unwritten extent, then by
definition it's still data to be returned by SEEK_DATA. If it fails
to return the range as such then the implementation is broken.

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ