Date:	Fri, 12 Jun 2009 13:33:01 -0400
From:	Theodore Tso <tytso@....edu>
To:	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
Cc:	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	Eric Sandeen <sandeen@...hat.com>,
	Andreas Dilger <adilger@....com>
Subject: Re: Fallocate and DirectIO

On Fri, Jun 12, 2009 at 06:01:12PM +0530, Aneesh Kumar K.V wrote:
> Hi,
> 
> I noticed yesterday that a write to fallocated
> space via directIO results in a fallback to buffered IO, i.e., the userspace
> pages get copied to the page cache and then a sync is called.
> 
> I guess this defeats the purpose of using directIO. Maybe we should
> consider this a high priority bug.

I agree that many users of the fallocate() feature (e.g., databases) are
going to consider this to be a major misfeature.
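
For concreteness, the usage pattern under discussion looks roughly like
the sketch below: preallocate a file, then write into the preallocated
range through an O_DIRECT descriptor.  It is in Python only for brevity.
The helper name, the 4096-byte BLOCK alignment, and the buffered
fallback are assumptions of the sketch and have nothing to do with how
ext4 behaves internally.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment unit; the real requirement depends on the device


def preallocate_and_write(path, payload, size=1 << 20):
    """Preallocate `size` bytes, then write one aligned block at offset 0.

    Returns True if the write used an O_DIRECT descriptor, False otherwise
    (no O_DIRECT on this platform, or the filesystem rejected it).
    """
    if len(payload) > BLOCK:
        raise ValueError("payload must fit in one block")

    # Reserve the space up front, as a database would with fallocate().
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        try:
            os.posix_fallocate(fd, 0, size)
        except (AttributeError, OSError):
            os.ftruncate(fd, size)  # sets the size only; not a real preallocation
    finally:
        os.close(fd)

    # An anonymous mmap is page-aligned, which is what O_DIRECT typically
    # wants for the user buffer.
    buf = mmap.mmap(-1, BLOCK)
    buf.write(payload.ljust(BLOCK, b"\0"))
    try:
        direct_flag = getattr(os, "O_DIRECT", 0)
        try:
            fd = os.open(path, os.O_WRONLY | direct_flag)
            try:
                os.pwrite(fd, buf, 0)
            finally:
                os.close(fd)
            return bool(direct_flag)
        except OSError as exc:
            # Some filesystems refuse O_DIRECT with EINVAL.
            if exc.errno != errno.EINVAL:
                raise
        # Fallback for the sketch: buffered write followed by fsync().
        fd = os.open(path, os.O_WRONLY)
        try:
            os.pwrite(fd, buf, 0)
            os.fsync(fd)
        finally:
            os.close(fd)
        return False
    finally:
        buf.close()
```

From userspace both paths look the same; the complaint is that the
kernel quietly takes the buffered path even when the caller asked for
direct I/O.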

There's going to be a major performance hit though --- O_DIRECT is
supposed to be synchronous if all of the alignment requirements are
met, which means that by the time the write(2) system call returns,
the data is guaranteed to be on disk.  But if we need to manipulate
the extent tree to indicate that the block is now in use (so the data
is actually accessible), do we force a synchronous journal commit or
not?  If we don't, then a crash right after an O_DIRECT write into an
uninitialized region will cause the data to be "lost" (or at least,
unavailable via the read/write system calls).  If we do, then the first
write into an uninitialized block will cause a synchronous journal commit
that will be Slow And Painful, and it might destroy most of the
performance benefits that might tempt an enterprise database client to
use fallocate() in the first place.
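
To make the trade-off concrete, here is a deliberately tiny toy model
(not ext4 code; every name in it is made up for illustration).  It
assumes a single preallocated block whose "initialized" flag only
becomes durable at a journal commit, and that an uninitialized block
reads back as zeros.

```python
class ToyFile:
    """One preallocated block; the extent's 'initialized' flag is journalled."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.disk_data = bytes(block_size)   # block contents on stable storage
        self.initialized_in_memory = False   # extent state in the running transaction
        self.initialized_on_disk = False     # extent state as of the last commit
        self.journal_commits = 0

    def direct_write(self, data, sync_commit):
        # Assumption: the data itself reaches the disk before write() returns.
        self.disk_data = data.ljust(self.block_size, b"\0")
        # The extent conversion only joins the running transaction.
        self.initialized_in_memory = True
        if sync_commit:
            self.commit_journal()

    def commit_journal(self):
        self.initialized_on_disk = self.initialized_in_memory
        self.journal_commits += 1

    def crash_and_recover(self):
        # Anything not committed to the journal is gone after the crash.
        self.initialized_in_memory = self.initialized_on_disk

    def read(self):
        # Uninitialized extents read back as zeros even if data is on disk.
        if self.initialized_in_memory:
            return self.disk_data
        return bytes(self.block_size)


def data_survives_crash(sync_commit):
    f = ToyFile()
    f.direct_write(b"payload", sync_commit)
    f.crash_and_recover()
    return f.read().startswith(b"payload")
```

In this model data_survives_crash(False) is False even though the bytes
are physically on disk, and data_survives_crash(True) is True at the
cost of one journal commit per first write into a block.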

I wonder how XFS deals with this case?  It's a problem that is going
to hit any journalled filesystem that wants to support fallocate() and
direct I/O.

One thing I can think of potentially doing is to check to see if the
extent tree block has already been journalled, and if it is not
currently involved in the current transaction or the previous committing
transaction, *and* if there is space in the extent tree to mark the
current uninitialized block as initialized (i.e., if the extent needs to
be split, there is sufficient space so we don't have to allocate a new
leaf block for the extent tree), we could update the leaf block in
place and then synchronously write it out, and thus avoid needing to
do a synchronous journal commit.
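
As a rough sketch of that check (hypothetical names throughout; this is
not the ext4 extent code, and it ignores merging with neighbouring
initialized extents), the decision could look like this.  The one
assumption beyond what is said above is the split arithmetic: converting
part of an uninitialized extent leaves at most one leftover piece on
each side, so it needs at most two extra leaf entries.

```python
from dataclasses import dataclass


@dataclass
class LeafBlock:
    """Hypothetical summary of an extent tree leaf block's state."""
    in_running_transaction: bool
    in_committing_transaction: bool
    free_entries: int


def extra_entries_needed(extent_start, extent_len, write_start, write_len):
    """Extra leaf entries required to mark the written range initialized."""
    extent_end = extent_start + extent_len
    write_end = write_start + write_len
    if write_len <= 0 or write_start < extent_start or write_end > extent_end:
        raise ValueError("write must be non-empty and inside the extent")
    extra = 0
    if write_start > extent_start:
        extra += 1  # uninitialized piece left in front of the write
    if write_end < extent_end:
        extra += 1  # uninitialized piece left behind the write
    return extra


def can_update_in_place(leaf, extent_start, extent_len, write_start, write_len):
    """True if the leaf could be rewritten synchronously without a commit."""
    if leaf.in_running_transaction or leaf.in_committing_transaction:
        return False  # the journal already owns this block
    needed = extra_entries_needed(extent_start, extent_len, write_start, write_len)
    return leaf.free_entries >= needed  # otherwise a new leaf block is needed
```

For example, a write into the middle of a 100-block uninitialized extent
needs two extra entries, so a leaf with only one free entry would have
to take the slow path.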

In any case, adding this support is going to be non-trivial.  If
someone has time to work on it in the next 2-3 weeks or so, I can push
it to Linus as a bug fix, but I'm concerned that fixing this may be
tricky enough (and the patch invasive enough) that it might be
challenging to get this fixed in time for 2.6.31.

						- Ted