Date:	Fri, 19 Jul 2013 07:54:51 +0800
From:	Zheng Liu <gnehzuil.liu@...il.com>
To:	Eric Sandeen <sandeen@...hat.com>
Cc:	Theodore Ts'o <tytso@....edu>,
	Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH 0/5 v2] add extent status tree caching

Hi Eric,

On Thu, Jul 18, 2013 at 01:35:24PM -0500, Eric Sandeen wrote:
> On 7/16/13 10:17 AM, Theodore Ts'o wrote:
> > In addition to fixing a few bugs and addressing review comments, we now
> > add a new ioctl, EXT4_IOC_PRECACHE_EXTENTS, which forces all of the
> > extents in an inode to be cached in the extents status tree, and marks
> > them to be preferentially protected when under memory pressure.  
> > 
> > This is critically important when using AIO to a preallocated file,
> > since if we need to read in blocks from the extent tree, the
> > io_submit(2) system call becomes synchronous, which is rather rude to
> > applications which were expecting the AIO to be "A".
> > 
> > As a bonus, using the extent status tree to store the logical to
> > physical block mapping is usually more compact than having to keep one
> > or more extent tree blocks in the buffer cache.
> > 
> > (Should we do this all the time, instead of when the application
> > explicitly requests it?  Maybe; there could be cases with very large,
> > fragmented files accessed by an application such as "file" that only
> > needs to look at a small subset of the file, where this could result in
> > unnecessary work and memory being allocated.  OTOH, 95%+ of the time
> > this would probably be a win...)
> 
> I'd say yes, we should - maybe not in all cases but if you need it for
> AIO, try to make it "all the time" at least for that AIO?
> 
> We keep telling application writers not to assume certain things about
> various filesystems, or to write applications that treat ext4 differently
> than ext3 differently than xfs etc...

Yes, I agree with you.  Ted and I have discussed a similar problem:
making 'data=writeback' the default in ext4.  Although most application
writers have realized that they need to call fsync explicitly to flush
dirty pages, there are still some legacy applications that depend on
'data=ordered' mode to flush them.
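
To illustrate what "call fsync explicitly" means for an application, here
is a minimal sketch in Python.  The helper name durable_write and the
write-to-temp-then-rename pattern are assumptions chosen for illustration;
they are not taken from the patch series.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path without relying on data=ordered semantics.

    Hypothetical helper: writes to a temporary file in the same directory,
    fsyncs it, renames it over the target, then fsyncs the directory.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # drain Python's userspace buffer
            os.fsync(f.fileno())   # ask the kernel to write dirty pages out
        os.replace(tmp_path, path)  # atomic rename over the target
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so that the rename itself reaches stable storage.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

An application written this way behaves the same under 'data=ordered'
and 'data=writeback', which is the property the legacy applications
mentioned above lack.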

> 
> This goes the other way.
> 
> In the end who (besides google?) is really going to call this IOCTL?
> 
> I wondered if only doing this when files are opened O_DIRECT might make
> sense, but Jeff Moyer pointed out that giant databases probably don't
> want to read in their entire block mapping tree - OTOH, they probably use
> preallocation if they're smart, and maybe it's not that bad.

I asked a colleague who is a MySQL contributor whether MySQL
preallocates any of its files.  As far as I know, MySQL does not do so
at present.  I don't have the source code of Oracle or DB2, but I guess
these giant databases might use preallocation.
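
For reference, preallocation from user space can be as small as the
following sketch.  It assumes Linux, where Python exposes
os.posix_fallocate; the helper name preallocate is hypothetical, and this
says nothing about how any particular database does it.

```python
import os


def preallocate(path, size_bytes):
    """Create path if needed and reserve size_bytes of disk space for it.

    Hypothetical helper. posix_fallocate() guarantees that later writes
    within the range will not fail for lack of space, and extends the
    file to at least size_bytes.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, size_bytes)  # offset 0, length size_bytes
    finally:
        os.close(fd)
    return os.stat(path).st_size
```

A file prepared this way is the case where the extent mapping already
exists before any AIO is submitted.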

> 
> Or what about tying this into POSIX_FADV_WILLNEED?  Hohum, that gets
> into force_page_cache_readahead().  We need POSIX_FADV_WILLNEED_META...

Yes, a _WILLNEED_METADATA flag makes sense to me if other file systems
also want to support it.  But, as Ted said, adding it as an ioctl for
now might be a good choice because it won't impact other file systems.
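
From the application side, calling the ioctl could look roughly like the
sketch below.  Two assumptions are made that are not stated in this
thread: the request number is taken to be _IO('f', 18), and the encoding
used is the one for architectures where _IOC_NONE is 0 (x86, arm, and
others).  The helper name precache_extents is hypothetical.

```python
import errno
import fcntl

# Assumption: EXT4_IOC_PRECACHE_EXTENTS is _IO('f', 18). With _IOC_NONE == 0
# this encodes as (type << 8) | nr. Other architectures encode it differently.
EXT4_IOC_PRECACHE_EXTENTS = (ord("f") << 8) | 18


def precache_extents(fd):
    """Ask ext4 to cache all extents of the open file fd.

    Returns True if the ioctl was accepted, and False if the file system
    or kernel does not support it, so callers can treat it as a hint.
    """
    try:
        fcntl.ioctl(fd, EXT4_IOC_PRECACHE_EXTENTS)
    except OSError as exc:
        unsupported = (errno.ENOTTY, errno.EINVAL,
                       errno.EOPNOTSUPP, errno.ENOSYS)
        if exc.errno in unsupported:
            return False
        raise
    return True
```

Because an unsupported ioctl is reported as False rather than as an
error, the same application code keeps working on other file systems.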

                                                - Zheng