Message-ID: <20130722100255.GF11674@dastard>
Date:	Mon, 22 Jul 2013 20:02:55 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Theodore Ts'o <tytso@....edu>, Eric Sandeen <sandeen@...hat.com>,
	Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH 0/5 v2] add extent status tree caching

On Mon, Jul 22, 2013 at 10:17:42AM +0800, Zheng Liu wrote:
> On Mon, Jul 22, 2013 at 11:38:31AM +1000, Dave Chinner wrote:
> > On Fri, Jul 19, 2013 at 12:19:30PM -0400, Theodore Ts'o wrote:
> > > On Fri, Jul 19, 2013 at 01:33:09PM +1000, Dave Chinner wrote:
> > > > An ioctl is kinda silly for this. Just use O_NONBLOCK when calling
> > > > open() and do the prefetch right in the open call. The open() can
> > > > block, anyway, and what you are trying to do is non-blocking IO with
> > > > AIO, so it seems like we've already got a sensible, generic
> > > > interface for triggering this sort of prefetch operation.
> > > 
> > > O_NONBLOCK (either set via open or fcntl) is a possibility, since it's
> > > carefully defined to be unspecified for regular files by SUSv3.  It is
> > > quite different from the existing semantics for O_NONBLOCK, though.
> > > Currently, for all file types where O_NONBLOCK is not ignored, open(2)
> > > is guaranteed itself not to block.  If we use O_NONBLOCK for regular
> > > files to mean that any necessary metadata blocks required for AIO to
> > > be "A" will be cached, then it will make open(2) much more likely to
> > > block.  Also, for all file types where O_NONBLOCK is not ignored,
> > > read(2) will not block but instead return -1 and set errno to EAGAIN.
> > > This would also be a change.
> > > 
> > > If we tried to get this new semantics for O_NONBLOCK to be accepted by
> > > the Austin Group for standardization in the future, would they accept
> > > it, or would they say, "this makes me vomit"?  I have a suspicion
> > > their reaction might be closer to the latter....
> > > 
> > > If we want a VFS-level API, in my opinion an fadvise() flag would be a
> > > better choice.
> > 
> > Sure. Make it an fadvise() flag - just don't add ioctls for things
> > that are generically useful.
> > 
> > On second thoughts - you're trying to get the extent map read in. We
> > already have an interface for querying extent maps - fiemap.
> > FIEMAP_FLAG_PREFETCH along with the range of the file you want the
> > extent map prefetched for?
> 
> I don't think fiemap is a good interface.  The application uses
> fiemap(2) to retrieve extent mapping. 

fiemap is used to query information about extent maps. What it
returns is entirely dependent on the input parameters that are
passed to it. Indeed, from Documentation/filesystems/fiemap.txt:

"If fm_extent_count is zero, then the fm_extents[] array is ignored
(no extents will be returned), and the fm_mapped_extents count will
hold the number of extents needed in fm_extents[] to hold the file's
current mapping."

Think about that for a minute. What does the filesystem do with such
an fiemap query when the extent map is not cached?  That's right,
*fiemap reads the extent map from disk into the cache* and then
returns the number of extents in the range.
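
For illustration, here is a minimal userspace sketch of such a "count
extents" query. It uses only the existing FS_IOC_FIEMAP ioctl and
struct fiemap from <linux/fiemap.h>; the helper name count_extents()
is made up for this example, and whether the mapping stays cached
afterwards is up to the filesystem. The FIEMAP_FLAG_PREFETCH flag
discussed in this thread is only a proposal, so it is not used here.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_MAX_OFFSET */

/*
 * Hypothetical helper: ask the filesystem how many extents map the
 * whole file, without copying any extent records back to userspace.
 * Returns the extent count, or -1 on error (errno is set by the
 * failing call).
 */
long count_extents(const char *path)
{
	struct fiemap fm;
	int fd, rc;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	memset(&fm, 0, sizeof(fm));
	fm.fm_start = 0;
	fm.fm_length = FIEMAP_MAX_OFFSET;	/* cover the whole file */
	fm.fm_flags = 0;
	fm.fm_extent_count = 0;			/* fm_extents[] is ignored */

	/* The filesystem has to walk the extent map to fill in the count. */
	rc = ioctl(fd, FS_IOC_FIEMAP, &fm);
	close(fd);
	if (rc < 0)
		return -1;

	return (long)fm.fm_mapped_extents;
}
```

A caller would do something like count_extents("/path/to/file") before
issuing AIO against the file and ignore the returned number if all it
wants is the side effect.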

All I have suggested is adding a flag to make this an *explicit
operation* rather than a side effect of a "count extents" query. I
fail to see any justification for a whole new interface when we
already have a perfectly functional one that already provides the
functionality that is required...

> That means that the app could use
> these mappings in userspace.  But now we want to cache these mappings in
> kernel space.

If the filesystem is not caching the extents read in during fiemap
operations then perhaps you should look into fixing that deficiency.

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com