[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <F2018580-9ACC-4A63-A8D1-155FA371749A@dilger.ca>
Date: Fri, 27 Apr 2012 13:31:07 -0600
From: Andreas Dilger <adilger@...ger.ca>
To: Dave Chinner <david@...morbit.com>
Cc: David Howells <dhowells@...hat.com>, linux-fsdevel@...r.kernel.org,
linux-nfs@...r.kernel.org, linux-cifs@...r.kernel.org,
samba-technical@...ts.samba.org, linux-ext4@...r.kernel.org,
wine-devel@...ehq.org, kfm-devel@....org, nautilus-list@...me.org,
linux-api@...r.kernel.org, libc-alpha@...rceware.org
Subject: Re: [PATCH 0/6] Extended file stat system call
On 2012-04-27, at 7:13 AM, Dave Chinner wrote:
> Have a look at fs/xfs/xfs_dinode.h. There's a bunch of flags defined
> at the bottom of the file.
>
> Stuff like the "nodefrag", "nodump", and "prealloc" bits seem fairly
> generic - they are for indicating that files are to be avoided for
> defrag or backup purposes, the prealloc bit indicates that fallocate
> has been used to reserve space on the inode (finding files that space
> can be punched out of safely), and so on.
There is already the FS_NODUMP_FL in the standard FS_IOC_GETFLAGS ioctl
and I expect this to be in statxat() also. In ext4 there was also an
EXT4_EOFBLOCKS_FL added for inodes with fallocate'd data beyond EOF,
but Eric thought it was a pain to maintain and it has been deprecated
in ext4 and e2fsprogs recently.
> Currently these things are queried and manipulated by ioctls
> (XFS_IOC_FSX[GS]ETATTR) along with extent size hints, project
> quotas, etc. but I think there's some wider use for many of the
> flags, which is why I was asking is there's any thought to this sort
> of flag being exposed by the VFS.
>
> Historically the flags exposed by the VFS are those used by extN - I
> see little reason why we should favour one filesystem's flags over
> any others in an extended stat interface if they are generically
> useful....
Sure, they started as ext4 flags because the "lsattr" and "chattr"
tools were using this ioctl/flags, but have become more generic in
recent years. FS_NOTAIL_FL was added for Reiserfs, and FS_NOCOW_FL
was added for another filesystem (maybe Btrfs?). I'm not against
adding more flags here that are generically useful, and recommended
that statxat() have a 64-bit st_ioc_flags, since there are already
22 FS_*_FL flags defined today.
>> Either you can add some of them to the ioc flags (which may be
>> impractical, I grant you) or we'd have to add an arbitrary fs-type
>> specific field and specify the host fs (the provision of which might
>> not be a bad idea in and of itself) to tell userspace how to interpret them.
>
> Well, that's the complexity, isn't it. I have no good answer to
> that...
>
>>> Along the same lines, filesytsems can have different allocation
>>> constraints to IO the filesystem block size - ext4 with it's
>>> bigalloc hack, XFS with it's per-inode extent size hints and the
>>> realtime device, etc. Then there's optimal IO characteristics
>>> (e.g. geometery hints like stripe unit/stripe width for the
>>> allocation policy of that given file) that applications could
>>> use if they were present rather than having to expose them
>>> through ioctls that nobody even knows about...
>>
>> Yeah... Not representable by one number. You'd have to unset a
>> flag to say you were providing this information.
>>
>> However, providing a whole bunch of hints about I/O characteristics
>> is probably beyond this syscall - especially if it isn't constant
>> over the length of a file. That's specialist knowledge that most
>> applications don't need to know.
>> Having a generic way to retrieve it, though, may be a good idea.
>
> We're continually talking about applications giving us usage hints
> on what IO they are going to do so the storage can optimise the IO.
> IO is still a GIGO problem, though, and the idea of geometry hints
> is to enable us to tell the application to do well formed IO. i.e.
> less garbage.
>
> XFS has ioctls to expose filesystem geometry, optimal IO sizes, the
> alignment limits for direct IO, etc, and they are very useful to
> applications that care about high performance IO. A lot of this can
> be distilled down to a simple set of geometries, and generally
> speaking they don't change mid way through a file....
>
>> OTOH, there's plenty of uncommitted space, so if we can condense
>> the hints down to something small, we could perhaps add it later -
>> but from your paragraph above, it doesn't sound like it'll be small.
>
> Allocation block size, minimum sane IO size (to avoid page cache RMW
> cycles or DIO zeroing), minimum prefered IO size (e.g. stripe unit),
> optimal IO size for bandwidth (e.g. stripe width). I don't think
> there's much more than that which will be really usable by
> applications.
I think this is a minimal set that makes sense, and is manageable for
both the interface and for users. Even if it isn't 100% correct for
every file of every filesystem, it still makes sense for many systems.
I'd suggest st_frsize (like BSD statvfs() f_frsize) would be the
minimum fragment or page size, st_iosize (BSD f_iosize) could be
the optimal IO size, and "st_stripesize" for the minimum preferred RAID/chunk size.
One could argue that "st_blksize" is used for the "optimal IO size"
on Linux today, but this is an overloaded term. It _appears_ to
represent the filesystem blocksize, which it usually is not, and on
BSD st_bsize means the minimum blocksize and has a confusingly
similar name. Since any application using this API needs to do some
extra coding already, we may as well give the structure members good
names that are not ambiguous.
>>> Perhaps also exposing the project ID for quota purposes, like we
>>> do UID and GID. That way we wouldn't need a filesystem specific
>>> ioctl to read it....
>>
>> Is this an XFS only thing? If so, can it be generalised?
>
> Right now it is, but there's been patches in the past to introduce
> project quotas to ext4. That didn't go far because it was done in a
> way that was semantically different to XFS (for no reason that I
> could understand) and nobody wanted two different sets of semantics
> for the "same" feature. The most common use of project quotas is to
> implement sub-tree quotas, which is probably of more interest to
> btrfs folks as it is an exact match for per-subvolume quotas.
>
> So, yes, I do see it as something generically useful - it's a
> feature that a lot of people use XFS specifically for....
I'd agree. There was the tree quota project for ext4, and I've also
heard this is available in other filesystems.
Cheers, Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists