[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20131126104034.GA4854@quack.suse.cz>
Date: Tue, 26 Nov 2013 11:40:34 +0100
From: Jan Kara <jack@...e.cz>
To: David Howells <dhowells@...hat.com>
Cc: viro@...IV.linux.org.uk, smfrench@...il.com, jlayton@...hat.com,
mcao@...ibm.com, aneesh.kumar@...ux.vnet.ibm.com,
linux-cifs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, samba-technical@...ts.samba.org,
sjayaraman@...e.de, linux-ext4@...r.kernel.org
Subject: Re: [PATCH 0/3] Extended file stat functions [ver #2]
Hello,
On Wed 30-06-10 02:16:56, David Howells wrote:
> Implement a pair of new system calls to provide extended and further extensible
> stat functions.
>
> The third of the associated patches provides these new system calls:
>
> struct xstat_dev {
> unsigned int major;
> unsigned int minor;
> };
>
> struct xstat_time {
> unsigned long long tv_sec;
> unsigned long long tv_nsec;
> };
>
> struct xstat {
> unsigned int struct_version;
> #define XSTAT_STRUCT_VERSION 0
> unsigned int st_mode;
> unsigned int st_nlink;
> unsigned int st_uid;
> unsigned int st_gid;
> unsigned int st_blksize;
> struct xstat_dev st_rdev;
> struct xstat_dev st_dev;
> unsigned long long st_ino;
> unsigned long long st_size;
> struct xstat_time st_atime;
> struct xstat_time st_mtime;
> struct xstat_time st_ctime;
> struct xstat_time st_btime;
> unsigned long long st_blocks;
When we are doing this, can we please also change 'st_blocks' to
'st_bytes'? We track space usage in kernel in bytes for a long time so it
would be nice to propagate it to userspace via stat instead of a special
ioctl (at least quotacheck(8) needs to know the exact value).
Honza
> unsigned long long st_gen;
> unsigned long long st_data_version;
> unsigned long long query_flags;
> #define XSTAT_QUERY_SIZE 0x00000001ULL
> #define XSTAT_QUERY_NLINK 0x00000002ULL
> #define XSTAT_QUERY_AMC_TIMES 0x00000004ULL
> #define XSTAT_QUERY_CREATION_TIME 0x00000008ULL
> #define XSTAT_QUERY_BLOCKS 0x00000010ULL
> #define XSTAT_QUERY_INODE_GENERATION 0x00000020ULL
> #define XSTAT_QUERY_DATA_VERSION 0x00000040ULL
> #define XSTAT_QUERY__ORDINARY_SET 0x00000017ULL
> #define XSTAT_QUERY__GET_ANYWAY 0x0000007fULL
> #define XSTAT_QUERY__DEFINED_SET 0x0000007fULL
> unsigned long long extra_results[0];
> };
>
> ssize_t ret = xstat(int dfd,
> const char *filename,
> unsigned atflag,
> struct xstat *buffer,
> size_t buflen);
>
> ssize_t ret = fxstat(int fd,
> struct xstat *buffer,
> size_t buflen);
>
> which are more fully documented in that patch's description.
>
> The bonuses of these new stat functions are:
>
> (1) The fields in the xstat struct are cleaned up. There are no split or
> duplicated fields.
>
> (2) Some extra information is made available (file creation time, inode
> generation number and data version number) where provided by the
> underlying filesystem.
>
> These are implemented here for Ext4 and AFS, but could also be provided
> for CIFS, NTFS and BtrFS and probably others.
>
> (3) The structure is versioned and extensible, meaning that further new system
> calls shouldn't be required.
>
> Note that no lstat() equivalent is required as that can be implemented through
> xstat() with atflag == 0.
>
>
> The first patch makes const a bunch of system call userspace string/buffer
> arguments. I can then make sys_xstat()'s filename pointer const too (though
> the entire first patch is not required for that).
>
> The second patch makes the AFS filesystem use i_generation for the vnode ID
> uniquifier rather than i_version, and assigns i_version to hold the AFS data
> version number, making them more logical for when I want to get at them from
> afs_getattr().
>
> There's a test program attached to the description for patch 3. It can be run
> as follows:
>
> [root@...romeda ~]# /tmp/xstat /afs/archive/linuxdev/fedora9/i386/repodata/
> xstat(/afs/archive/linuxdev/fedora9/i386/repodata/) = 152
> sv=0 qf=77 cr=0.0 iv=7a5 dv=5
> Size: 2048 Blocks: 0 IO Block: 4096 directory
> Device: 00:15 Inode: 83 Links: 2
> Access: (0755/drwxr-xr-x) Uid: 75338 Gid: 0
> Access: 2008-11-05 20:00:12.000000000+0000
> Modify: 2008-11-05 20:00:12.000000000+0000
> Change: 2008-11-05 20:00:12.000000000+0000
> Inode version: 7a5h
> Data version: 5h
>
>
> Things that need consideration:
>
> (1) Is it worth retaining the ability to arbitrarily add extra bits onto the
> end of the stat buffer? And what's the best way to do this?
>
> I've defined a way that from userspace involves assigning bits in
> query_flags to extra results that you might want. But this could instead
> be done, say, by just upping the struct version number any time we want to
> pass back more information. Alternatively, we could go for a tagged data
> method, perhaps using the same format as the recvmsg() control message
> field.
>
> If we use tagged data then rather than being selective, we could just
> return as many tagged data items as we feel the user might want and we can
> cram into the buffer. That could be rather slow, though.
>
> (2) What extra bits of information might we like to see available through the
> stat interface? Security labels? NFS file IDs? Xattrs?
>
> If we went for a tagged data method, xstat() could be modified to take a
> list of tags as an argument, and could then return arbitrarily-sized
> tagged results, including fs-specific stuff.
>
> (3) Does st_blksize really need to be 64 bits on a 64-bit system? Or can it
> be 32-bits? Are we really likely to see something with a 4Gb+ blocksize?
>
> (4) Should the inode number and data version number fields be 128-bit?
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists