Message-ID: <768343b5-e9b4-a86c-53de-2929bc290342@gmail.com>
Date:   Wed, 23 Nov 2016 09:37:13 +0100
From:   "Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>
To:     David Howells <dhowells@...hat.com>, linux-fsdevel@...r.kernel.org
Cc:     mtk.manpages@...il.com, linux-api@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/4] statx: Add a system call to make enhanced file info
 available [ver #3]

Hi David,

On 11/23/2016 01:55 AM, David Howells wrote:
> Add a system call to make extended file information available, including
> file creation and some attribute flags where available through the
> underlying filesystem.
> 
> 
> ========
> OVERVIEW
> ========
> 
> The idea was initially proposed as a set of xattrs that could be retrieved
> with getxattr(), but the general preferance proved to be for a new syscall

s/preferance/preference/

> with an extended stat structure.
> 
> This can feasibly be used to support a number of things, not all of which
> are added here:

It would be very useful if this overview distinguished the features below
that are supported in the initial implementation from those (e.g.,
femtosecond timestamps) that are merely allowed for in a future
implementation.

>  (1) Better support for the y2038 problem [Arnd Bergmann].
> 
>  (2) Creation time: The SMB protocol carries the creation time, which could
>      be exported by Samba, which will in turn help CIFS make use of
>      FS-Cache as that can be used for coherency data.
> 
>      This is also specified in NFSv4 as a recommended attribute and could
>      be exported by NFSD [Steve French].
> 
>  (3) Lightweight stat: Ask for just those details of interest, and allow a
>      netfs (such as NFS) to approximate anything not of interest, possibly
>      without going to the server [Trond Myklebust, Ulrich Drepper, Andreas
>      Dilger].
> 
>  (4) Heavyweight stat: Force a netfs to go to the server, even if it thinks
>      its cached attributes are up to date [Trond Myklebust].
> 
>  (5) Data version number: Could be used by userspace NFS servers [Aneesh
>      Kumar].
> 
>      Can also be used to modify fill_post_wcc() in NFSD which retrieves
>      i_version directly, but has just called vfs_getattr().  It could get
>      it from the kstat struct if it used vfs_xgetattr() instead.
> 
>  (6) BSD stat compatibility: Including more fields from the BSD stat such
>      as creation time (st_btime) and inode generation number (st_gen)
>      [Jeremy Allison, Bernd Schubert].
> 
>  (7) Inode generation number: Useful for FUSE and userspace NFS servers
>      [Bernd Schubert].  This was asked for but later deemed unnecessary
>      with the open-by-handle capability available
> 
>  (8) Extra coherency data may be useful in making backups [Andreas Dilger].

Can you elaborate on point (8) in this commit message? It's not clear,
to me at least, what this is about.
> 
>  (9) Allow the filesystem to indicate what it can/cannot provide: A
>      filesystem can now say it doesn't support a standard stat feature if
>      that isn't available, so if, for instance, inode numbers or UIDs don't
>      exist or are fabricated locally...
> 
> (10) Make the fields a consistent size on all arches and make them large.
> 
> (11) Store a 16-byte volume ID in the superblock that can be returned in
>      struct xstat [Steve French].
> 
> (12) Include granularity fields in the time data to indicate the
>      granularity of each of the times (NFSv4 time_delta) [Steve French].
> 
> (13) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.
>      Note that the Linux IOC flags are a mess and filesystems such as Ext4
>      define flags that aren't in linux/fs.h, so translation in the kernel
>      may be a necessity (or, possibly, we provide the filesystem type too).
> 
> (14) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
>      Michael Kerrisk].
> 
> (15) Spare space, request flags and information flags are provided for
>      future expansion.
> 
> (16) Femtosecond-resolution timestamps [Dave Chinner].
> 
> 
> ===============
> NEW SYSTEM CALL
> ===============
> 
> The new system call is:
> 
> 	int ret = statx(int dfd,
> 			const char *filename,
> 			unsigned int flags,

In patch 0/4 of this series, this argument is called 'atflags'.
The names should be consistent. 'flags' seems correct to me.

> 			unsigned int mask,
> 			struct statx *buffer);
> 
> The dfd, filename and flags parameters indicate the file to query, in a
> similar way to fstatat().  There is no equivalent of lstat() as that can be
> emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags.  There is
> also no equivalent of fstat() as that can be emulated by passing a NULL
> filename to statx() with the fd of interest in dfd.
> 
> Whether or not statx() synchronises the attributes with the backing store
> can be controlled (this typically only affects network filesystems) can be
> set by OR'ing a value into the flags argument:

s/can be set//

> 
>  (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this
>      respect.
> 
>  (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise
>      its attributes with the server - which might require data writeback to
>      occur to get the timestamps correct.
> 
>  (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a
>      network filesystem.  The resulting values should be considered
>      approximate.
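
The three sync modes above share a two-bit field, so a caller can check a
flags value before making the call. The following is only a sketch: the
numeric values are copied from the include/uapi/linux/fcntl.h hunk later in
this patch, and the SKETCH_* names and both helper functions are
hypothetical, chosen so as not to clash with any real header.

```c
#include <stdbool.h>

/* Values as defined in the fcntl.h hunk of this patch (names are local). */
#define SKETCH_AT_STATX_SYNC_TYPE	0x6000u	/* mask covering the sync field */
#define SKETCH_AT_STATX_SYNC_AS_STAT	0x0000u	/* behave as stat() does */
#define SKETCH_AT_STATX_FORCE_SYNC	0x2000u	/* force sync with the server */
#define SKETCH_AT_STATX_DONT_SYNC	0x4000u	/* suppress sync with the server */

/* Extract only the sync-mode bits from a flags word. */
static unsigned int sketch_sync_mode(unsigned int flags)
{
	return flags & SKETCH_AT_STATX_SYNC_TYPE;
}

/*
 * Mirror of the check in sys_statx() in this patch: setting both bits
 * (FORCE_SYNC and DONT_SYNC together) is rejected with -EINVAL there.
 */
static bool sketch_sync_mode_valid(unsigned int flags)
{
	return sketch_sync_mode(flags) != SKETCH_AT_STATX_SYNC_TYPE;
}
```

Other AT_* bits in the same flags word (AT_SYMLINK_NOFOLLOW and so on) do
not affect this check, since only the masked field is compared.
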
> 
> mask is a bitmask indicating the fields in struct statx that are of
> interest to the caller.  The user should set this to STATX_BASIC_STATS to
> get the basic set returned by stat().  It should be note that asking for

s/note/noted/

> more information may entail extra I/O operations.
> 
> buffer points to the destination for the data.  This must be 256 bytes in
> size.
> 
> 
> ======================
> MAIN ATTRIBUTES RECORD
> ======================
> 
> The following structures are defined in which to return the main attribute
> set:
> 
> 	struct statx_timestamp {
> 		__s64	tv_sec;
> 		__s32	tv_nsec;
> 		__s32	__reserved;
> 	};
> 
> 	struct statx {
> 		__u32	stx_mask;
> 		__u32	stx_blksize;
> 		__u64	stx_attributes;
> 		__u32	stx_nlink;
> 		__u32	stx_uid;
> 		__u32	stx_gid;
> 		__u16	stx_mode;
> 		__u16	__spare0[1];
> 		__u64	stx_ino;
> 		__u64	stx_size;
> 		__u64	stx_blocks;
> 		__u64	__spare1[1];
> 		struct statx_timestamp	stx_atime;
> 		struct statx_timestamp	stx_btime;
> 		struct statx_timestamp	stx_ctime;
> 		struct statx_timestamp	stx_mtime;
> 		__u32	stx_rdev_major;
> 		__u32	stx_rdev_minor;
> 		__u32	stx_dev_major;
> 		__u32	stx_dev_minor;
> 		__u64	__spare2[14];
> 	};
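
The "must be 256 bytes" requirement can be checked mechanically against the
layout above. This is a sketch only: the struct names are hypothetical
userspace stand-ins, with <stdint.h> types substituted for the kernel's
__s64/__s32/__u64/__u32/__u16 and the reserved/spare fields renamed to avoid
leading double underscores.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof */
#include <stdint.h>

struct sketch_statx_timestamp {
	int64_t		tv_sec;
	int32_t		tv_nsec;
	int32_t		reserved;
};

struct sketch_statx {
	uint32_t	stx_mask;
	uint32_t	stx_blksize;
	uint64_t	stx_attributes;
	uint32_t	stx_nlink;
	uint32_t	stx_uid;
	uint32_t	stx_gid;
	uint16_t	stx_mode;
	uint16_t	spare0[1];
	uint64_t	stx_ino;
	uint64_t	stx_size;
	uint64_t	stx_blocks;
	uint64_t	spare1[1];
	struct sketch_statx_timestamp	stx_atime;
	struct sketch_statx_timestamp	stx_btime;
	struct sketch_statx_timestamp	stx_ctime;
	struct sketch_statx_timestamp	stx_mtime;
	uint32_t	stx_rdev_major;
	uint32_t	stx_rdev_minor;
	uint32_t	stx_dev_major;
	uint32_t	stx_dev_minor;
	uint64_t	spare2[14];
};

/* Every field sits on its natural alignment, so no padding is expected. */
static_assert(sizeof(struct sketch_statx_timestamp) == 16, "timestamp size");
static_assert(offsetof(struct sketch_statx, stx_ino) == 32, "stx_ino offset");
static_assert(offsetof(struct sketch_statx, stx_atime) == 64, "stx_atime offset");
static_assert(offsetof(struct sketch_statx, stx_rdev_major) == 128, "rdev offset");
static_assert(sizeof(struct sketch_statx) == 256, "struct must be 256 bytes");
```

Because the 64-bit members already fall on 8-byte offsets, the size should
come out the same on ABIs that only align them to 4 bytes.
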
> 
> The defined bits in request_mask and stx_mask are:
> 
> 	STATX_TYPE		Want/got stx_mode & S_IFMT
> 	STATX_MODE		Want/got stx_mode & ~S_IFMT
> 	STATX_NLINK		Want/got stx_nlink
> 	STATX_UID		Want/got stx_uid
> 	STATX_GID		Want/got stx_gid
> 	STATX_ATIME		Want/got stx_atime{,_ns}
> 	STATX_MTIME		Want/got stx_mtime{,_ns}
> 	STATX_CTIME		Want/got stx_ctime{,_ns}
> 	STATX_INO		Want/got stx_ino
> 	STATX_SIZE		Want/got stx_size
> 	STATX_BLOCKS		Want/got stx_blocks
> 	STATX_BASIC_STATS	[The stuff in the normal stat struct]
> 	STATX_BTIME		Want/got stx_btime{,_ns}
> 	STATX_ALL		[All currently available stuff]
> 
> stx_btime is the file creation time, stx_mask is a bitmask indicating the
> data provided and __spares*[] are where as-yet undefined fields can be
> placed.
> 
> Time fields are structures with separate seconds and nanoseconds fields
> plus a reserved field in case we want to add even finer resolution.  Note
> that times will be negative if before 1970; in such a case, the nanosecond
> fields will also be negative if not zero.
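
The sign convention described above differs from the usual timespec rule
where tv_nsec is always in the range 0 to 999,999,999, so user-space code
would need to convert. A minimal sketch, assuming the convention exactly as
stated in this commit message (nanoseconds carry the same sign as a
pre-1970 time); the struct and function names are hypothetical:

```c
#include <stdint.h>

/* Stand-in for the seconds/nanoseconds pair of struct statx_timestamp. */
struct sketch_ts {
	int64_t	tv_sec;
	int32_t	tv_nsec;
};

/*
 * Convert from the sign-matched form to the floor-based form in which
 * tv_nsec is never negative.  For example, -1.5 s is { -1, -500000000 }
 * on input and { -2, 500000000 } on output.
 */
static struct sketch_ts sketch_ts_normalise(struct sketch_ts ts)
{
	if (ts.tv_nsec < 0) {
		ts.tv_sec -= 1;
		ts.tv_nsec += 1000000000;
	}
	return ts;
}

/* Total nanoseconds since the epoch; the result is the same for both forms. */
static int64_t sketch_ts_to_ns(struct sketch_ts ts)
{
	return ts.tv_sec * INT64_C(1000000000) + ts.tv_nsec;
}
```

Timestamps at or after the epoch pass through sketch_ts_normalise()
unchanged, since their nanosecond field is already non-negative.
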
> 
> The bits defined in the stx_attributes field convey information about a
> file, how it is accessed, where it is and what it does.  The following
> attributes map to FS_*_FL flags and are the same numerical value:
> 
> 	STATX_ATTR_COMPRESSED		File is compressed by the fs
> 	STATX_ATTR_IMMUTABLE		File is marked immutable
> 	STATX_ATTR_APPEND		File is append-only
> 	STATX_ATTR_NODUMP		File is not to be dumped
> 	STATX_ATTR_ENCRYPTED		File requires key to decrypt in fs
> 
> Within the kernel, the supported flags are listed by:
> 
> 	KSTAT_ATTR_FS_IOC_FLAGS
> 
> [Are any other IOC flags of sufficient general interest to be exposed
> through this interface?]
> 
> New flags include:
> 
> 	STATX_ATTR_AUTOMOUNT		Object is an automount trigger
> 
> These are for the use of GUI tools that might want to mark files specially,
> depending on what they are.
> 
> Fields in struct statx come in a number of classes:
> 
>  (0) stx_dev_*, stx_blksize.
> 
>      These are local system information and are always available.
> 
>  (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino,
>      stx_size, stx_blocks.
> 
>      These will be returned whether the caller asks for them or not.  The
>      corresponding bits in stx_mask will be set to indicate whether they
>      actually have valid values.
> 
>      If the caller didn't ask for them, then they may be approximated.  For
>      example, NFS won't waste any time updating them from the server,
>      unless as a byproduct of updating something requested.
> 
>      If the values don't actually exist for the underlying object (such as
>      UID or GID on a DOS file), then the bit won't be set in the stx_mask,
>      even if the caller asked for the value.  In such a case, the returned
>      value will be a fabrication.
> 
>      Note that there are instances where the type might not be valid, for
>      instance Windows reparse points.
> 
>  (2) stx_rdev_*.
> 
>      This will be set only if stx_mode indicates we're looking at a
>      blockdev or a chardev, otherwise will be 0.
> 
>  (3) stx_btime.
> 
>      Similar to (1), except this will be set to 0 if it doesn't exist.
> 
> 
> =======
> TESTING
> =======
> 
> The following test program can be used to test the statx system call:
> 
> 	samples/statx/test-statx.c
> 
> Just compile and run, passing it paths to the files you want to examine.
> The file is built automatically if CONFIG_SAMPLES is enabled.
> 
> Here's some example output.  Firstly, an NFS directory that crosses to
> another FSID.  Note that the AUTOMOUNT attribute is set because transiting
> this directory will cause d_automount to be invoked by the VFS.
> 
> 	[root@...romeda tmp]# ./samples/statx/test-statx -A /warthog/data
> 	statx(/warthog/data) = 0
> 	results=17ff
> 	  Size: 4096            Blocks: 8          IO Block: 1048576  directory
> 	Device: 00:26           Inode: 1703937     Links: 124
> 	Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
> 	Access: 2016-11-10 15:52:11.219935864+0000
> 	Modify: 2016-11-10 08:07:32.482314928+0000
> 	Change: 2016-11-10 08:07:32.482314928+0000
> 	Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------)
> 	IO-blocksize: blksize=1048576
> 
> Secondly, the result of automounting on that directory.
> 
> 	[root@...romeda tmp]# ./samples/statx/test-statx /warthog/data
> 	statx(/warthog/data) = 0
> 	results=17ff
> 	  Size: 4096            Blocks: 8          IO Block: 1048576  directory
> 	Device: 00:27           Inode: 2           Links: 124
> 	Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
> 	Access: 2016-11-10 15:52:11.219935864+0000
> 	Modify: 2016-11-10 08:07:32.482314928+0000
> 	Change: 2016-11-10 08:07:32.482314928+0000
> 	IO-blocksize: blksize=1048576
> 
> Signed-off-by: David Howells <dhowells@...hat.com>
> ---
> 
>  arch/x86/entry/syscalls/syscall_32.tbl |    1 
>  arch/x86/entry/syscalls/syscall_64.tbl |    1 
>  fs/exportfs/expfs.c                    |    4 
>  fs/stat.c                              |  297 +++++++++++++++++++++++++++++---
>  include/linux/fs.h                     |    5 -
>  include/linux/stat.h                   |   19 +-
>  include/linux/syscalls.h               |    3 
>  include/uapi/linux/fcntl.h             |    5 +
>  include/uapi/linux/stat.h              |  120 +++++++++++++
>  samples/Kconfig                        |    5 +
>  samples/Makefile                       |    3 
>  samples/statx/Makefile                 |   10 +
>  samples/statx/test-statx.c             |  248 +++++++++++++++++++++++++++
>  13 files changed, 681 insertions(+), 40 deletions(-)
>  create mode 100644 samples/statx/Makefile
>  create mode 100644 samples/statx/test-statx.c
> 
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index 2b3618542544..9ba050fe47f3 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -389,3 +389,4 @@
>  380	i386	pkey_mprotect		sys_pkey_mprotect
>  381	i386	pkey_alloc		sys_pkey_alloc
>  382	i386	pkey_free		sys_pkey_free
> +383	i386	statx			sys_statx
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index e93ef0b38db8..5aef183e2f85 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -338,6 +338,7 @@
>  329	common	pkey_mprotect		sys_pkey_mprotect
>  330	common	pkey_alloc		sys_pkey_alloc
>  331	common	pkey_free		sys_pkey_free
> +332	common	statx			sys_statx
>  
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
> index a4b531be9168..2acc31751248 100644
> --- a/fs/exportfs/expfs.c
> +++ b/fs/exportfs/expfs.c
> @@ -299,7 +299,9 @@ static int get_name(const struct path *path, char *name, struct dentry *child)
>  	 * filesystem supports 64-bit inode numbers.  So we need to
>  	 * actually call ->getattr, not just read i_ino:
>  	 */
> -	error = vfs_getattr_nosec(&child_path, &stat);
> +	stat.query_flags = 0;
> +	stat.request_mask = STATX_BASIC_STATS;
> +	error = vfs_xgetattr_nosec(&child_path, &stat);
>  	if (error)
>  		return error;
>  	buffer.ino = stat.ino;
> diff --git a/fs/stat.c b/fs/stat.c
> index bc045c7994e1..82e656c42157 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -18,6 +18,15 @@
>  #include <asm/uaccess.h>
>  #include <asm/unistd.h>
>  
> +/**
> + * generic_fillattr - Fill in the basic attributes from the inode struct
> + * @inode: Inode to use as the source
> + * @stat: Where to fill in the attributes
> + *
> + * Fill in the basic attributes in the kstat structure from data that's to be
> + * found on the VFS inode structure.  This is the default if no getattr inode
> + * operation is supplied.
> + */
>  void generic_fillattr(struct inode *inode, struct kstat *stat)
>  {
>  	stat->dev = inode->i_sb->s_dev;
> @@ -27,87 +36,189 @@ void generic_fillattr(struct inode *inode, struct kstat *stat)
>  	stat->uid = inode->i_uid;
>  	stat->gid = inode->i_gid;
>  	stat->rdev = inode->i_rdev;
> -	stat->size = i_size_read(inode);
> -	stat->atime = inode->i_atime;
>  	stat->mtime = inode->i_mtime;
>  	stat->ctime = inode->i_ctime;
> -	stat->blksize = (1 << inode->i_blkbits);
> +	stat->size = i_size_read(inode);
>  	stat->blocks = inode->i_blocks;
> -}
> +	stat->blksize = 1 << inode->i_blkbits;
>  
> +	stat->result_mask |= STATX_BASIC_STATS;
> +	if (IS_NOATIME(inode))
> +		stat->result_mask &= ~STATX_ATIME;
> +	else
> +		stat->atime = inode->i_atime;
> +
> +	if (IS_AUTOMOUNT(inode))
> +		stat->attributes |= STATX_ATTR_AUTOMOUNT;
> +}
>  EXPORT_SYMBOL(generic_fillattr);
>  
>  /**
> - * vfs_getattr_nosec - getattr without security checks
> + * vfs_xgetattr_nosec - getattr without security checks
>   * @path: file to get attributes from
>   * @stat: structure to return attributes in
>   *
>   * Get attributes without calling security_inode_getattr.
>   *
> - * Currently the only caller other than vfs_getattr is internal to the
> - * filehandle lookup code, which uses only the inode number and returns
> - * no attributes to any user.  Any other code probably wants
> - * vfs_getattr.
> + * Currently the only caller other than vfs_xgetattr is internal to the
> + * filehandle lookup code, which uses only the inode number and returns no
> + * attributes to any user.  Any other code probably wants vfs_xgetattr.
> + *
> + * The caller must set stat->request_mask to indicate what they want and
> + * stat->query_flags to indicate whether the server should be queried.
>   */
> -int vfs_getattr_nosec(struct path *path, struct kstat *stat)
> +int vfs_xgetattr_nosec(struct path *path, struct kstat *stat)
>  {
>  	struct inode *inode = d_backing_inode(path->dentry);
>  
> +	stat->query_flags &= ~KSTAT_QUERY_FLAGS;
> +
> +	stat->result_mask = 0;
> +	stat->attributes = 0;
>  	if (inode->i_op->getattr)
>  		return inode->i_op->getattr(path->mnt, path->dentry, stat);
>  
>  	generic_fillattr(inode, stat);
>  	return 0;
>  }
> +EXPORT_SYMBOL(vfs_xgetattr_nosec);
>  
> -EXPORT_SYMBOL(vfs_getattr_nosec);
> -
> -int vfs_getattr(struct path *path, struct kstat *stat)
> +/*
> + * vfs_xgetattr - Get the enhanced basic attributes of a file
> + * @path: The file of interest
> + * @stat: Where to return the statistics
> + *
> + * Ask the filesystem for a file's attributes.  The caller must have preset
> + * stat->request_mask and stat->query_flags to indicate what they want.
> + *
> + * If the file is remote, the filesystem can be forced to update the attributes
> + * from the backing store by passing AT_FORCE_ATTR_SYNC in query_flags or can
> + * suppress the update by passing AT_NO_ATTR_SYNC.
> + *
> + * Bits must have been set in stat->request_mask to indicate which attributes
> + * the caller wants retrieving.  Any such attribute not requested may be
> + * returned anyway, but the value may be approximate, and, if remote, may not
> + * have been synchronised with the server.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_xgetattr(struct path *path, struct kstat *stat)
>  {
>  	int retval;
>  
>  	retval = security_inode_getattr(path);
>  	if (retval)
>  		return retval;
> -	return vfs_getattr_nosec(path, stat);
> +	return vfs_xgetattr_nosec(path, stat);
>  }
> +EXPORT_SYMBOL(vfs_xgetattr);
>  
> +/**
> + * vfs_getattr - Get the basic attributes of a file
> + * @path: The file of interest
> + * @stat: Where to return the statistics
> + *
> + * Ask the filesystem for a file's attributes.  If remote, the filesystem isn't
> + * forced to update its files from the backing store.  Only the basic set of
> + * attributes will be retrieved; anyone wanting more must use vfs_xgetattr(),
> + * as must anyone who wants to force attributes to be sync'd with the server.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_getattr(struct path *path, struct kstat *stat)
> +{
> +	stat->query_flags = 0;
> +	stat->request_mask = STATX_BASIC_STATS;
> +	return vfs_xgetattr(path, stat);
> +}
>  EXPORT_SYMBOL(vfs_getattr);
>  
> -int vfs_fstat(unsigned int fd, struct kstat *stat)
> +/**
> + * vfs_fstatx - Get the enhanced basic attributes by file descriptor
> + * @fd: The file descriptor referring to the file of interest
> + * @stat: The result structure to fill in.
> + *
> + * This function is a wrapper around vfs_xgetattr().  The main difference is
> + * that it uses a file descriptor to determine the file location.
> + *
> + * The caller must have preset stat->query_flags and stat->request_mask as for
> + * vfs_xgetattr().
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_fstatx(unsigned int fd, struct kstat *stat)
>  {
>  	struct fd f = fdget_raw(fd);
>  	int error = -EBADF;
>  
>  	if (f.file) {
> -		error = vfs_getattr(&f.file->f_path, stat);
> +		error = vfs_xgetattr(&f.file->f_path, stat);
>  		fdput(f);
>  	}
>  	return error;
>  }
> +EXPORT_SYMBOL(vfs_fstatx);
> +
> +/**
> + * vfs_fstat - Get basic attributes by file descriptor
> + * @fd: The file descriptor referring to the file of interest
> + * @stat: The result structure to fill in.
> + *
> + * This function is a wrapper around vfs_getattr().  The main difference is
> + * that it uses a file descriptor to determine the file location.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_fstat(unsigned int fd, struct kstat *stat)
> +{
> +	stat->query_flags = 0;
> +	stat->request_mask = STATX_BASIC_STATS;
> +	return vfs_fstatx(fd, stat);
> +}
>  EXPORT_SYMBOL(vfs_fstat);
>  
> -int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
> -		int flag)
> +/**
> + * vfs_statx - Get basic and extra attributes by filename
> + * @dfd: A file descriptor representing the base dir for a relative filename
> + * @filename: The name of the file of interest
> + * @flags: Flags to control the query
> + * @stat: The result structure to fill in.
> + *
> + * This function is a wrapper around vfs_xgetattr().  The main difference is
> + * that it uses a filename and base directory to determine the file location.
> + * Additionally, the addition of AT_SYMLINK_NOFOLLOW to flags will prevent a

s/the addition of AT_SYMLINK_NOFOLLOW to/the use of AT_SYMLINK_NOFOLLOW in/


> + * symlink at the given name from being referenced.
> + *
> + * The caller must have preset stat->request_mask as for vfs_xgetattr().  The
> + * flags are also used to load up stat->query_flags.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_statx(int dfd, const char __user *filename, int flags,
> +	      struct kstat *stat)
>  {
>  	struct path path;
>  	int error = -EINVAL;
> -	unsigned int lookup_flags = 0;
> +	unsigned int lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT;
>  
> -	if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
> -		      AT_EMPTY_PATH)) != 0)
> -		goto out;
> +	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT |
> +		       AT_EMPTY_PATH | KSTAT_QUERY_FLAGS)) != 0)
> +		return -EINVAL;
>  
> -	if (!(flag & AT_SYMLINK_NOFOLLOW))
> -		lookup_flags |= LOOKUP_FOLLOW;
> -	if (flag & AT_EMPTY_PATH)
> +	if (flags & AT_SYMLINK_NOFOLLOW)
> +		lookup_flags &= ~LOOKUP_FOLLOW;
> +	if (flags & AT_NO_AUTOMOUNT)
> +		lookup_flags &= ~LOOKUP_AUTOMOUNT;
> +	if (flags & AT_EMPTY_PATH)
>  		lookup_flags |= LOOKUP_EMPTY;
> +	stat->query_flags = flags;
> +
>  retry:
>  	error = user_path_at(dfd, filename, lookup_flags, &path);
>  	if (error)
>  		goto out;
>  
> -	error = vfs_getattr(&path, stat);
> +	error = vfs_xgetattr(&path, stat);
>  	path_put(&path);
>  	if (retry_estale(error, lookup_flags)) {
>  		lookup_flags |= LOOKUP_REVAL;
> @@ -116,17 +227,65 @@ int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
>  out:
>  	return error;
>  }
> +EXPORT_SYMBOL(vfs_statx);
> +
> +/**
> + * vfs_fstatat - Get basic attributes by filename
> + * @dfd: A file descriptor representing the base dir for a relative filename
> + * @filename: The name of the file of interest
> + * @flags: Flags to control the query
> + * @stat: The result structure to fill in.
> + *
> + * This function is a wrapper around vfs_statx().  The difference is that it
> + * preselects basic stats only.  The flags are used to load up
> + * stat->query_flags in addition to indicating symlink handling during path
> + * resolution.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
> +		int flags)
> +{
> +	stat->request_mask = STATX_BASIC_STATS;
> +	return vfs_statx(dfd, filename, flags, stat);
> +}
>  EXPORT_SYMBOL(vfs_fstatat);
>  
> -int vfs_stat(const char __user *name, struct kstat *stat)
> +/**
> + * vfs_stat - Get basic attributes by filename
> + * @filename: The name of the file of interest
> + * @stat: The result structure to fill in.
> + *
> + * This function is a wrapper around vfs_statx().  The difference is that it
> + * preselects basic stats only, terminal symlinks are followed regardless and a

s/terminal symlinks/symlinks in the basename/

> + * remote filesystem can't be forced to query the server.  If such is desired,
> + * vfs_statx() should be used instead.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
> +int vfs_stat(const char __user *filename, struct kstat *stat)
>  {
> -	return vfs_fstatat(AT_FDCWD, name, stat, 0);
> +	stat->request_mask = STATX_BASIC_STATS;
> +	return vfs_statx(AT_FDCWD, filename, 0, stat);
>  }
>  EXPORT_SYMBOL(vfs_stat);
>  
> +/**
> + * vfs_lstat - Get basic attrs by filename, without following terminal symlink
> + * @filename: The name of the file of interest
> + * @stat: The result structure to fill in.
> + *
> + * This function is a wrapper around vfs_statx().  The difference is that it
> + * preselects basic stats only, terminal symlinks are note followed regardless

s/terminal symlinks/symlinks in the basename/
s/note/not/


> + * and a remote filesystem can't be forced to query the server.  If such is
> + * desired, vfs_statx() should be used instead.
> + *
> + * 0 will be returned on success, and a -ve error code if unsuccessful.
> + */
>  int vfs_lstat(const char __user *name, struct kstat *stat)
>  {
> -	return vfs_fstatat(AT_FDCWD, name, stat, AT_SYMLINK_NOFOLLOW);
> +	stat->request_mask = STATX_BASIC_STATS;
> +	return vfs_statx(AT_FDCWD, name, AT_SYMLINK_NOFOLLOW, stat);
>  }
>  EXPORT_SYMBOL(vfs_lstat);
>  
> @@ -141,7 +300,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
>  {
>  	static int warncount = 5;
>  	struct __old_kernel_stat tmp;
> -	
> +
>  	if (warncount > 0) {
>  		warncount--;
>  		printk(KERN_WARNING "VFS: Warning: %s using old stat() call. Recompile your binary.\n",
> @@ -166,7 +325,7 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
>  #if BITS_PER_LONG == 32
>  	if (stat->size > MAX_NON_LFS)
>  		return -EOVERFLOW;
> -#endif	
> +#endif
>  	tmp.st_size = stat->size;
>  	tmp.st_atime = stat->atime.tv_sec;
>  	tmp.st_mtime = stat->mtime.tv_sec;
> @@ -443,6 +602,82 @@ SYSCALL_DEFINE4(fstatat64, int, dfd, const char __user *, filename,
>  }
>  #endif /* __ARCH_WANT_STAT64 || __ARCH_WANT_COMPAT_STAT64 */
>  
> +/*
> + * Set the statx results.
> + */
> +static long statx_set_result(struct kstat *stat, struct statx __user *buffer)
> +{
> +	uid_t uid = from_kuid_munged(current_user_ns(), stat->uid);
> +	gid_t gid = from_kgid_munged(current_user_ns(), stat->gid);
> +
> +#define __put_timestamp(kts, uts) (				\
> +		__put_user(kts.tv_sec,	uts.tv_sec	) ||	\
> +		__put_user(kts.tv_nsec,	uts.tv_nsec	) ||		\
> +		__put_user(0,		uts.__reserved	))
> +
> +	if (__put_user(stat->result_mask,	&buffer->stx_mask	) ||
> +	    __put_user(stat->mode,		&buffer->stx_mode	) ||
> +	    __clear_user(&buffer->__spare0, sizeof(buffer->__spare0))	  ||
> +	    __put_user(stat->nlink,		&buffer->stx_nlink	) ||
> +	    __put_user(uid,			&buffer->stx_uid	) ||
> +	    __put_user(gid,			&buffer->stx_gid	) ||
> +	    __put_user(stat->attributes,	&buffer->stx_attributes	) ||
> +	    __put_user(stat->blksize,		&buffer->stx_blksize	) ||
> +	    __put_user(MAJOR(stat->rdev),	&buffer->stx_rdev_major	) ||
> +	    __put_user(MINOR(stat->rdev),	&buffer->stx_rdev_minor	) ||
> +	    __put_user(MAJOR(stat->dev),	&buffer->stx_dev_major	) ||
> +	    __put_user(MINOR(stat->dev),	&buffer->stx_dev_minor	) ||
> +	    __put_timestamp(stat->atime,	&buffer->stx_atime	) ||
> +	    __put_timestamp(stat->btime,	&buffer->stx_btime	) ||
> +	    __put_timestamp(stat->ctime,	&buffer->stx_ctime	) ||
> +	    __put_timestamp(stat->mtime,	&buffer->stx_mtime	) ||
> +	    __put_user(stat->ino,		&buffer->stx_ino	) ||
> +	    __put_user(stat->size,		&buffer->stx_size	) ||
> +	    __put_user(stat->blocks,		&buffer->stx_blocks	) ||
> +	    __clear_user(&buffer->__spare1, sizeof(buffer->__spare1))	  ||
> +	    __clear_user(&buffer->__spare2, sizeof(buffer->__spare2)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +/**
> + * sys_statx - System call to get enhanced stats
> + * @dfd: Base directory to pathwalk from *or* fd to stat.
> + * @filename: File to stat *or* NULL.
> + * @flags: AT_* flags to control pathwalk.
> + * @mask: Parts of statx struct actually required.
> + * @buffer: Result buffer.
> + *
> + * Note that if filename is NULL, then it does the equivalent of fstat() using
> + * dfd to indicate the file of interest.
> + */
> +SYSCALL_DEFINE5(statx,
> +		int, dfd, const char __user *, filename, unsigned, flags,
> +		unsigned int, mask,
> +		struct statx __user *, buffer)
> +{
> +	struct kstat stat;
> +	int error;
> +
> +	if ((flags & AT_STATX_SYNC_TYPE) == AT_STATX_SYNC_TYPE)
> +		return -EINVAL;
> +	if (!access_ok(VERIFY_WRITE, buffer, sizeof(*buffer)))
> +		return -EFAULT;
> +
> +	memset(&stat, 0, sizeof(stat));
> +	stat.query_flags = flags;
> +	stat.request_mask = mask & STATX_ALL;
> +
> +	if (filename)
> +		error = vfs_statx(dfd, filename, flags, &stat);
> +	else
> +		error = vfs_fstatx(dfd, &stat);
> +	if (error)
> +		return error;
> +	return statx_set_result(&stat, buffer);
> +}
> +
>  /* Caller is here responsible for sufficient locking (ie. inode->i_lock) */
>  void __inode_add_bytes(struct inode *inode, loff_t bytes)
>  {
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 16d2b6e874d6..f153199566b4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2916,8 +2916,9 @@ extern const struct inode_operations page_symlink_inode_operations;
>  extern void kfree_link(void *);
>  extern int generic_readlink(struct dentry *, char __user *, int);
>  extern void generic_fillattr(struct inode *, struct kstat *);
> -int vfs_getattr_nosec(struct path *path, struct kstat *stat);
> +extern int vfs_xgetattr_nosec(struct path *path, struct kstat *stat);
>  extern int vfs_getattr(struct path *, struct kstat *);
> +extern int vfs_xgetattr(struct path *, struct kstat *);
>  void __inode_add_bytes(struct inode *inode, loff_t bytes);
>  void inode_add_bytes(struct inode *inode, loff_t bytes);
>  void __inode_sub_bytes(struct inode *inode, loff_t bytes);
> @@ -2935,6 +2936,8 @@ extern int vfs_lstat(const char __user *, struct kstat *);
>  extern int vfs_fstat(unsigned int, struct kstat *);
>  extern int vfs_fstatat(int , const char __user *, struct kstat *, int);
>  extern const char *vfs_get_link(struct dentry *, struct delayed_call *);
> +extern int vfs_xstat(int, const char __user *, int, struct kstat *);
> +extern int vfs_xfstat(unsigned int, struct kstat *);
>  
>  extern int __generic_block_fiemap(struct inode *inode,
>  				  struct fiemap_extent_info *fieinfo,
> diff --git a/include/linux/stat.h b/include/linux/stat.h
> index 075cb0c7eb2a..9b81dfcbb57a 100644
> --- a/include/linux/stat.h
> +++ b/include/linux/stat.h
> @@ -19,19 +19,26 @@
>  #include <linux/uidgid.h>
>  
>  struct kstat {
> -	u64		ino;
> -	dev_t		dev;
> +	u32		query_flags;	/* Operational flags */
> +#define KSTAT_QUERY_FLAGS (AT_STATX_SYNC_TYPE)
> +	u32		request_mask;	/* What fields the user asked for */
> +	u32		result_mask;	/* What fields the user got */
>  	umode_t		mode;
>  	unsigned int	nlink;
> +	uint32_t	blksize;	/* Preferred I/O size */
> +	u64		attributes;
> +#define KSTAT_ATTR_FS_IOC_FLAGS		0x00000874 /* Attrs corresponding to FS_*_FL flags */
> +	u64		ino;
> +	dev_t		dev;
> +	dev_t		rdev;
>  	kuid_t		uid;
>  	kgid_t		gid;
> -	dev_t		rdev;
>  	loff_t		size;
> -	struct timespec  atime;
> +	struct timespec	atime;
>  	struct timespec	mtime;
>  	struct timespec	ctime;
> -	unsigned long	blksize;
> -	unsigned long long	blocks;
> +	struct timespec	btime;			/* File creation time */
> +	u64		blocks;
>  };
>  
>  #endif
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 91a740f6b884..980c3c9b06f8 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -48,6 +48,7 @@ struct stat;
>  struct stat64;
>  struct statfs;
>  struct statfs64;
> +struct statx;
>  struct __sysctl_args;
>  struct sysinfo;
>  struct timespec;
> @@ -902,5 +903,7 @@ asmlinkage long sys_pkey_mprotect(unsigned long start, size_t len,
>  				  unsigned long prot, int pkey);
>  asmlinkage long sys_pkey_alloc(unsigned long flags, unsigned long init_val);
>  asmlinkage long sys_pkey_free(int pkey);
> +asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
> +			  unsigned mask, struct statx __user *buffer);
>  
>  #endif
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index beed138bd359..813afd6eee71 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -63,5 +63,10 @@
>  #define AT_NO_AUTOMOUNT		0x800	/* Suppress terminal automount traversal */
>  #define AT_EMPTY_PATH		0x1000	/* Allow empty relative pathname */
>  
> +#define AT_STATX_SYNC_TYPE	0x6000	/* Type of synchronisation required from statx() */
> +#define AT_STATX_SYNC_AS_STAT	0x0000	/* - Do whatever stat() does */
> +#define AT_STATX_FORCE_SYNC	0x2000	/* - Force the attributes to be sync'd with the server */
> +#define AT_STATX_DONT_SYNC	0x4000	/* - Don't sync attributes with the server */
> +
>  
>  #endif /* _UAPI_LINUX_FCNTL_H */
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 7fec7e36d921..995e82fe019c 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -1,6 +1,7 @@
>  #ifndef _UAPI_LINUX_STAT_H
>  #define _UAPI_LINUX_STAT_H
>  
> +#include <linux/types.h>
>  
>  #if defined(__KERNEL__) || !defined(__GLIBC__) || (__GLIBC__ < 2)
>  
> @@ -41,5 +42,124 @@
>  
>  #endif
>  
> +/*
> + * Timestamp structure for the timestamps in struct statx.
> + */
> +struct statx_timestamp {
> +	__s64	tv_sec;		/* Number of seconds before or after midnight 1st Jan 1970 */
> +	__s32	tv_nsec;	/* Number of nanoseconds before or after sec (0-999,999,999) */

Here, add a note to the comment: "Will be a negative value (if nonzero) if
tv_sec is negative".

[...]

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
