lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161117234047.GE28177@dastard>
Date:   Fri, 18 Nov 2016 10:40:47 +1100
From:   Dave Chinner <david@...morbit.com>
To:     David Howells <dhowells@...hat.com>
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/4] statx: Add a system call to make enhanced file info
 available

On Thu, Nov 17, 2016 at 01:35:03PM +0000, David Howells wrote:
> Add a system call to make extended file information available, including
> file creation, data version and some attribute flags where available
> through the underlying filesystem.
> 
> 
> ========
> OVERVIEW
> ========
> 
> The idea was initially proposed as a set of xattrs that could be retrieved
> with getxattr(), but the general preferance proved to be for a new syscall
> with an extended stat structure.
> 
> This has a number of uses:
> 
....
>  (5) Data version number: Could be used by userspace NFS servers [Aneesh
>      Kumar].
> 
>      Can also be used to modify fill_post_wcc() in NFSD which retrieves
>      i_version directly, but has just called vfs_getattr().  It could get
>      it from the kstat struct if it used vfs_xgetattr() instead.

This needs a much clearer name that stx_version as "version" is
entirely ambiguous. e.g. Inodes have internal version numbers to
disambiguate life cycles. and there are versioning filesystems
which have multiple versions of the same file. 

So if stx_version this is intended to export the internal filesystem
inode change counter (i.e. inode->i_version) then lets call it that:
stx_modification_count. It's clear and unambiguous as to what it
represents, especially as this counter is more than just a "data
modification" counter - inode metadata modifications will also
cause it to change....


....

> (13) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.
>      Note that the Linux IOC flags are a mess and filesystems such as Ext4
>      define flags that aren't in linux/fs.h, so translation in the kernel
>      may be a necessity (or, possibly, we provide the filesystem type too).

And we now also have FS_IOC_FSGETXATTR that extends the flags
and information userspace can get from filesystems. It makes little
sense to now add xstat() and not add everything this interface
also exports...

> The defined bits in request_mask and stx_mask are:
> 
> 	STATX_TYPE		Want/got stx_mode & S_IFMT
> 	STATX_MODE		Want/got stx_mode & ~S_IFMT
> 	STATX_NLINK		Want/got stx_nlink
> 	STATX_UID		Want/got stx_uid
> 	STATX_GID		Want/got stx_gid
> 	STATX_ATIME		Want/got stx_atime{,_ns}
> 	STATX_MTIME		Want/got stx_mtime{,_ns}
> 	STATX_CTIME		Want/got stx_ctime{,_ns}
> 	STATX_INO		Want/got stx_ino
> 	STATX_SIZE		Want/got stx_size
> 	STATX_BLOCKS		Want/got stx_blocks
> 	STATX_BASIC_STATS	[The stuff in the normal stat struct]
> 	STATX_BTIME		Want/got stx_btime{,_ns}
> 	STATX_VERSION		Want/got stx_version
> 	STATX_ALL		[All currently available stuff]

What happens when an application uses STATX_ALL from a future kernel
that defines more flags than are initially supported, and that
application then is run on a kernel that onyl supports the initial
fields?

> stx_btime is the file creation time; stx_version is the data version number
> (i_version); stx_mask is a bitmask indicating the data provided; and
> __spares*[] are where as-yet undefined fields can be placed.
> 
> Time fields are split into separate seconds and nanoseconds fields to make
> packing easier and the granularities can be queried with the filesystem
> info system call.  Note that times will be negative if before 1970; in such
> a case, the nanosecond fields will also be negative if not zero.

So what happens in ten years time when we want to support
femptosecond resolution in the timestamp interface? We've got to
change everything to 64 bit? Shouldn't we just make everything
timestamp related 64 bit?

> 
> The bits defined in the stx_attributes field convey information about a
> file, how it is accessed, where it is and what it does.  The following
> attributes map to FS_*_FL flags and are the same numerical value:

Please isolate the new interface flags completely from the FS_*_FL
values. We should not repeat the mistake of tying values derived
from filesystem specific on-disk values to a user interface. 

> 	STATX_ATTR_COMPRESSED		File is compressed by the fs
> 	STATX_ATTR_IMMUTABLE		File is marked immutable
> 	STATX_ATTR_APPEND		File is append-only
> 	STATX_ATTR_NODUMP		File is not to be dumped
> 	STATX_ATTR_ENCRYPTED		File requires key to decrypt in fs
> 
> The supported flags are listed by:
> 
> 	STATX_ATTR_FS_IOC_FLAGS

Again, we have many more common and extended flags than this.
NOATIME and SYNC are two that immediately come to mind as generic
flags that should be in this...

> 
> [Are any other IOC flags of sufficient general interest to be exposed
> through this interface?]
> 
> New flags include:
> 
> 	STATX_ATTR_NONUNIX_OWNERSHIP	File doesn't have Unixy ownership
> 	STATX_ATTR_HAS_ACL		File has an ACL

So statx will require us to do ACL lookups? i.e. instead of just
reading the inode to get the information, we'll also have to do
extended attribute lookups? That's potentially very expensive if
the extended attribute is not stored in the inode....

> 	STATX_ATTR_KERNEL_API		File is kernel API (eg: procfs/sysfs)
> 	STATX_ATTR_REMOTE		File is remote and needs network
> 	STATX_ATTR_FABRICATED		File was made up by fs

Every file is fabricated by a filesystem :P

Perhaps you're wanting "virtual file" because it is has no physical
presence?

> Fields in struct statx come in a number of classes:
> 
>  (0) stx_dev_*, stx_blksize.
> 
>      These are local system information and are always available.

What does stx_blksize actually mean? It's completely ambiguous in
stat() because we don't actually report the physical block size
here - we report the "minimum unit of efficient IO" that we expect
applications to use. Please define :P


> =======
> TESTING
> =======
> 
> The following test program can be used to test the statx system call:
> 
> 	samples/statx/test-statx.c
> 
> Just compile and run, passing it paths to the files you want to examine.
> The file is built automatically if CONFIG_SAMPLES is enabled.

Can we get xfstests written to exercise and validate all this
functionality, please? I'd suggest that adding xfs_io support for
the statx syscall would be far more useful for xfstests than a
standalone  test program, too. We already have equivalent stat()
functionality in xfs_io and that's used quite a bit in xfstests....

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ