linux-kernel - Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20161119224331.GE31101@dastard>
Date:   Sun, 20 Nov 2016 09:43:31 +1100
From:   Dave Chinner <david@...morbit.com>
To:     David Howells <dhowells@...hat.com>
Cc:     Andreas Dilger <adilger@...ger.ca>, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/4] statx: Add a system call to make enhanced file info
 available

On Fri, Nov 18, 2016 at 10:54:02PM +0000, David Howells wrote:
> Dave Chinner <david@...morbit.com> wrote:
> 
> > And when we start thinking in those timeframes, an
> > increase in timestamp resoultion of at least another 10e-3 is
> > likely....
> 
> Is it, though?  To be useful, surely you have to be able to jam quite a few
> instructions into a 1ns block, including memory accesses.
> 
> Rather than providing:
> 
> 	struct timestamp {
> 		__s64 seconds;
> 		__s64 femtoseconds;
> 	};
> 
> which would require 64-bit divisions to get nanosecond timestamps that we do
> actually use, I would lean towards:
> 
> 	struct timestamp {
> 		__s64 seconds;
> 		__s32 nanoseconds;
> 		__s32 femtoseconds;
> 	};

No. Just provide a 64 bit high resoultion field, and define it to
contain nanoseconds. When we need higher resolution to be exported
to userspace, we use a /feature flag/ to indicate that is contains
something like attoseconds or the like.

This also gives userspace a method of choosing the resolution it
wants.....

> 	__s32	stx_btime_ns;	/* File creation time (ns part) */
> 	__s32	stx_ctime_ns;	/* Last attribute change time (ns part) */
> 	__s32	stx_mtime_ns;	/* Last data modification time (ns part) */
> 
> and then add:
> 
> 	__s32	stx_atime_fs;	/* Last access time (fs part) */
> 	__s32	stx_btime_fs;	/* File creation time (fs part) */
> 	__s32	stx_ctime_fs;	/* Last attribute change time (fs part) */
> 	__s32	stx_mtime_fs;	/* Last data modification time (fs part) */
> 
> later.

Which burns a significant part of the spare space in the structure.

> If we *really* do want to allow for atto- or femto- second resolution
> timestamps (and you've still not entirely convinced me that it's going to be
> necessary - the speed of signal propagation still has an ungetroundable
> limit), then we could stick the space in now - but I think it's likely to
> remain dead space.

We don't really care if the strucuture 250 bytes or 300 bytes, it's
not going to have any significant impact on memory usage or
performance. We should just suck it up now and future
proof the timestamp interface, as history has proven repeatedly that
we suck as making our time/timestamp interfaces future proof....

> > > Using the existing FS_*_FL flags as initial values is not worse than
> > > starting with any other arbitrary values for the flags.
> >
> > Except it starts with a sparse set of flags for no good reason.
> 
> Actually, a very good reason.  You can map those flags, on ext4 at least, with
> a load, an AND and an OR.  Three instructions[*].  If the bits don't
> correspond, it gets more expensive (4-5 instructions per bit + 1).

Two words: premature optimisation.

Every other filesystem has to map their own internal flags to the
user interface flags, so why should we make the user interface
harder to understand and maintain just to allow a special snowflake
optimisation for /only one filesystem/?

There's a worse problem than that, though - we can't add certain
flags to the userspace API because the /conflict with ext4 on-disk
flags/ that aren't exported to userspace and so to work around that
we have this shitty hack to define what parts of the flag space
contain flags that the userspace API can interact with:

#define FS_FL_USER_VISIBLE              0x0003DFFF /* User visible flags */
#define FS_FL_USER_MODIFIABLE           0x000380FF /* User modifiable flags */

These mask out the ext2/3/4 flags that should not be visible to
userspace and/or not be modifiable by userspace. Having to maintain
crap like this is a direct result of exporting on-disk flag values
to userspace APIs.

*Let's not repeat the mistakes of the past* in new interfaces.

Cheers,

Dave.

> 
> [*] Leastways, it *should* be three instructions, but gcc fails to optimise it
>     correctly.  I have a bz logged for this.
> 
> David
> 

-- 
Dave Chinner
david@...morbit.com