linux-kernel - Re: [GIT PULL v2] timestamp fixes

Message-ID: <ZQ884uCkKGu6xsDi@mit.edu>
Date:   Sat, 23 Sep 2023 15:30:42 -0400
From:   "Theodore Ts'o" <tytso@....edu>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Amir Goldstein <amir73il@...il.com>,
        Jeff Layton <jlayton@...nel.org>,
        Christian Brauner <brauner@...nel.org>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Jan Kara <jack@...e.cz>, "Darrick J. Wong" <djwong@...nel.org>
Subject: Re: [GIT PULL v2] timestamp fixes

On Sat, Sep 23, 2023 at 10:48:51AM -0700, Linus Torvalds wrote:
> 
> I feel like 100ns is a much more reasonable resolution, and is quite
> close to a single system call (think "one thousand cycles at 10GHz").

FWIW, UUIDs (which originally came from Apollo Domain/OS in the
1980s, before getting adopted by OSF/DCE, and then by Linux and
Microsoft) use a 100ns granularity.  And the smart folks at Apollo
figured this out some 4 decades ago, and *no* they didn't use units of
a single nanosecond.  :-)

100ns granularity is also what ext4 uses for our on-disk format
--- 2**30 is just enough to cover 100ns granularity (with only 7% of
wasted number space), and those two bits are enough for us to encode
timestamps into 2446 using a 64-bit timestamp (and what we do past
2446 is pretty much something I'm happy to let someone else deal with,
as I expect I'll be long dead by then.)
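
As a rough check of the figures above, here is a small sketch in C. It is
not taken from the message: the helper names are hypothetical, and it
assumes the split shown in the code later in this message, that is, a
30-bit sub-second field holding tv_nsec plus 2 epoch bits that extend a
signed 32-bit seconds value. Under that assumption the 7% figure is the
share of 30-bit codes above 999,999,999, and 2446 is the approximate year
of the largest encodable second.

```c
#include <stdint.h>

/* Hypothetical names; only the 30-bit / 2-bit split comes from the message. */
#define NSEC_FIELD_BITS   30
#define EPOCH_FIELD_BITS  2
#define NSEC_PER_SECOND   1000000000LL
#define SECS_PER_YEAR     31556952LL   /* mean Gregorian year, an approximation */

/* Number of 30-bit codes that never correspond to a valid tv_nsec value. */
static int64_t unused_nsec_codes(void)
{
	return (1LL << NSEC_FIELD_BITS) - NSEC_PER_SECOND;
}

/* Unused share of the 30-bit number space, in whole percent (rounded). */
static int unused_percent(void)
{
	int64_t total = 1LL << NSEC_FIELD_BITS;

	return (int)((unused_nsec_codes() * 100 + total / 2) / total);
}

/*
 * Largest encodable second: the signed 32-bit maximum plus the largest
 * epoch value (3) times 2**32.
 */
static int64_t max_encodable_seconds(void)
{
	int64_t epoch_max = (1LL << EPOCH_FIELD_BITS) - 1;

	return (int64_t)INT32_MAX + epoch_max * (1LL << 32);
}

/* Approximate calendar year for a count of seconds since 1970. */
static int approx_year(int64_t secs_since_1970)
{
	return 1970 + (int)(secs_since_1970 / SECS_PER_YEAR);
}
```

Calling approx_year(max_encodable_seconds()) gives the last representable
year under these assumptions, and unused_percent() gives the wasted share
of the sub-second field.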

(And if someone does happen to invent some kind of life-extension
technology, I'm happy to fix it up... later.  :-)

> That said, we don't have to do powers-of-ten. In fact, in many ways,
> it would probably be a good idea to think of the fractional seconds in
> powers of two. That tends to make it cheaper to do conversions,
> without having to do a full 64-bit divide (a constant divide turns
> into a fancy multiply, but it's still painful on 32-bit
> architectures).

It depends on what conversion we need to do.  If we're converting to
userspace's timespec64 data structure, which is denominated in
nanoseconds, it's actually much easier to use decimal 100ns units:

/* Low 2 bits of "extra" extend the seconds; the upper 30 bits hold tv_nsec. */
#define EXT4_EPOCH_BITS 2
#define EXT4_EPOCH_MASK ((1 << EXT4_EPOCH_BITS) - 1)
#define EXT4_NSEC_MASK  (~0UL << EXT4_EPOCH_BITS)

static inline __le32 ext4_encode_extra_time(struct timespec64 *time)
{
	/* Epoch = multiples of 2**32 between tv_sec and its sign-extended low 32 bits. */
	u32 extra = ((time->tv_sec - (s32)time->tv_sec) >> 32) & EXT4_EPOCH_MASK;
	return cpu_to_le32(extra | (time->tv_nsec << EXT4_EPOCH_BITS));
}

static inline void ext4_decode_extra_time(struct timespec64 *time,
					  __le32 extra)
{
	/* Expects time->tv_sec to hold the sign-extended low 32 bits already. */
	if (unlikely(extra & cpu_to_le32(EXT4_EPOCH_MASK)))
		time->tv_sec += (u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK) << 32;
	time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
}

> Of course, I might have screwed up the above conversion functions,
> they are untested garbage, but they look close enough to being in the
> right ballpark.

We actually have kunit tests for ext4_encode_extra_time() and
ext4_decode_extra_time(), mainly because people *have* screwed it up
when making architecture-specific optimizations or when making global
sweeps of VFS code.  :-)
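
For readers who want to poke at the encoding outside the kernel, the
following is a userspace sketch of the kind of round-trip check such tests
might perform. It is not the kunit code. The struct and function names
(ts64, disk_time, encode_time, decode_time, round_trips) are hypothetical,
and the sketch assumes host byte order instead of the kernel's
cpu_to_le32()/le32_to_cpu() conversions. It also avoids signed shifts and
narrowing casts so that the behavior does not depend on the compiler.

```c
#include <stdint.h>

#define EPOCH_BITS 2
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)

/* Hypothetical userspace stand-in for struct timespec64. */
struct ts64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Hypothetical on-disk pair: low 32 bits of seconds plus the "extra" word. */
struct disk_time {
	uint32_t sec_lo;
	uint32_t extra;
};

/* Interpret the stored low 32 bits as a signed value, widened to 64 bits. */
static int64_t sign_extend_lo(uint32_t sec_lo)
{
	int64_t v = (int64_t)sec_lo;

	return (sec_lo & 0x80000000u) ? v - (1LL << 32) : v;
}

static struct disk_time encode_time(struct ts64 t)
{
	struct disk_time d;
	int64_t lo;

	d.sec_lo = (uint32_t)t.tv_sec;          /* keep the low 32 bits */
	lo = sign_extend_lo(d.sec_lo);
	/* Epoch: multiples of 2**32 between tv_sec and the sign-extended low half. */
	d.extra = (uint32_t)(((uint64_t)(t.tv_sec - lo) >> 32) & EPOCH_MASK);
	/* Nanoseconds (below 2**30) go in the upper 30 bits. */
	d.extra |= (uint32_t)t.tv_nsec << EPOCH_BITS;
	return d;
}

static struct ts64 decode_time(struct disk_time d)
{
	struct ts64 t;

	t.tv_sec = sign_extend_lo(d.sec_lo) +
		   ((int64_t)(d.extra & EPOCH_MASK) << 32);
	t.tv_nsec = (int64_t)(d.extra >> EPOCH_BITS);
	return t;
}

/* Returns 1 if encoding and then decoding reproduces the input exactly. */
static int round_trips(int64_t sec, int64_t nsec)
{
	struct ts64 in = { sec, nsec };
	struct ts64 out = decode_time(encode_time(in));

	return out.tv_sec == in.tv_sec && out.tv_nsec == in.tv_nsec;
}
```

The interesting inputs are the boundaries: -1, the signed 32-bit limits,
2**31 (the first value that needs a nonzero epoch), and the largest
encodable second with tv_nsec set to 999999999.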

                                        - Ted