lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0d006954b698cb1cea3a93c1662b5913a0ded3b1.camel@kernel.org>
Date:   Thu, 21 Sep 2023 14:51:53 -0400
From:   Jeff Layton <jlayton@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Christian Brauner <brauner@...nel.org>
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Jan Kara <jack@...e.cz>
Subject: Re: [GIT PULL v2] timestamp fixes

On Thu, 2023-09-21 at 11:24 -0700, Linus Torvalds wrote:
> On Thu, 21 Sept 2023 at 04:21, Christian Brauner <brauner@...nel.org> wrote:
> > 
> >   git@...olite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/v6.6-rc3.vfs.ctime.revert
> 
> So for some reason pr-tracker-bot doesn't seem to have reacted to this
> pull request, but it's in my tree now.
> 
> I *do* have one reaction to all of this: now that you have made
> "i_ctime" be something that cannot be accessed directly (and renamed
> it to "__i_ctime"), would you mind horribly going all the way, and do
> the same for i_atime and i_mtime too?
> 
> The reason I ask is that I *really* despise "struct timespec64" as a type.
> 
> I despise it inherently, but I despise it even more when you really
> use it as another type entirely, and are hiding bits in there.
> 
> I despise it because "tv_sec" obviously needs to be 64-bit, but then
> "tv_nsec" is this horrible abomination. It's defined as "long", which
> is all kinds of crazy. It's bogus and historical.
> 
> And it's wasteful.
> 
> Now, in the case of i_ctime, you took advantage of that waste by using
> one (of the potentially 2..34!) unused bits for that
> "fine-granularity" flag.
> 
> But even when you do that, there's up to 33 other bits just lying
> around, wasting space in a very central data structure.
> 
> So it would actually be much better to explode the 'struct timespec64'
> into explicit 64-bit seconds field, and an explicit 32-bit field with
> two bits reserved. And not even next to each other, because they don't
> pack well in general.
> 
> So instead of
> 
>         struct timespec64       i_atime;
>         struct timespec64       i_mtime;
>         struct timespec64       __i_ctime;
> 
> where that last one needs accessors to access, just make them *all*
> have helper accessors, and make it be
> 
>         u64  i_atime_sec;
>         u64  i_mtime_sec;
>         u64  i_ctime_sec;
>         u32  i_atime_nsec;
>         u32  i_mtime_nsec;
>         u32  i_ctime_nsec;
> 
> and now that 'struct inode' should shrink by 12 bytes.
> 

I like it.

> Then do this:
> 
>   #define inode_time(x) \
>        (struct timespec64) { x##_sec, x##_nsec }
> 
> and you can now create a timespec64 by just doing
> 
>     inode_time(inode->i_atime)
> 
> or something like that (to help create those accessor functions).
> 
> Now, I agree that 12 bytes in the disaster that is 'struct inode' is
> likely a drop in the ocean. We have tons of big things in there (ie
> several list_heads, a whole 'struct address_space' etc etc), so it's
> only twelve bytes out of hundreds.
> 
> But dammit, that 'timespec64' really is ugly, and now that you're
> hiding bits in it and it's no longer *really* a 'timespec64', I feel
> like it would be better to do it right, and not mis-use a type that
> has other semantics, and has other problems.
> 


We have many, many inodes though, and 12 bytes per adds up!

I'm on board with the idea, but...that's likely to be as big a patch
series as the ctime overhaul was. In fact, it'll touch a lot of the same
code. I can take a stab at that in the near future though.

Since we're on the subject...another thing that bothers me with all of
the timestamp handling is that we don't currently try to mitigate "torn
reads" across the two different words. It seems like you could fetch a
tv_sec value and then get a tv_nsec value that represents an entirely
different timestamp if there are stores between them.

Should we be concerned about this? I suppose we could do something with
a seqlock, though I'd really want to avoid locking on the write side. 
-- 
Jeff Layton <jlayton@...nel.org>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ