linux-ext4 - Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d6162230b83359d3ed1ee706cc1cb6eacfb12a4f.camel@kernel.org>
Date:   Wed, 18 Oct 2023 16:47:37 -0400
From:   Jeff Layton <jlayton@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>,
        John Stultz <jstultz@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Stephen Boyd <sboyd@...nel.org>,
        Chandan Babu R <chandan.babu@...cle.com>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Dave Chinner <david@...morbit.com>,
        Theodore Ts'o <tytso@....edu>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>,
        David Sterba <dsterba@...e.com>,
        Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Amir Goldstein <amir73il@...il.com>, Jan Kara <jack@...e.de>,
        David Howells <dhowells@...hat.com>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-xfs@...r.kernel.org, linux-ext4@...r.kernel.org,
        linux-btrfs@...r.kernel.org, linux-mm@...ck.org,
        linux-nfs@...r.kernel.org
Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain
 timestamp handing

On Wed, 2023-10-18 at 12:18 -0700, Linus Torvalds wrote:
> On Wed, 18 Oct 2023 at 10:41, Jeff Layton <jlayton@...nel.org> wrote:
> > 
> > One way to prevent this is to ensure that when we stamp a file with a
> > fine-grained timestamp, that we use that value to establish a floor for
> > any later timestamp update.
> 
> I'm very leery of this.
> 
> I don't like how it's using a global time - and a global fine-grained
> offset - when different filesystems will very naturally have different
> granularities. I also don't like how it's no using that global lock.
> 
> Yes, yes, since the use of this all is then gated by the 'is_mgtime()'
> thing, any filesystem with big granularities will presumably never set
> FS_MGTIME in the first time, and that hides the worst pointless cases.
> But it still feels iffy to me.
> 

Thanks for taking a look!

I'm not too crazy about the global lock either, but the global fine
grained value ensures that when we have mtime changes that occur across
filesystems that they appear to be in the correct order.

We could (hypothetically) track an offset per superblock or something,
but then you could see out-of-order timestamps in inodes across
different filesystems (even of the same type). I think it'd better not
to do that if we can get away with it.


> Also, the whole current_ctime() logic seems wrong. Later (in 4/9), you do this:
> 
>  static struct timespec64 current_ctime(struct inode *inode)
>  {
>         if (is_mgtime(inode))
>                 return current_mgtime(inode);
> 
> and current_mgtime() does
> 
>         if (nsec & I_CTIME_QUERIED) {
>                 ktime_get_real_ts64(&now);
>                 return timestamp_truncate(now, inode);
>         }
> 
> so once the code has set I_CTIME_QUERIED, it will now use the
> expensive fine-grained time - even when it makes no sense.
> 
> As far as I can tell, there is *never* a reason to get the
> fine-grained time if the old inode ctime is already sufficiently far
> away.
> 
> IOW, my gut feel is that all this logic should always not only be
> guarded by FS_MGTIME (like you do seem to do), *and* by "has anybody
> even queried this time" - it should *also* always be guarded by "if I
> get the coarse-grained time, is that sufficient?"
> 
> So I think the current_ctime() logic should be something along the lines of
> 
>     static struct timespec64 current_ctime(struct inode *inode)
>     {
>         struct timespec64 ts64 = current_time(inode);
>         unsigned long old_ctime_sec = inode->i_ctime_sec;
>         unsigned int old_ctime_nsec = inode->i_ctime_nsec;
> 
>         if (ts64.tv_sec != old_ctime_sec)
>                 return ts64;
> 
>         /*
>          * No need to check is_mgtime(inode) - the I_CTIME_QUERIED
>          * flag is only set for mgtime filesystems
>          */
>         if (!(old_ctime_nsec & I_CTIME_QUERIED))
>                 return ts64;
>         old_ctime_nsec &= ~I_CTIME_QUERIED;
>         if (ts64.tv_nsec > old_ctime_nsec + inode->i_sb->s_time_gran)
>                 return ts64;
> 

Does that really do what you expect here? current_time will return a
value that has already been through timestamp_truncate. Regardless, I
don't think this function makes as big a difference as you might think.

>
>         /* Ok, only *now* do we do a finegrained value */
>         ktime_get_real_ts64(&ts64);
>         return timestamp_truncate(ts64);
>     }
> 
> or whatever. Make it *very* clear that the finegrained timestamp is
> the absolute last option, after we have checked that the regular one
> isn't possible.

current_mgtime is calling ktime_get_real_ts64, which is an existing
interface that does not take the global spinlock and won't advance the
global offset. That call should be quite cheap.

The reason we can use that call here is because current_ctime and
current_mgtime are only called from  inode_needs_update_time, which is
only called to check whether we need to get write access to the
inode. What we do is look at the current clock and see whether the
timestamps would perceivably change if we were to do the update right
then.

If so, we get write access and then call inode_set_ctime_current(). That
will fetch its own timestamps and reevaluate what sort of update to do.
That's the only place that fetches an expensive fine-grained timestamp
that advances the offset.

So, I think this set already is only getting the expensive fine-grained
timestamps as a last resort.

This is probably an indicator that I need to document this code better
though. It may also be a good idea to reorganize
inode_needs_update_time, current_ctime and current_mgtime for better
clarity.

Many thanks for the comments!
-- 
Jeff Layton <jlayton@...nel.org>