linux-kernel - Re: [PATCH v8 0/5] fs: multigrain timestamps for XFS's change

Message-ID: <4b106847d5202aec0e14fdbbe93b070b7ea97477.camel@kernel.org>
Date:   Sat, 23 Sep 2023 06:22:54 -0400
From:   Jeff Layton <jlayton@...nel.org>
To:     Amir Goldstein <amir73il@...il.com>
Cc:     Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>,
        Chuck Lever <chuck.lever@...cle.com>,
        Neil Brown <neilb@...e.de>,
        Olga Kornievskaia <kolga@...app.com>,
        Dai Ngo <Dai.Ngo@...cle.com>, Tom Talpey <tom@...pey.com>,
        Chandan Babu R <chandan.babu@...cle.com>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Dave Chinner <david@...morbit.com>, Jan Kara <jack@...e.cz>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Kent Overstreet <kent.overstreet@...ux.dev>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-nfs@...r.kernel.org, linux-xfs@...r.kernel.org
Subject: Re: [PATCH v8 0/5] fs: multigrain timestamps for XFS's change_cookie

On Sat, 2023-09-23 at 10:15 +0300, Amir Goldstein wrote:
> On Fri, Sep 22, 2023 at 8:15 PM Jeff Layton <jlayton@...nel.org> wrote:
> > 
> > My initial goal was to implement multigrain timestamps on most major
> > filesystems, so we could present them to userland, and use them for
> > NFSv3, etc.
> > 
> > With the current implementation however, we can't guarantee that a file
> > with a coarse grained timestamp modified after one with a fine grained
> > timestamp will always appear to have a later value. This could confuse
> > some programs like make, rsync, find, etc. that depend on strict
> > ordering requirements for timestamps.
> > 
> > The goal of this version is more modest: fix XFS' change attribute.
> > XFS's change attribute is bumped on atime updates in addition to other
> > deliberate changes. This makes it unsuitable for export via nfsd.
> > 
> > Jan Kara suggested keeping this functionality internal-only for now and
> > plumbing the fine grained timestamps through getattr [1]. This set takes
> > a slightly different approach and has XFS use the fine-grained attr to
> > fake up STATX_CHANGE_COOKIE in its getattr routine itself.
> > 
> > While we keep fine-grained timestamps in struct inode, when presenting
> > the timestamps via getattr, we truncate them at a granularity of number
> > of ns per jiffy,
> 
> That's not good, because a fine-grained mtime explicitly set by the
> user would be truncated too, and booting with a different kernel (HZ)
> would change the observed timestamps of files.
> 

That's a very good point.
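
To make the HZ dependence concrete, here is a minimal sketch of the
truncation. It assumes the granularity is simply one second divided by HZ
(the kernel's own tick-length constant may round differently), and
truncate_to_jiffy() is a hypothetical helper, not an existing kernel
function.

```c
#include <stdint.h>

/* Assumption: one jiffy lasts exactly NSEC_PER_SEC / hz nanoseconds. */
#define NSEC_PER_SEC INT64_C(1000000000)

/*
 * Hypothetical helper: round the nanosecond part of a timestamp down to
 * a multiple of the jiffy length for the given HZ.
 */
int64_t truncate_to_jiffy(int64_t nsec, unsigned int hz)
{
	int64_t gran = NSEC_PER_SEC / (int64_t)hz;

	return nsec - (nsec % gran);
}
```

With an mtime whose nanosecond field was explicitly set to 123456789, this
returns 123000000 for HZ=1000 but 120000000 for HZ=250, so the same on-disk
value would be reported differently depending on the kernel that is booted.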

> > which allows us to smooth over the fuzz that causes
> > ordering problems.
> > 
> 
> The reported ordering problems (e.g. cp -u) are not even limited to the
> scope of a single fs, right?
> 

It isn't. Most of the tools we're concerned with don't generally care
about filesystem boundaries.

> Thinking out loud - if the QUERIED bit was not per inode timestamp
> but instead in a global fs_multigrain_ts variable, then all the inodes
> of all the mgtime fs would be using globally ordered timestamps.
>
> That should eliminate the reported issues with time reorder for
> fine vs coarse grained timestamps.
> 
> The risk of extra unneeded "change cookie" updates compared to
> per inode QUERIED bit may exist, but I think it is a rather small overhead
> and maybe worth the tradeoff of having to maintain a real per inode
> "change cookie" in addition to a "globally ordered mgtime"?
> 
> If this idea is acceptable, you may still be able to salvage the reverted
> ctime series for 6.7, because the change to use global mgtime should
> be quite trivial?
> 

This is basically the idea I was going to look at next once I got some
other stuff settled here: Basically, when we apply a fine-grained
timestamp to an inode, we'd advance the coarse-grained clock that
filesystems use to that value.
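
A rough userspace sketch of that idea, not the eventual kernel code: keep a
global floor that every fine-grained stamp raises, and never let a coarse
stamp fall below it. The names mg_floor_ns, mg_coarse_stamp() and
mg_fine_stamp() are made up for illustration, and the clock readings are
passed in as plain nanosecond counts.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global floor: the latest fine-grained value handed out (ns). */
static _Atomic int64_t mg_floor_ns;

/* Coarse stamp: the tick-based clock reading, but never below the floor. */
int64_t mg_coarse_stamp(int64_t coarse_clock_ns)
{
	int64_t floor = atomic_load(&mg_floor_ns);

	return coarse_clock_ns > floor ? coarse_clock_ns : floor;
}

/* Fine stamp: use the precise clock reading and raise the floor to it. */
int64_t mg_fine_stamp(int64_t fine_clock_ns)
{
	int64_t old = atomic_load(&mg_floor_ns);

	/* Retry until the floor is at least fine_clock_ns. */
	while (old < fine_clock_ns &&
	       !atomic_compare_exchange_weak(&mg_floor_ns, &old, fine_clock_ns))
		;

	/* If someone else already moved the floor past us, use the floor. */
	return old > fine_clock_ns ? old : fine_clock_ns;
}
```

In this model, any coarse stamp taken after a fine stamp is at least as large
as that fine stamp, regardless of which inode or filesystem it lands on,
which is the ordering property the tools above depend on.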

It could cause some write amplification: if you are streaming writes to
a bunch of files at the same time and someone stats one of them, then
they'd all end up getting an extra inode transaction. That doesn't sound
_too_ bad on its face, but I probably need to implement it and then run
some numbers to see.
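
As a back-of-the-envelope model of that cost (a toy, under the assumption
that an inode needs an extra transaction whenever its next write sees a
coarse clock value later than the ctime it already stores),
mg_count_dirtied() below is a hypothetical helper that counts the affected
files:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model: ctime_ns[] holds the ctime currently stored in each file
 * that is being streamed to. Returns how many of them would store a new
 * timestamp on their next write if the coarse clock now reads
 * new_coarse_ns.
 */
size_t mg_count_dirtied(const int64_t *ctime_ns, size_t n,
			int64_t new_coarse_ns)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (ctime_ns[i] < new_coarse_ns)
			count++;

	return count;
}
```

If three files were all stamped at 1000 within the same tick, nothing needs
updating while the coarse clock still reads 1000. Once a stat on a fourth
file pushes the coarse clock to 1500, all three would be counted.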

-- 
Jeff Layton <jlayton@...nel.org>