Message-ID: <4b106847d5202aec0e14fdbbe93b070b7ea97477.camel@kernel.org>
Date: Sat, 23 Sep 2023 06:22:54 -0400
From: Jeff Layton <jlayton@...nel.org>
To: Amir Goldstein <amir73il@...il.com>
Cc: Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>,
Chuck Lever <chuck.lever@...cle.com>,
Neil Brown <neilb@...e.de>,
Olga Kornievskaia <kolga@...app.com>,
Dai Ngo <Dai.Ngo@...cle.com>, Tom Talpey <tom@...pey.com>,
Chandan Babu R <chandan.babu@...cle.com>,
"Darrick J. Wong" <djwong@...nel.org>,
Dave Chinner <david@...morbit.com>, Jan Kara <jack@...e.cz>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Kent Overstreet <kent.overstreet@...ux.dev>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-nfs@...r.kernel.org, linux-xfs@...r.kernel.org
Subject: Re: [PATCH v8 0/5] fs: multigrain timestamps for XFS's change_cookie

On Sat, 2023-09-23 at 10:15 +0300, Amir Goldstein wrote:
> On Fri, Sep 22, 2023 at 8:15 PM Jeff Layton <jlayton@...nel.org> wrote:
> >
> > My initial goal was to implement multigrain timestamps on most major
> > filesystems, so we could present them to userland, and use them for
> > NFSv3, etc.
> >
> > With the current implementation however, we can't guarantee that a file
> > with a coarse grained timestamp modified after one with a fine grained
> > timestamp will always appear to have a later value. This could confuse
> > some programs like make, rsync, find, etc. that depend on strict
> > ordering requirements for timestamps.
> >
> > The goal of this version is more modest: fix XFS' change attribute.
> > XFS's change attribute is bumped on atime updates in addition to other
> > deliberate changes. This makes it unsuitable for export via nfsd.
> >
> > Jan Kara suggested keeping this functionality internal-only for now and
> > plumbing the fine grained timestamps through getattr [1]. This set takes
> > a slightly different approach and has XFS use the fine-grained attr to
> > fake up STATX_CHANGE_COOKIE in its getattr routine itself.
> >
> > While we keep fine-grained timestamps in struct inode, when presenting
> > the timestamps via getattr, we truncate them at a granularity of number
> > of ns per jiffy,
>
> That's not good, because user explicitly set granular mtime would be
> truncated too and booting with different kernels (HZ) would change
> the observed timestamps of files.
>
That's a very good point.
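
To make the HZ dependence concrete, here is a minimal user-space sketch of truncating the nanoseconds field to jiffy granularity. It is an illustration only, not the code from the patch set: trunc_to_jiffy() and its hz parameter are hypothetical names, and hz is assumed to divide into one second the way the kernel's HZ does.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper (not a kernel function): round a nanosecond
 * value down to a multiple of the jiffy length implied by hz.
 */
static uint64_t trunc_to_jiffy(uint64_t ns, unsigned int hz)
{
	uint64_t ns_per_jiffy = NSEC_PER_SEC / hz;

	/* Drop whatever lies past the last whole jiffy boundary. */
	return ns - (ns % ns_per_jiffy);
}
```

With this model, an mtime explicitly set with a nanoseconds field of 123456789 reads back as 123000000 when hz is 1000 but as 120000000 when hz is 250, so the same on-disk value would be observed differently under kernels built with different HZ.
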
> > which allows us to smooth over the fuzz that causes
> > ordering problems.
> >
>
> The reported ordering problems (e.g. cp -u) are not even limited to the
> scope of a single fs, right?
>
It isn't. Most of the tools we're concerned with don't generally care
about filesystem boundaries.

> Thinking out loud - if the QUERIED bit was not per inode timestamp
> but instead in a global fs_multigrain_ts variable, then all the inodes
> of all the mgtime fs would be using globally ordered timestamps
>
> That should eliminate the reported issues with time reorder for
> fine vs coarse grained timestamps.
>
> The risk of extra unneeded "change cookie" updates compared to
> per inode QUERIED bit may exist, but I think it is a rather small overhead
> and maybe worth the tradeoff of having to maintain a real per inode
> "change cookie" in addition to a "globally ordered mgtime"?
>
> If this idea is acceptable, you may still be able to salvage the reverted
> ctime series for 6.7, because the change to use global mgtime should
> be quite trivial?
>
This is basically the idea I was going to look at next once I got some
other stuff settled here: when we apply a fine-grained timestamp to an
inode, we'd advance the coarse-grained clock that filesystems use to
that value.

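
As a rough user-space model of that idea (a sketch under stated assumptions, not kernel code): keep one global floor, raise it whenever a fine-grained timestamp is handed out, and have coarse-grained reads never return less than the floor. The names mg_floor_ns, mg_apply_fine() and mg_coarse_now() are made up for illustration, and the coarse tick value is passed in by the caller rather than read from a real clock.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global floor in nanoseconds; not a real kernel symbol. */
static _Atomic uint64_t mg_floor_ns;

/*
 * Called when a fine-grained timestamp is applied to an inode.
 * Raises the global floor to fine_ns unless it is already at or
 * beyond that value, and returns the resulting floor.
 */
static uint64_t mg_apply_fine(uint64_t fine_ns)
{
	uint64_t old = atomic_load(&mg_floor_ns);

	/* On CAS failure, old is reloaded with the current floor. */
	while (old < fine_ns &&
	       !atomic_compare_exchange_weak(&mg_floor_ns, &old, fine_ns))
		;

	return old < fine_ns ? fine_ns : old;
}

/*
 * Coarse-grained read: the later of the coarse tick value and the
 * floor, so it can never appear earlier than a fine-grained
 * timestamp that was already handed out.
 */
static uint64_t mg_coarse_now(uint64_t tick_ns)
{
	uint64_t floor = atomic_load(&mg_floor_ns);

	return tick_ns > floor ? tick_ns : floor;
}
```

In this model, a coarse read taken after a fine-grained stamp of 1500 returns 1500 even if the coarse tick is still at 1000, which is the ordering property being discussed. It says nothing about the cost side, which is the open question.
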
It could cause some write amplification: if you are streaming writes to
a bunch of files at the same time and someone stats one of them, then
they'd all end up getting an extra inode transaction. That doesn't sound
_too_ bad on its face, but I probably need to implement it and then run
some numbers to see.
--
Jeff Layton <jlayton@...nel.org>