[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wg4jyTxO8WWUc1quqSETGaVsPHh8UeFUROYNwU-fEbkJg@mail.gmail.com>
Date: Mon, 30 Oct 2023 13:11:56 -1000
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Dave Chinner <david@...morbit.com>
Cc: Jeff Layton <jlayton@...nel.org>,
Amir Goldstein <amir73il@...il.com>,
Kent Overstreet <kent.overstreet@...ux.dev>,
Christian Brauner <brauner@...nel.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
John Stultz <jstultz@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Stephen Boyd <sboyd@...nel.org>,
Chandan Babu R <chandan.babu@...cle.com>,
"Darrick J. Wong" <djwong@...nel.org>,
"Theodore Ts'o" <tytso@....edu>,
Andreas Dilger <adilger.kernel@...ger.ca>,
Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>,
David Sterba <dsterba@...e.com>,
Hugh Dickins <hughd@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Jan Kara <jack@...e.de>, David Howells <dhowells@...hat.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-xfs@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-btrfs@...r.kernel.org, linux-mm@...ck.org,
linux-nfs@...r.kernel.org
Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain
timestamp handing
On Mon, 30 Oct 2023 at 12:37, Dave Chinner <david@...morbit.com> wrote:
>
> If XFS can ignore relatime or lazytime persistent updates for given
> situations, then *we don't need to make periodic on-disk updates of
> atime*. This makes the whole problem of "persistent atime update bumps
> i_version" go away because then we *aren't making persistent atime
> updates* except when some other persistent modification that bumps
> [cm]time occurs.
Well, I think this should be split into two independent questions:
(a) are relatime or lazytime atime updates persistent if nothing else changes?
(b) do atime updates _ever_ update i_version *regardless* of relatime
or lazytime?
and honestly, I think the best answer to (b) would be that "no,
i_version should simply not change for atime updates". And I think
that answer is what it is because no user of i_version seems to want
it.
Now, the reason it's a single question for you is that apparently for
XFS, the only thing that matters is "inode was written to disk" and
that "di_changecount" value is thus related to the persistence of
atime updates, but splitting di_changecount out to be a separate thing
from i_version seems to be on the table, so I think those two things
really could be independent issues.
> But I don't want to do this unconditionally - for systems not
> running anything that samples i_version we want relatime/lazytime
> to behave as they are supposed to and do periodic persistent updates
> as per normal. Principle of least surprise and all that jazz.
Well - see above: I think in a perfect world, we'd simply never change
i_version at all for any atime updates, and relatime/lazytime simply
wouldn't be an issue at all wrt i_version.
Wouldn't _that_ be the trule "least surprising" behavior? Considering
that nobody wants i_version to change for what are otherwise pure
reads (that's kind of the *definition* of atime, after all).
Now, the annoyance here is that *both* (a) and (b) then have that
impact of "i_version no longer tracks di_changecount".
So I don't think the issue here is "i_version" per se. I think in a
vacuum, the best option of i_version is pretty obvious. But if you
want i_version to track di_changecount, *then* you end up with that
situation where the persistence of atime matters, and i_version needs
to update whenever a (persistent) atime update happens.
> So we really need an indication for inodes that we should enable this
> mode for the inode. I have asked if we can have per-operation
> context flag to trigger this given the needs for io_uring to have
> context flags for timestamp updates to be added.
I really think some kind of new and even *more* complex and
non-intuitive behavior is the worst of both worlds. Having atime
updates be conditionally persistent - on top of already being delayed
by lazytime/relatime - and having the persistence magically change
depending on whether something wants to get i_version updates - sounds
just completely crazy.
Particularly as *none* of the people who want i_version updates
actually want them for atime at all.
So I really think this all boils down to "is xfs really willing to
split bi_changecount from i_version"?
> I have asked if we can have an inode flag set by the VFS or
> application code for this. e.g. a flag set by nfsd whenever it accesses a
> given inode.
>
> I have asked if this inode flag can just be triggered if we ever see
> I_VERSION_QUERIED set or statx is used to retrieve a change cookie,
> and whether this is a reliable mechanism for setting such a flag.
See above: linking this to I_VERSION_QUERIED is horrific. The people
who set that bit do *NOT* want atime updates to change i_version under
any circumstances. It was always a mistake.
This really is all *entirely* an artifact of that "bi_changecount" vs
"i_version" being tied together. You did seem to imply that you'd be
ok with having "bi_changecount" be split from i_version, ie from an
earlier email in this thread:
"Now that NFS is using a proper abstraction (i.e. vfs_statx()) to get
the change cookie, we really don't need to expose di_changecount in
i_version at all - we could simply copy an internal di_changecount
value into the statx cookie field in xfs_vn_getattr() and there
would be almost no change of behaviour from the perspective of NFS
and IMA at all"
but while I suspect *that* part is easy and straightforward, the
problem then becomes one of "what about the persistence of i_version",
and then you'd need a new field for *that* anyway, and would want a
new on-disk format regardless.
Linus
Powered by blists - more mailing lists