[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <44884eeb662c2e304ba644d585b14c65b7dc1a0a.camel@hammerspace.com>
Date: Mon, 12 Sep 2022 15:32:02 +0000
From: Trond Myklebust <trondmy@...merspace.com>
To: "bfields@...ldses.org" <bfields@...ldses.org>
CC: "zohar@...ux.ibm.com" <zohar@...ux.ibm.com>,
"djwong@...nel.org" <djwong@...nel.org>,
"brauner@...nel.org" <brauner@...nel.org>,
"linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
"neilb@...e.de" <neilb@...e.de>,
"linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
"david@...morbit.com" <david@...morbit.com>,
"fweimer@...hat.com" <fweimer@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"xiubli@...hat.com" <xiubli@...hat.com>,
"chuck.lever@...cle.com" <chuck.lever@...cle.com>,
"jlayton@...nel.org" <jlayton@...nel.org>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
"tytso@....edu" <tytso@....edu>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
"jack@...e.cz" <jack@...e.cz>,
"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
"linux-man@...r.kernel.org" <linux-man@...r.kernel.org>,
"linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"adilger.kernel@...ger.ca" <adilger.kernel@...ger.ca>,
"lczerner@...hat.com" <lczerner@...hat.com>,
"ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new
STATX_INO_VERSION field
On Mon, 2022-09-12 at 14:56 +0000, Trond Myklebust wrote:
> On Mon, 2022-09-12 at 10:50 -0400, J. Bruce Fields wrote:
> > On Mon, Sep 12, 2022 at 02:15:16PM +0000, Trond Myklebust wrote:
> > > On Mon, 2022-09-12 at 09:51 -0400, J. Bruce Fields wrote:
> > > > On Mon, Sep 12, 2022 at 08:55:04AM -0400, Jeff Layton wrote:
> > > > > Because of the "seen" flag, we have a 63 bit counter to play
> > > > > with.
> > > > > Could
> > > > > we use a similar scheme to the one we use to handle when
> > > > > "jiffies"
> > > > > wraps? Assume that we'd never compare two values that were
> > > > > more
> > > > > than
> > > > > 2^62 apart? We could add i_version_before/i_version_after
> > > > > macros to
> > > > > make
> > > > > it simple to handle this.
> > > >
> > > > As far as I recall the protocol just assumes it can never
> > > > wrap.
> > > > I
> > > > guess
> > > > you could add a new change_attr_type that works the way you
> > > > describe.
> > > > But without some new protocol clients aren't going to know what
> > > > to do
> > > > with a change attribute that wraps.
> > > >
> > > > I think this just needs to be designed so that wrapping is
> > > > impossible
> > > > in
> > > > any realistic scenario. I feel like that's doable?
> > > >
> > > > If we feel we have to catch that case, the only 100% correct
> > > > behavior
> > > > would probably be to make the filesystem readonly.
> > > >
> > >
> > > Which protocol? If you're talking about basic NFSv4, it doesn't
> > > assume
> > > anything about the change attribute and wrapping.
> > >
> > > The NFSv4.2 protocol did introduce the optional attribute
> > > 'change_attr_type' that tries to describe the change attribute
> > > behaviour to the client. It tells you if the behaviour is
> > > monotonically
> > > increasing, but doesn't say anything about the behaviour when the
> > > attribute value overflows.
> > >
> > > That said, the Linux NFSv4.2 client, which uses that
> > > change_attr_type
> > > attribute does deal with overflow by assuming standard uint64_t
> > > wrap
> > > around rules. i.e. it assumes bit values > 63 are truncated,
> > > meaning
> > > that the value obtained by incrementing (2^64-1) is 0.
> >
> > Yeah, it was the MONOTONIC_INCRE case I was thinking of. That's
> > interesting, I didn't know the client did that.
> >
>
> If you look at where we compare version numbers, it is always some
> variant of the following:
>
> static int nfs_inode_attrs_cmp_monotonic(const struct nfs_fattr
> *fattr,
> const struct inode *inode)
> {
> s64 diff = fattr->change_attr -
> inode_peek_iversion_raw(inode);
> if (diff > 0)
> return 1;
> return diff == 0 ? 0 : -1;
> }
>
> i.e. we do an unsigned 64-bit subtraction, and then cast it to the
> signed 64-bit equivalent in order to figure out which is the more
> recent value.
>
...and by the way, yes this does mean that if you suddenly add a value
of 2^63 to the change attribute, then you are likely to cause the
client to think that you just handed it an old value.
i.e. you're better off having the crash counter increment the change
attribute by a relatively small value. One that is guaranteed to be
larger than the values that may have been lost, but that is not
excessively large.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@...merspace.com
Powered by blists - more mailing lists