linux-kernel - Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f50919004f95782f0e8f26d9ac0513ee0c7ee432.camel@kernel.org>
Date:   Mon, 12 Sep 2022 11:49:33 -0400
From:   Jeff Layton <jlayton@...nel.org>
To:     Trond Myklebust <trondmy@...merspace.com>,
        "bfields@...ldses.org" <bfields@...ldses.org>
Cc:     "zohar@...ux.ibm.com" <zohar@...ux.ibm.com>,
        "djwong@...nel.org" <djwong@...nel.org>,
        "brauner@...nel.org" <brauner@...nel.org>,
        "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
        "neilb@...e.de" <neilb@...e.de>,
        "linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
        "david@...morbit.com" <david@...morbit.com>,
        "fweimer@...hat.com" <fweimer@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "xiubli@...hat.com" <xiubli@...hat.com>,
        "chuck.lever@...cle.com" <chuck.lever@...cle.com>,
        "linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
        "tytso@....edu" <tytso@....edu>,
        "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
        "jack@...e.cz" <jack@...e.cz>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
        "linux-man@...r.kernel.org" <linux-man@...r.kernel.org>,
        "linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "adilger.kernel@...ger.ca" <adilger.kernel@...ger.ca>,
        "lczerner@...hat.com" <lczerner@...hat.com>,
        "ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new
 STATX_INO_VERSION field

On Mon, 2022-09-12 at 15:32 +0000, Trond Myklebust wrote:
> On Mon, 2022-09-12 at 14:56 +0000, Trond Myklebust wrote:
> > On Mon, 2022-09-12 at 10:50 -0400, J. Bruce Fields wrote:
> > > On Mon, Sep 12, 2022 at 02:15:16PM +0000, Trond Myklebust wrote:
> > > > On Mon, 2022-09-12 at 09:51 -0400, J. Bruce Fields wrote:
> > > > > On Mon, Sep 12, 2022 at 08:55:04AM -0400, Jeff Layton wrote:
> > > > > > Because of the "seen" flag, we have a 63 bit counter to play
> > > > > > with.
> > > > > > Could
> > > > > > we use a similar scheme to the one we use to handle when
> > > > > > "jiffies"
> > > > > > wraps? Assume that we'd never compare two values that were
> > > > > > more
> > > > > > than
> > > > > > 2^62 apart? We could add i_version_before/i_version_after
> > > > > > macros to
> > > > > > make
> > > > > > it simple to handle this.
> > > > > 
> > > > > As far as I recall the protocol just assumes it can never
> > > > > wrap. 
> > > > > I
> > > > > guess
> > > > > you could add a new change_attr_type that works the way you
> > > > > describe.
> > > > > But without some new protocol clients aren't going to know what
> > > > > to do
> > > > > with a change attribute that wraps.
> > > > > 
> > > > > I think this just needs to be designed so that wrapping is
> > > > > impossible
> > > > > in
> > > > > any realistic scenario.  I feel like that's doable?
> > > > > 
> > > > > If we feel we have to catch that case, the only 100% correct
> > > > > behavior
> > > > > would probably be to make the filesystem readonly.
> > > > > 
> > > > 
> > > > Which protocol? If you're talking about basic NFSv4, it doesn't
> > > > assume
> > > > anything about the change attribute and wrapping.
> > > > 
> > > > The NFSv4.2 protocol did introduce the optional attribute
> > > > 'change_attr_type' that tries to describe the change attribute
> > > > behaviour to the client. It tells you if the behaviour is
> > > > monotonically
> > > > increasing, but doesn't say anything about the behaviour when the
> > > > attribute value overflows.
> > > > 
> > > > That said, the Linux NFSv4.2 client, which uses that
> > > > change_attr_type
> > > > attribute does deal with overflow by assuming standard uint64_t
> > > > wrap
> > > > around rules. i.e. it assumes bit values > 63 are truncated,
> > > > meaning
> > > > that the value obtained by incrementing (2^64-1) is 0.
> > > 
> > > Yeah, it was the MONOTONIC_INCRE case I was thinking of.  That's
> > > interesting, I didn't know the client did that.
> > > 
> > 
> > If you look at where we compare version numbers, it is always some
> > variant of the following:
> > 
> > static int nfs_inode_attrs_cmp_monotonic(const struct nfs_fattr
> > *fattr,
> >                                          const struct inode *inode)
> > {
> >         s64 diff = fattr->change_attr -
> > inode_peek_iversion_raw(inode);
> >         if (diff > 0)
> >                 return 1;
> >         return diff == 0 ? 0 : -1;
> > }
> > 
> > i.e. we do an unsigned 64-bit subtraction, and then cast it to the
> > signed 64-bit equivalent in order to figure out which is the more
> > recent value.
> > 

Good! This seems like the reasonable thing to do, given that the spec
doesn't really say that the change attribute has to start at low values.

> 
> ...and by the way, yes this does mean that if you suddenly add a value
> of 2^63 to the change attribute, then you are likely to cause the
> client to think that you just handed it an old value.
> 
> i.e. you're better off having the crash counter increment the change
> attribute by a relatively small value. One that is guaranteed to be
> larger than the values that may have been lost, but that is not
> excessively large.
> 

Yeah.

Like with jiffies, you need to make sure the samples you're comparing
aren't _too_ far off. That should be doable here -- 62 bits is plenty of
room to store a lot of change values.

My benchmark (maybe wrong, but maybe good enough) is to figure on an
increment per nanosecond for a worst-case scenario. With that, 2^40
nanoseconds is >12 days. Maybe that's overkill.

2^32 ns is about an hour and 20 mins. That's probably a reasonable value
to use. If we can't get a a new value onto disk in that time then
something is probably very wrong.
-- 
Jeff Layton <jlayton@...nel.org>