lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 30 Aug 2022 18:25:12 +0000
From:   Trond Myklebust <trondmy@...merspace.com>
To:     "bfields@...ldses.org" <bfields@...ldses.org>,
        "jlayton@...nel.org" <jlayton@...nel.org>
CC:     "zohar@...ux.ibm.com" <zohar@...ux.ibm.com>,
        "djwong@...nel.org" <djwong@...nel.org>,
        "xiubli@...hat.com" <xiubli@...hat.com>,
        "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
        "neilb@...e.de" <neilb@...e.de>,
        "linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
        "david@...morbit.com" <david@...morbit.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "chuck.lever@...cle.com" <chuck.lever@...cle.com>,
        "linux-ceph@...r.kernel.org" <linux-ceph@...r.kernel.org>,
        "linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
        "tytso@....edu" <tytso@....edu>,
        "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
        "jack@...e.cz" <jack@...e.cz>,
        "brauner@...nel.org" <brauner@...nel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "lczerner@...hat.com" <lczerner@...hat.com>,
        "adilger.kernel@...ger.ca" <adilger.kernel@...ger.ca>,
        "walters@...bum.org" <walters@...bum.org>,
        "linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>
Subject: Re: [PATCH v3 1/7] iversion: update comments with info about atime
 updates

On Tue, 2022-08-30 at 13:53 -0400, Jeff Layton wrote:
> On Tue, 2022-08-30 at 17:47 +0000, Trond Myklebust wrote:
> > On Tue, 2022-08-30 at 13:02 -0400, Jeff Layton wrote:
> > > On Tue, 2022-08-30 at 15:43 +0000, Trond Myklebust wrote:
> > > > On Tue, 2022-08-30 at 11:17 -0400, J. Bruce Fields wrote:
> > > > > On Tue, Aug 30, 2022 at 02:58:27PM +0000, Trond Myklebust
> > > > > wrote:
> > > > > > On Tue, 2022-08-30 at 10:44 -0400, J. Bruce Fields wrote:
> > > > > > > On Tue, Aug 30, 2022 at 09:50:02AM -0400, Jeff Layton
> > > > > > > wrote:
> > > > > > > > On Tue, 2022-08-30 at 09:24 -0400, J. Bruce Fields
> > > > > > > > wrote:
> > > > > > > > > On Tue, Aug 30, 2022 at 07:40:02AM -0400, Jeff Layton
> > > > > > > > > wrote:
> > > > > > > > > > Yes, saying only that it must be different is
> > > > > > > > > > intentional.
> > > > > > > > > > What
> > > > > > > > > > we
> > > > > > > > > > really want is for consumers to treat this as an
> > > > > > > > > > opaque
> > > > > > > > > > value
> > > > > > > > > > for the
> > > > > > > > > > most part [1]. Therefore an implementation based on
> > > > > > > > > > hashing
> > > > > > > > > > would
> > > > > > > > > > conform to the spec, I'd think, as long as all of
> > > > > > > > > > the
> > > > > > > > > > relevant
> > > > > > > > > > info is
> > > > > > > > > > part of the hash.
> > > > > > > > > 
> > > > > > > > > It'd conform, but it might not be as useful as an
> > > > > > > > > increasing
> > > > > > > > > value.
> > > > > > > > > 
> > > > > > > > > E.g. a client can use that to work out which of a
> > > > > > > > > series
> > > > > > > > > of
> > > > > > > > > reordered
> > > > > > > > > write replies is the most recent, and I seem to
> > > > > > > > > recall
> > > > > > > > > that
> > > > > > > > > can
> > > > > > > > > prevent
> > > > > > > > > unnecessary invalidations in some cases.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > That's a good point; the linux client does this. That
> > > > > > > > said,
> > > > > > > > NFSv4
> > > > > > > > has a
> > > > > > > > way for the server to advertise its change attribute
> > > > > > > > behavior
> > > > > > > > [1]
> > > > > > > > (though nfsd hasn't implemented this yet).
> > > > > > > 
> > > > > > > It was implemented and reverted.  The issue was that I
> > > > > > > thought
> > > > > > > nfsd
> > > > > > > should mix in the ctime to prevent the change attribute
> > > > > > > going
> > > > > > > backwards
> > > > > > > on reboot (see fs/nfsd/nfsfh.h:nfsd4_change_attribute()),
> > > > > > > but
> > > > > > > Trond
> > > > > > > was
> > > > > > > concerned about the possibility of time going backwards. 
> > > > > > > See
> > > > > > > 1631087ba872 "Revert "nfsd4: support change_attr_type
> > > > > > > attribute"".
> > > > > > > There's some mailing list discussion to that I'm not
> > > > > > > turning
> > > > > > > up
> > > > > > > right
> > > > > > > now.
> > > > > 
> > > > > https://lore.kernel.org/linux-nfs/a6294c25cb5eb98193f609a52aa8f4b5d4e81279.camel@hammerspace.com/
> > > > > is what I was thinking of but it isn't actually that
> > > > > interesting.
> > > > > 
> > > > > > My main concern was that some filesystems (e.g. ext3) were
> > > > > > failing
> > > > > > to
> > > > > > provide sufficient timestamp resolution to actually label
> > > > > > the
> > > > > > resulting
> > > > > > 'change attribute' as being updated monotonically. If the
> > > > > > time
> > > > > > stamp
> > > > > > doesn't change when the file data or metadata are changed,
> > > > > > then
> > > > > > the
> > > > > > client has to perform extra checks to try to figure out
> > > > > > whether
> > > > > > or
> > > > > > not
> > > > > > its caches are up to date.
> > > > > 
> > > > > That's a different issue from the one you were raising in
> > > > > that
> > > > > discussion.
> > > > > 
> > > > > > > Did NFSv4 add change_attr_type because some
> > > > > > > implementations
> > > > > > > needed
> > > > > > > the
> > > > > > > unordered case, or because they realized ordering was
> > > > > > > useful
> > > > > > > but
> > > > > > > wanted
> > > > > > > to keep backwards compatibility?  I don't know which it
> > > > > > > was.
> > > > > > 
> > > > > > We implemented it because, as implied above, knowledge of
> > > > > > whether
> > > > > > or
> > > > > > not the change attribute behaves monotonically, or strictly
> > > > > > monotonically, enables a number of optimisations.
> > > > > 
> > > > > Of course, but my question was about the value of the old
> > > > > behavior,
> > > > > not
> > > > > about the value of the monotonic behavior.
> > > > > 
> > > > > Put differently, if we could redesign the protocol from
> > > > > scratch
> > > > > would
> > > > > we
> > > > > actually have included the option of non-monotonic behavior?
> > > > > 
> > > > 
> > > > If we could design the filesystems from scratch, we probably
> > > > would
> > > > not.
> > > > The protocol ended up being as it is because people were trying
> > > > to
> > > > make
> > > > it as easy to implement as possible.
> > > > 
> > > > So if we could design the filesystem from scratch, we would
> > > > have
> > > > probably designed it along the lines of what AFS does.
> > > > i.e. each explicit change is accompanied by a single bump of
> > > > the
> > > > change
> > > > attribute, so that the clients can not only decide the order of
> > > > the
> > > > resulting changes, but also if they have missed a change (that
> > > > might
> > > > have been made by a different client).
> > > > 
> > > > However that would be a requirement that is likely to be very
> > > > specific
> > > > to distributed caches (and hence distributed filesystems). I
> > > > doubt
> > > > there are many user space applications that would need that
> > > > high
> > > > precision. Maybe MPI, but that's the only candidate I can think
> > > > of
> > > > for
> > > > now?
> > > > 
> > > 
> > > The fact that NFS kept this more loosely-defined is what allowed
> > > us
> > > to
> > > elide some of the i_version bumps and regain a fair bit of
> > > performance
> > > for local filesystems [1]. If the change attribute had been more
> > > strictly defined like you mention, then that particular
> > > optimization
> > > would not have been possible.
> > > 
> > > This sort of thing is why I'm a fan of not defining this any more
> > > strictly than we require. Later on, maybe we'll come up with a
> > > way
> > > for
> > > filesystems to advertise that they can offer stronger guarantees.
> > 
> > What 'eliding of the bumps' are we talking about here? If it
> > results in
> > unreliable behaviour, then I propose we just drop the whole concept
> > and
> > go back to using the ctime. The change attribute is only useful if
> > it
> > results in a reliable mechanism for detecting changes. Once you
> > "elide
> > away" the word "reliable", then it has no value beyond what ctime
> > already does.
> > 
> 
> I'm talking about the scheme to optimize away i_version updates when
> the
> current one has never been queried:
> 
>    
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f02a9ad1f15d
> 
> There's nothing unreliable about it.

Not really seeing why that would be incompatible with the idea of
bumping on every change. The I_VERSION_QUERIED is just a hint to tell
you that at the very least you need to sync the next metadata update
after someone peeked at the value. You could still continue to cache
updates after that, and only sync them once a O_SYNC or an fsync() call
explicitly requires you to do so.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@...merspace.com


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ