Message-ID: <9ddbc23661ab6527d73860a873391a3536451ee6.camel@hammerspace.com>
Date:   Wed, 7 Sep 2022 15:04:18 +0000
From:   Trond Myklebust <trondmy@...merspace.com>
To:     "bfields@...ldses.org" <bfields@...ldses.org>,
        "jlayton@...nel.org" <jlayton@...nel.org>
CC:     "zohar@...ux.ibm.com" <zohar@...ux.ibm.com>,
        "djwong@...nel.org" <djwong@...nel.org>,
        "brauner@...nel.org" <brauner@...nel.org>,
        "xiubli@...hat.com" <xiubli@...hat.com>,
        "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
        "linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
        "neilb@...e.de" <neilb@...e.de>,
        "david@...morbit.com" <david@...morbit.com>,
        "fweimer@...hat.com" <fweimer@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "chuck.lever@...cle.com" <chuck.lever@...cle.com>,
        "linux-man@...r.kernel.org" <linux-man@...r.kernel.org>,
        "linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
        "tytso@....edu" <tytso@....edu>,
        "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
        "jack@...e.cz" <jack@...e.cz>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
        "linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "lczerner@...hat.com" <lczerner@...hat.com>,
        "adilger.kernel@...ger.ca" <adilger.kernel@...ger.ca>,
        "ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new
 STATX_INO_VERSION field

On Wed, 2022-09-07 at 10:05 -0400, Jeff Layton wrote:
> On Wed, 2022-09-07 at 13:55 +0000, Trond Myklebust wrote:
> > On Wed, 2022-09-07 at 09:12 -0400, Jeff Layton wrote:
> > > On Wed, 2022-09-07 at 08:52 -0400, J. Bruce Fields wrote:
> > > > On Wed, Sep 07, 2022 at 08:47:20AM -0400, Jeff Layton wrote:
> > > > > On Wed, 2022-09-07 at 21:37 +1000, NeilBrown wrote:
> > > > > > On Wed, 07 Sep 2022, Jeff Layton wrote:
> > > > > > > +The change to \fIstatx.stx_ino_version\fP is not atomic with
> > > > > > > +respect to the other changes in the inode. On a write, for
> > > > > > > +instance, the i_version is usually incremented before the data
> > > > > > > +is copied into the pagecache. Therefore it is possible to see a
> > > > > > > +new i_version value while a read still shows the old data.
> > > > > > 
> > > > > > Doesn't that make the value useless?
> > > > > > 
> > > > > 
> > > > > No, I don't think so. It's only really useful for comparing to an
> > > > > older sample anyway. If you do "statx; read; statx" and the value
> > > > > hasn't changed, then you know that things are stable.
> > > > 
> > > > I don't see how that helps.  It's still possible to get:
> > > > 
> > > >                 reader          writer
> > > >                 ------          ------
> > > >                                 i_version++
> > > >                 statx
> > > >                 read
> > > >                 statx
> > > >                                 update page cache
> > > > 
> > > > right?
> > > > 
> > > 
> > > Yeah, I suppose so -- the statx wouldn't necessitate any locking. In
> > > that case, maybe this is useless for anything other than testing
> > > purposes and userland NFS servers.
> > > 
> > > If so, would it be better not to consume a statx field with this?
> > > What could we use as an alternate interface? An ioctl? Some sort of
> > > global virtual xattr? It does need to be something per-inode.
> > 
> > I don't see how a non-atomic change attribute is remotely useful even
> > for NFS.
> > 
> > The main problem is not so much the above (although NFS clients are
> > vulnerable to that too) but the behaviour w.r.t. directory changes.
> > 
> > If the server can't guarantee that file/directory/... creation and
> > unlink are atomically recorded with change attribute updates, then the
> > client has to always assume that the server is lying, and that it has
> > to revalidate all its caches anyway. Cue endless readdir/lookup/getattr
> > requests after each and every directory modification in order to check
> > that some other client didn't also sneak in a change of their own.
> > 
> 
> We generally hold the parent dir's inode->i_rwsem exclusively over most
> important directory changes, and the times/i_version are also updated
> while holding it. What we don't do is serialize reads of this value vs.
> the i_rwsem, so you could see new directory contents alongside an old
> i_version. Maybe we should be taking it for read when we query it on a
> directory?

Serialising reads is not the problem. The problem is ensuring that
knfsd is able to provide an atomic change_info4 structure when the
client modifies the directory.
i.e. the requirement is that if the directory changed, then that
modification is atomically accompanied by an update of the change
attribute that can be retrieved by knfsd and placed in the reply to the
client.

> Achieving atomicity with file writes though is another matter
> entirely.
> I'm not sure that's even doable or how to approach it if so.
> Suggestions?

The problem outlined by Bruce above isn't a big deal. Just check the
I_VERSION_QUERIED flag after the 'update page cache' step, and bump the
i_version again if the flag is set. The real problem is what happens if
you then crash during writeback...

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@...merspace.com
