Message-ID: <577b6d8a7243aeee37eaa4bbb00c90799586bc48.camel@hammerspace.com>
Date:   Thu, 15 Sep 2022 15:08:34 +0000
From:   Trond Myklebust <trondmy@...merspace.com>
To:     "bfields@...ldses.org" <bfields@...ldses.org>,
        "neilb@...e.de" <neilb@...e.de>
CC:     "zohar@...ux.ibm.com" <zohar@...ux.ibm.com>,
        "djwong@...nel.org" <djwong@...nel.org>,
        "xiubli@...hat.com" <xiubli@...hat.com>,
        "brauner@...nel.org" <brauner@...nel.org>,
        "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
        "linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>,
        "linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
        "david@...morbit.com" <david@...morbit.com>,
        "fweimer@...hat.com" <fweimer@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "jlayton@...nel.org" <jlayton@...nel.org>,
        "chuck.lever@...cle.com" <chuck.lever@...cle.com>,
        "linux-man@...r.kernel.org" <linux-man@...r.kernel.org>,
        "linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
        "tytso@....edu" <tytso@....edu>,
        "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
        "jack@...e.cz" <jack@...e.cz>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "adilger.kernel@...ger.ca" <adilger.kernel@...ger.ca>,
        "lczerner@...hat.com" <lczerner@...hat.com>,
        "ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new
 STATX_INO_VERSION field

On Thu, 2022-09-15 at 10:06 -0400, J. Bruce Fields wrote:
> On Tue, Sep 13, 2022 at 09:14:32AM +1000, NeilBrown wrote:
> > On Mon, 12 Sep 2022, J. Bruce Fields wrote:
> > > On Sun, Sep 11, 2022 at 08:13:11AM +1000, NeilBrown wrote:
> > > > On Fri, 09 Sep 2022, Jeff Layton wrote:
> > > > > 
> > > > > > The machine crashes and comes back up, and we get a query for
> > > > > > i_version and it comes back as X. Fine, it's an old version.
> > > > > > Now there is a write. What do we do to ensure that the new
> > > > > > value doesn't collide with X+1?
> > > > 
> > > > (I missed this bit in my earlier reply..)
> > > > 
> > > > How is it "Fine" to see an old version?
> > > > The file could have changed without the version changing.
> > > > > And I thought one of the goals of the crash-count was to be able
> > > > > to provide a monotonic change id.
> > > 
> > > > I was still mainly thinking about how to provide reliable
> > > > close-to-open semantics between NFS clients.  In the case the
> > > > writer was an NFS client, it wasn't done writing (or it would have
> > > > COMMITted), so those writes will come in and bump the change
> > > > attribute soon, and as long as we avoid the small chance of reusing
> > > > an old change attribute, we're OK, and I think it'd even still be
> > > > OK to advertise CHANGE_TYPE_IS_MONOTONIC_INCR.
> > 
> > > You seem to be assuming that the client doesn't crash at the same
> > > time as the server (maybe they are both VMs on a host that lost
> > > power...)
> > 
> > > If client A reads and caches, client B writes, the server crashes
> > > after writing some data (to already allocated space so no inode
> > > update needed) but before writing the new i_version, then client B
> > > crashes.
> > > When server comes back the i_version will be unchanged but the data
> > > has changed.  Client A will cache old data indefinitely...
> 
> > I guess I assume that if all we're promising is close-to-open, then a
> > client isn't allowed to trust its cache in that situation.  Maybe
> > that's an overly draconian interpretation of close-to-open.
> 
> Also, I'm trying to think about how to improve things incrementally.
> Incorporating something like a crash count into the on-disk i_version
> fixes some cases without introducing any new ones or regressing
> performance after a crash.
> 
> > If we subsequently wanted to close those remaining holes, I think
> > we'd need the change attribute increment to be seen as atomic with
> > respect to its associated change, both to clients and (separately) on
> > disk.  (That would still allow the change attribute to go backwards
> > after a crash, to the value it held as of the on-disk state of the
> > file.  I think clients should be able to deal with that case.)
> 
> But, I don't know, maybe a bigger hammer would be OK:
> 

If you're not going to meet the minimum bar of data integrity, then
this whole exercise is just a massive waste of everyone's time. The
answer then going forward is just to recommend never using Linux as an
NFS server. Makes my life much easier, because I no longer have to
debug any of the issues.

> 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@...merspace.com
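
The collision raised in the quoted thread (after a crash the server reports
an old i_version X, and the next write produces a value that was already
handed out before the crash) can be illustrated with a toy model. This is
only a sketch under stated assumptions: the Server class, the replay()
helper, and the 16/48-bit split between crash count and counter are
hypothetical and are not the kernel's actual i_version layout or the scheme
the patch series settled on. It folds the crash count into the value only
when the counter is bumped, so files untouched since the crash keep their
old change attribute.

```python
# Toy model of the i_version collision discussed in the thread.
# All names and the bit split below are illustrative assumptions.
COUNTER_BITS = 48
COUNTER_MASK = (1 << COUNTER_BITS) - 1


class Server:
    def __init__(self, use_crash_count):
        self.use_crash_count = use_crash_count
        self.crash_count = 0   # assumed to be persisted reliably at boot
        self.disk_version = 0  # last i_version that reached stable storage
        self.mem_version = 0   # in-memory i_version reported to clients

    def bump(self):
        """Simulate a write: advance the change attribute and return it."""
        counter = (self.mem_version & COUNTER_MASK) + 1
        if self.use_crash_count:
            # Fold the current crash count into the high bits on every bump.
            self.mem_version = (self.crash_count << COUNTER_BITS) | counter
        else:
            self.mem_version = counter
        return self.mem_version

    def sync(self):
        """Persist the in-memory i_version."""
        self.disk_version = self.mem_version

    def crash(self):
        """Lose unsynced state and count the crash."""
        self.mem_version = self.disk_version
        self.crash_count += 1


def replay(use_crash_count):
    """Return (value seen before the crash, value produced after it)."""
    server = Server(use_crash_count)
    server.bump()
    server.sync()                 # X is on disk
    pre_crash = server.bump()     # X+1 is visible to clients but never synced
    server.crash()                # server comes back reporting X
    post_crash = server.bump()    # first write after the crash
    return pre_crash, post_crash


if __name__ == "__main__":
    print("plain counter:   ", replay(False))
    print("with crash count:", replay(True))
```

In this model the plain counter hands out the same value before and after
the crash for possibly different file contents, while the crash-count
variant produces a strictly larger value. It does not address the case
described above where data reaches disk but the i_version update does not
and no further write arrives.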