lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2d41c08e1fd96d55c794c3b4cd43a51a0494bfcf.camel@hammerspace.com>
Date:   Fri, 23 Sep 2022 13:44:05 +0000
From:   Trond Myklebust <trondmy@...merspace.com>
To:     "jack@...e.cz" <jack@...e.cz>,
        "jlayton@...nel.org" <jlayton@...nel.org>
CC:     "zohar@...ux.ibm.com" <zohar@...ux.ibm.com>,
        "djwong@...nel.org" <djwong@...nel.org>,
        "brauner@...nel.org" <brauner@...nel.org>,
        "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
        "bfields@...ldses.org" <bfields@...ldses.org>,
        "linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
        "neilb@...e.de" <neilb@...e.de>,
        "david@...morbit.com" <david@...morbit.com>,
        "fweimer@...hat.com" <fweimer@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "chuck.lever@...cle.com" <chuck.lever@...cle.com>,
        "linux-man@...r.kernel.org" <linux-man@...r.kernel.org>,
        "linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
        "tytso@....edu" <tytso@....edu>,
        "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
        "xiubli@...hat.com" <xiubli@...hat.com>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "adilger.kernel@...ger.ca" <adilger.kernel@...ger.ca>,
        "lczerner@...hat.com" <lczerner@...hat.com>,
        "ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>,
        "linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new
 STATX_INO_VERSION field

On Fri, 2022-09-23 at 11:56 +0200, Jan Kara wrote:
> On Thu 22-09-22 16:18:02, Jeff Layton wrote:
> > On Thu, 2022-09-22 at 06:18 -0400, Jeff Layton wrote:
> > > On Thu, 2022-09-22 at 07:41 +1000, Dave Chinner wrote:
> > > > e.g. The NFS server can track the i_version values when the
> > > > NFSD
> > > > syncs/commits a given inode. The nfsd can sample i_version it
> > > > when
> > > > calls ->commit_metadata or flushed data on the inode, and then
> > > > when
> > > > it peeks at i_version when gathering post-op attrs (or any
> > > > other
> > > > getattr op) it can decide that there is too much in-memory
> > > > change
> > > > (e.g. 10,000 counts since last sync) and sync the inode.
> > > > 
> > > > i.e. the NFS server can trivially cap the maximum number of
> > > > uncommitted NFS change attr bumps it allows to build up in
> > > > memory.
> > > > At that point, the NFS server has a bound "maximum write count"
> > > > that
> > > > can be used in conjunction with the xattr based crash counter
> > > > to
> > > > determine how the change_attr is bumped by the crash counter.
> > > 
> > > Well, not "trivially". This is the bit where we have to grow
> > > struct
> > > inode (or the fs-specific inode), as we'll need to know what the
> > > latest
> > > on-disk value is for the inode.
> > > 
> > > I'm leaning toward doing this on the query side. Basically, when
> > > nfsd
> > > goes to query the i_version, it'll check the delta between the
> > > current
> > > version and the latest one on disk. If it's bigger than X then
> > > we'd just
> > > return NFS4ERR_DELAY to the client.
> > > 
> > > If the delta is >X/2, maybe it can kick off a workqueue job or
> > > something
> > > that calls write_inode with WB_SYNC_ALL to try to get the thing
> > > onto the
> > > platter ASAP.
> > 
> > Still looking at this bit too. Probably we can just kick off a
> > WB_SYNC_NONE filemap_fdatawrite at that point and hope for the
> > best?
> 
> "Hope" is not a great assurance regarding data integrity ;) Anyway,
> it
> depends on how you imagine the "i_version on disk" is going to be
> maintained. It could be maintained by NFSD inside
> commit_inode_metadata() -
> fetch current i_version value before asking filesystem for the sync
> and by the
> time commit_metadata() returns we know that value is on disk. If we
> detect the
> current - on_disk is > X/2, we call commit_inode_metadata() and we
> are
> done. It is not even *that* expensive because usually filesystems
> optimize
> away unnecessary IO when the inode didn't change since last time it
> got
> synced.
> 

Note that these approaches requiring 3rd party help in order to track
i_version integrity across filesystem crashes all make the idea of
adding i_version to statx() a no-go.

It is one thing for knfsd to add specialised machinery for integrity
checking, but if all applications need to do so, then they are highly
unlikely to want to adopt this attribute.


-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@...merspace.com


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ